Sample records for ibus bit processor

  1. Reliability Advancement for Electronic Engine Controllers. Volume 1

    DTIC Science & Technology

    1980-06-01

    natural frequency increases. Hence, the natural frequency is dependent upon pressure in the nonlinear rela- tionship: Pressure = A + Bft + Cft 2 + Dft 3...Hence, the natural frequency is dependent upon pressure in the nonlinear re- I ationship: Pressure = A + Bft + Cft 2 + Dft 3 + Eft 4 where A, B, C, D...BIT STTE TO’, AML ’’E ICONVERTER ICOUNTER BUFFE CPUT IBUS TOAMPLIFIER________IBUS PROGRAM-COKRSE -TT PER:OO INPUTS ENABLE M MR- - L

  2. Towards the formal specification of the requirements and design of a processor interface unit: HOL listings

    NASA Technical Reports Server (NTRS)

    Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.

    1993-01-01

    This technical report contains the HOL listings of the specification of the design and major portions of the requirements for a commercially developed processor interface unit (or PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU specification as it currently exists. Section two of this report contains general-purpose HOL theories that support the PIU specification. These theories include definitions for the hardware components used in the PIU, our implementation of bit words, and our implementation of temporal logic. Section three contains the HOL listings for the PIU design specification. Aside from the PIU internal bus (I-Bus), this specification is complete. Section four contains the HOL listings for a major portion of the PIU requirements specification. Specifically, it contains most of the definition for the PIU behavior associated with memory accesses initiated by the local processor.

  3. Method and apparatus for high speed data acquisition and processing

    DOEpatents

    Ferron, J.R.

    1997-02-11

    A method and apparatus are disclosed for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register. 15 figs.

  4. Method and apparatus for high speed data acquisition and processing

    DOEpatents

    Ferron, John R.

    1997-01-01

    A method and apparatus for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register.

  5. Hardware multiplier processor

    DOEpatents

    Pierce, Paul E.

    1986-01-01

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  6. Hardware multiplier processor

    DOEpatents

    Pierce, P.E.

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reed, D.A.; Grunwald, D.C.

    The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less

  8. Digital Hardware Architecture Implementation

    DTIC Science & Technology

    1993-02-15

    of micro - MOTOROLA 63.7 50MHZ 64 BIT 2092 N/A processors during quarterly re- INTEL 42 50MHz 64 BIT 1092 N/A views and monthly reports. The 186o XP...27 3.2.1 Signal Processor (SP) Analysis...31 3.2.1.11 MasPar Software Statements ........................................................ 32 3.2.2 Data Processor

  9. A high-accuracy optical linear algebra processor for finite element applications

    NASA Technical Reports Server (NTRS)

    Casasent, D.; Taylor, B. K.

    1984-01-01

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.

  10. A high-speed digital signal processor for atmospheric radar, part 7.3A

    NASA Technical Reports Server (NTRS)

    Brosnahan, J. W.; Woodard, D. M.

    1984-01-01

    The Model SP-320 device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (First In First Out) memories, both with depths of 4096 W, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MW/s.

  11. DFT algorithms for bit-serial GaAs array processor architectures

    NASA Technical Reports Server (NTRS)

    Mcmillan, Gary B.

    1988-01-01

    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

  12. Bit-parallel arithmetic in a massively-parallel associative processor

    NASA Technical Reports Server (NTRS)

    Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

    1992-01-01

    A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.

  13. Testing and operating a multiprocessor chip with processor redundancy

    DOEpatents

    Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J

    2014-10-21

    A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.

  14. Fault-tolerant corrector/detector chip for high-speed data processing

    DOEpatents

    Andaleon, David D.; Napolitano, Jr., Leonard M.; Redinbo, G. Robert; Shreeve, William O.

    1994-01-01

    An internally fault-tolerant data error detection and correction integrated circuit device (10) and a method of operating same. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum is provided with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented.

  15. Fault-tolerant corrector/detector chip for high-speed data processing

    DOEpatents

    Andaleon, D.D.; Napolitano, L.M. Jr.; Redinbo, G.R.; Shreeve, W.O.

    1994-03-01

    An internally fault-tolerant data error detection and correction integrated circuit device and a method of operating same is described. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented. 8 figures.

  16. Design of a massively parallel computer using bit serial processing elements

    NASA Technical Reports Server (NTRS)

    Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

    1995-01-01

    A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.

  17. Reconfigurable data path processor

    NASA Technical Reports Server (NTRS)

    Donohoe, Gregory (Inventor)

    2005-01-01

    A reconfigurable data path processor comprises a plurality of independent processing elements. Each of the processing elements advantageously comprising an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output. Each processor is also capable of through-putting an input as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer output. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command. The output control command is generated by processing a set of arithmetic status-bits through a logical mask. The conditional multiplexer output is coupled to a first processing element output. A first set of arithmetic bits are generated according to the processing of the first processable value. A second set of arithmetic bits may be generated from a second processing operation. The selection of the arithmetic status-bits is performed by an arithmetic-status bit multiplexer selects the desired set of arithmetic status bits from among the first and second set of arithmetic status bits. The conditional multiplexer evaluates the select arithmetic status bits according to logical mask defining an algorithm for evaluating the arithmetic status bits.

  18. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    NASA Astrophysics Data System (ADS)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.

  19. A fully integrated mixed-signal neural processor for implantable multichannel cortical recording.

    PubMed

    Sodagar, Amir M; Wise, Kensall D; Najafi, Khalil

    2007-06-01

    A 64-channel neural processor has been developed for use in an implantable neural recording microsystem. In the Scan Mode, the processor is capable of detecting neural spikes by programmable positive, negative, or window thresholding. Spikes are tagged with their associated channel addresses and formed into 18-bit data words that are sent serially to the external host. In the Monitor Mode, two channels can be selected and viewed at high resolution for studies where the entire signal is of interest. The processor runs from a 3-V supply and a 2-MHz clock, with a channel scan rate of 64 kS/s and an output bit rate of 2 Mbps.

  20. Implementation of digital equality comparator circuit on memristive memory crossbar array using material implication logic

    NASA Astrophysics Data System (ADS)

    Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al

    2018-03-01

    One of the most significant constraints of Von Neumann architecture is the limited bandwidth between memory and processor. The cost to move data back and forth between memory and processor is considerably higher than the computation in the processor itself. This architecture significantly impacts the Big Data and data-intensive application such as DNA analysis comparison which spend most of the processing time to move data. Recently, the in-memory processing concept was proposed, which is based on the capability to perform the logic operation on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map digital equality comparator circuit on memristive memory crossbar array. The 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit of equality comparator circuit are mapped on memristive memory crossbar array by using material implication logic in a sequential and parallel method. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping both by using the overlapping technique.

  1. Digital signal processor and processing method for GPS receivers

    NASA Technical Reports Server (NTRS)

    Thomas, Jr., Jess B. (Inventor)

    1989-01-01

    A digital signal processor and processing method therefor for use in receivers of the NAVSTAR/GLOBAL POSITIONING SYSTEM (GPS) employs a digital carrier down-converter, digital code correlator and digital tracking processor. The digital carrier down-converter and code correlator consists of an all-digital, minimum bit implementation that utilizes digital chip and phase advancers, providing exceptional control and accuracy in feedback phase and in feedback delay. Roundoff and commensurability errors can be reduced to extremely small values (e.g., less than 100 nanochips and 100 nanocycles roundoff errors and 0.1 millichip and 1 millicycle commensurability errors). The digital tracking processor bases the fast feedback for phase and for group delay in the C/A, P.sub.1, and P.sub.2 channels on the L.sub.1 C/A carrier phase thereby maintaining lock at lower signal-to-noise ratios, reducing errors in feedback delays, reducing the frequency of cycle slips and in some cases obviating the need for quadrature processing in the P channels. Simple and reliable methods are employed for data bit synchronization, data bit removal and cycle counting. Improved precision in averaged output delay values is provided by carrier-aided data-compression techniques. The signal processor employs purely digital operations in the sense that exactly the same carrier phase and group delay measurements are obtained, to the last decimal place, every time the same sampled data (i.e., exactly the same bits) are processed.

  2. A Concurrent Smalltalk Compiler for the Message-Driven Processor

    DTIC Science & Technology

    1988-05-01

    apj with bits from low-bit (inclusive) to high-bit (exclusive) set. ;;;Low-bit defaults to zero. (defmacro brange (high-bit &optional low-bit) (list...n2) (null (cddr num))) (aetg bits (b+ bits (if (>- nl n2) ( brange (1+ nl) n2) ( brange (1+ n2) ni)))) (error "Bad bmap range: -S" flu.)))) (t (error...vlocs) flat ((vlive (b- finst-vllv* mast) *I.( brange firat-context-slot-nun))) (next (inst-next last))) (if (bempty vlive) (delete-module module inat

  3. a Real-Time Computer Music Synthesis System

    NASA Astrophysics Data System (ADS)

    Lent, Keith Henry

    A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.

  4. A floating-point/multiple-precision processor for airborne applications

    NASA Technical Reports Server (NTRS)

    Yee, R.

    1982-01-01

    A compact input output (I/O) numerical processor capable of performing floating-point, multiple precision and other arithmetic functions at execution times which are at least 100 times faster than comparable software emulation is described. The I/O device is a microcomputer system containing a 16 bit microprocessor, a numerical coprocessor with eight 80 bit registers running at a 5 MHz clock rate, 18K random access memory (RAM) and 16K electrically programmable read only memory (EPROM). The processor acts as an intelligent slave to the host computer and can be programmed in high order languages such as FORTRAN and PL/M-86.

  5. Technology transfer of military space microprocessor developments

    NASA Astrophysics Data System (ADS)

    Gorden, C.; King, D.; Byington, L.; Lanza, D.

    1999-01-01

    Over the past 13 years the Air Force Research Laboratory (AFRL) has led the development of microprocessors and computers for USAF space and strategic missile applications. As a result of these Air Force development programs, advanced computer technology is available for use by civil and commercial space customers as well. The Generic VHSIC Spaceborne Computer (GVSC) program began in 1985 at AFRL to fulfill a deficiency in the availability of space-qualified data and control processors. GVSC developed a radiation hardened multi-chip version of the 16-bit, Mil-Std 1750A microprocessor. The follow-on to GVSC, the Advanced Spaceborne Computer Module (ASCM) program, was initiated by AFRL to establish two industrial sources for complete, radiation-hardened 16-bit and 32-bit computers and microelectronic components. Development of the Control Processor Module (CPM), the first of two ASCM contract phases, concluded in 1994 with the availability of two sources for space-qualified, 16-bit Mil-Std-1750A computers, cards, multi-chip modules, and integrated circuits. The second phase of the program, the Advanced Technology Insertion Module (ATIM), was completed in December 1997. ATIM developed two single board computers based on 32-bit reduced instruction set computer (RISC) processors. GVSC, CPM, and ATIM technologies are flying or baselined into the majority of today's DoD, NASA, and commercial satellite systems.

  6. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    PubMed

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  7. Testability Design Rating System: Testability Handbook. Volume 1

    DTIC Science & Technology

    1992-02-01

    4-10 4.7.5 Summary of False BIT Alarms (FBA) ............................. 4-10 4.7.6 Smart BIT Technique...Circuit Board PGA Pin Grid Array PLA Programmable Logic Array PLD Programmable Logic Device PN Pseudo-Random Number PREDICT Probabilistic Estimation of...11 4.7.6 Smart BIT ( reference: RADC-TR-85-198). " Smart " BIT is a term given to BIT circuitry in a system LRU which includes dedicated processor/memory

  8. Data flow machine for data driven computing

    DOEpatents

    Davidson, G.S.; Grafe, V.G.

    1988-07-22

    A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ''fire'' signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.

  9. Direct match data flow memory for data driven computing

    DOEpatents

    Davidson, George S.; Grafe, Victor Gerald

    1997-01-01

    A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.

  10. Direct match data flow memory for data driven computing

    DOEpatents

    Davidson, G.S.; Grafe, V.G.

    1997-10-07

    A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.

  11. Free-Electron Laser Driven by the NBS (National Bureau of Standards) CW Microtron

    DTIC Science & Technology

    1988-03-31

    planned over several years. This will begin with the purchase of a 32-bit dual processor system for the yet to be constructed primary station wire scanner ...display subsystem. This 32-bit dual processor system will not only form the wire scanner display system, but has sufficient processing power to...7th hit. Coiif. on FELs, eds., E.T. Scharlemann and D. Prosnitz (North- Holland, Amsterdam, 1986) p. 278. 121 X.K Maruyania and S. Penner, C.M. Tang

  12. System and method for programmable bank selection for banked memory subsystems

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan

    2010-09-07

    A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.

  13. A Simple and Affordable TTL Processor for the Classroom

    ERIC Educational Resources Information Center

    Feinberg, Dave

    2007-01-01

    This paper presents a simple 4 bit computer processor design that may be built using TTL chips for less than $65. In addition to describing the processor itself in detail, we discuss our experience using the laboratory kit and its associated machine instruction set to teach computer architecture to high school students. (Contains 3 figures and 5…

  14. Rectangular Array Of Digital Processors For Planning Paths

    NASA Technical Reports Server (NTRS)

    Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.

    1993-01-01

    Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.

  15. Burst-mode optical label processor with ultralow power consumption.

    PubMed

    Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo

    2016-04-04

    A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting and interfacing the ultrafast packet-label to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion for the label bits on a bit-by-bit basis by using an optoelectronic converter that is operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem; 1) based on a novel operation mechanism for providing amplification with bit-level selectivity, an optical trigger pulse generator, that consumes power for a very short duration upon packet arrival, is proposed and experimentally demonstrated, 2) the energy of optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal while employing an enhanced conversion scheme entitled the discharge-or-hold scheme, 3) the necessary optical trigger energy is further cut down by half by coupling the triggers through the chip's backside, whereas a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.

  16. Large-N in Volcano Settings: Volcanosri

    NASA Astrophysics Data System (ADS)

    Lees, J. M.; Song, W.; Xing, G.; Vick, S.; Phillips, D.

    2014-12-01

    We seek a paradigm shift in the approach we take on volcano monitoring where the compromise from high fidelity to large numbers of sensors is used to increase coverage and resolution. Accessibility, danger and the risk of equipment loss requires that we develop systems that are independent and inexpensive. Furthermore, rather than simply record data on hard disk for later analysis we desire a system that will work autonomously, capitalizing on wireless technology and in field network analysis. To this end we are currently producing a low cost seismic array which will incorporate, at the very basic level, seismological tools for first cut analysis of a volcano in crises mode. At the advanced end we expect to perform tomographic inversions in the network in near real time. Geophone (4 Hz) sensors connected to a low cost recording system will be installed on an active volcano where triggering earthquake location and velocity analysis will take place independent of human interaction. Stations are designed to be inexpensive and possibly disposable. In one of the first implementations the seismic nodes consist of an Arduino Due processor board with an attached Seismic Shield. The Arduino Due processor board contains an Atmel SAM3X8E ARM Cortex-M3 CPU. This 32 bit 84 MHz processor can filter and perform coarse seismic event detection on a 1600 sample signal in fewer than 200 milliseconds. The Seismic Shield contains a GPS module, 900 MHz high power mesh network radio, SD card, seismic amplifier, and 24 bit ADC. External sensors can be attached to either this 24-bit ADC or to the internal multichannel 12 bit ADC contained on the Arduino Due processor board. This allows the node to support attachment of multiple sensors. By utilizing a high-speed 32 bit processor complex signal processing tasks can be performed simultaneously on multiple sensors. Using a 10 W solar panel, second system being developed can run autonomously and collect data on 3 channels at 100Hz for 6 months with the installed 16Gb SD card. Initial designs and test results will be presented and discussed.

  17. A scalable SIMD digital signal processor for high-quality multifunctional printer systems

    NASA Astrophysics Data System (ADS)

    Kang, Hyeong-Ju; Choi, Yongwoo; Kim, Kimo; Park, In-Cheol; Kim, Jung-Wook; Lee, Eul-Hwan; Gahang, Goo-Soo

    2005-01-01

    This paper describes a high-performance scalable SIMD digital signal processor (DSP) developed for multifunctional printer systems. The DSP supports a variable number of datapaths to cover a wide range of performance and maintain a RISC-like pipeline structure. Many special instructions suitable for image processing algorithms are included in the DSP. Quad/dual instructions are introduced for 8-bit or 16-bit data, and bit-field extraction/insertion instructions are supported to process various data types. Conditional instructions are supported to deal with complex relative conditions efficiently. In addition, an intelligent DMA block is integrated to align data in the course of data reading. Experimental results show that the proposed DSP outperforms a high-end printer-system DSP by at least two times.

  18. A Wearable Healthcare System With a 13.7 μA Noise Tolerant ECG Processor.

    PubMed

    Izumi, Shintaro; Yamashita, Ken; Nakano, Masanao; Kawaguchi, Hiroshi; Kimura, Hiromitsu; Marumoto, Kyoji; Fuchikami, Takaaki; Fujimori, Yoshikazu; Nakajima, Hiroshi; Shiga, Toshikazu; Yoshimoto, Masahiko

    2015-10-01

    To prevent lifestyle diseases, wearable bio-signal monitoring systems for daily life monitoring have attracted attention. Wearable systems have strict size and weight constraints, which impose significant limitations of the battery capacity and the signal-to-noise ratio of bio-signals. This report describes an electrocardiograph (ECG) processor for use with a wearable healthcare system. It comprises an analog front end, a 12-bit ADC, a robust Instantaneous Heart Rate (IHR) monitor, a 32-bit Cortex-M0 core, and 64 Kbyte Ferroelectric Random Access Memory (FeRAM). The IHR monitor uses a short-term autocorrelation (STAC) algorithm to improve the heart-rate detection accuracy despite its use in noisy conditions. The ECG processor chip consumes 13.7 μA for heart rate logging application.

  19. Microprocessor design for GaAs technology

    NASA Astrophysics Data System (ADS)

    Milutinovic, Veljko M.

    Recent advances in the design of GaAs microprocessor chips are examined in chapters contributed by leading experts; the work is intended as reading material for a graduate engineering course or as a practical R&D reference. Topics addressed include the methodology used for the architecture, organization, and design of GaAs processors; GaAs device physics and circuit design; design concepts for microprocessor-based GaAs systems; a 32-bit GaAs microprocessor; a 32-bit processor implemented in GaAs JFET; and a direct coupled-FET-logic E/D-MESFET experimental RISC machine. Drawings, micrographs, and extensive circuit diagrams are provided.

  20. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  1. Video image processor on the Spacelab 2 Solar Optical Universal Polarimeter /SL2 SOUP/

    NASA Technical Reports Server (NTRS)

    Lindgren, R. W.; Tarbell, T. D.

    1981-01-01

    The SOUP instrument is designed to obtain diffraction-limited digital images of the sun with high photometric accuracy. The Video Processor originated from the requirement to provide onboard real-time image processing, both to reduce the telemetry rate and to provide meaningful video displays of scientific data to the payload crew. This original concept has evolved into a versatile digital processing system with a multitude of other uses in the SOUP program. The central element in the Video Processor design is a 16-bit central processing unit based on 2900 family bipolar bit-slice devices. All arithmetic, logical and I/O operations are under control of microprograms, stored in programmable read-only memory and initiated by commands from the LSI-11. Several functions of the Video Processor are described, including interface to the High Rate Multiplexer downlink, cosmetic and scientific data processing, scan conversion for crew displays, focus and exposure testing, and use as ground support equipment.

  2. Rapid Damage Assessment. Volume II. Development and Testing of Rapid Damage Assessment System.

    DTIC Science & Technology

    1981-02-01

    pixels/s Camera Line Rate 732.4 lines/s Pixels per Line 1728 video 314 blank 4 line number (binary) 2 run number (BCD) 2048 total Pixel Resolution 8 bits...sists of an LSI-ll microprocessor, a VDI -200 video display processor, an FD-2 dual floppy diskette subsystem, an FT-I function key-trackball module...COMPONENT LIST FOR IMAGE PROCESSOR SYSTEM IMAGE PROCESSOR SYSTEM VIEWS I VDI -200 Display Processor Racks, Table FD-2 Dual Floppy Diskette Subsystem FT-l

  3. Experience in highly parallel processing using DAP

    NASA Technical Reports Server (NTRS)

    Parkinson, D.

    1987-01-01

    Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.

  4. Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor

    DTIC Science & Technology

    2015-03-10

    for Public Release; Distribution Unlimited Final Report: Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor The views...P.O. Box 12211 Research Triangle Park, NC 27709-2211 Superconductor technology, RSFQ, RQL, processor design, arithmetic units, high-performance...Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor Report Title The major objective of the project was to design and demonstrate operation

  5. Architecture of a Message-Driven Processor,

    DTIC Science & Technology

    1987-11-01

    Jon Kaplan, Paul Song, Brian Totty, and Scott Wills Artifcial Intelligence Laboratory -4 Laboratory for Computer Science Massachusetts Institute of...Information Dally, Chao, Chien, Hassoun, Horwat, Kaplan, Song, Totty & Wills: Artificial Intelligence i Laboratory and Laboratory for Computer Science, MIT...applied to a problem if we could are 36 bits long (32 data bits + 4 tag bits) and are used to hold efficiently run programs with a granularity of 5s

  6. NbN A/D Conversion of IR Focal Plane Sensor Signal at 10 K

    NASA Technical Reports Server (NTRS)

    Eaton, L.; Durand, D.; Sandell, R.; Spargo, J.; Krabach, T.

    1994-01-01

    We are implementing a 12 bit SFQ counting ADC with parallel-to-serial readout using our established 10 K NbN capability. This circuit provides a key element of the analog signal processor (ASP) used in large infrared focal plane arrays. The circuit processes the signal data stream from a Si:As BIB detector array. A 10 mega samples per second (MSPS) pixel data stream flows from the chip at a 120 megabit bit rate in a format that is compatible with other superconductive time dependent processor (TDP) circuits being developed. We will discuss our planned ASP demonstration, the circuit design, and test results.

  7. Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

    NASA Astrophysics Data System (ADS)

    Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

    2004-10-01

    Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.

  8. Tactical Operations Analysis Support Facility.

    DTIC Science & Technology

    1981-05-01

    Punch/Reader 2 DMC-11AR DDCMP Micro Processor 2 DMC-11DA Network Link Line Unit 2 DL-11E Async Serial Line Interface 4 Intel IN-1670 448K Words MOS Memory...86 5.3 VIRTUAL PROCESSORS - VAX-11/750 ........................... 89 5.4 A RELATIONAL DATA MANAGEMENT SYSTEM - ORACLE...Central Processing Unit (CPU) is a 16 bit processor for high-speed, real time applications, and for large multi-user, multi- task, time shared

  9. Associative architecture for image processing

    NASA Astrophysics Data System (ADS)

    Adar, Rutie; Akerib, Avidan

    1997-09-01

    This article presents a new generation in parallel processing architecture for real-time image processing. The approach is implemented in a real time image processor chip, called the XiumTM-2, based on combining a fully associative array which provides the parallel engine with a serial RISC core on the same die. The architecture is fully programmable and can be programmed to implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on patented pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz and 0.6 micron manufacturing technology process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the XiumTM-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.

  10. An FPGA computing demo core for space charge simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Jinyuan; Huang, Yifei; /Fermilab

    2009-01-01

    In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less

  11. A sparse matrix algorithm on the Boolean vector machine

    NASA Technical Reports Server (NTRS)

    Wagner, Robert A.; Patrick, Merrell L.

    1988-01-01

    VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.

  12. New On-board Microprocessors

    NASA Astrophysics Data System (ADS)

    Weigand, R.

    Two new processor devices have been developed for the use on board of spacecrafts. An 8-bit 8032-microcontroller targets typical controlling applications in instruments and sub-systems, or could be used as a main processor on small satellites, whereas the LEON 32-bit SPARC processor can be used for high performance controlling and data processing tasks. The ADV80S32 is fully compliant to the Intel 80x1 architecture and instruction set, extended by additional peripherals, 512 bytes on-chip RAM and a bootstrap PROM, which allows downloading the application software using the CCSDS PacketWire pro- tocol. The memory controller provides a de-multiplexed address/data bus, and allows to access up to 16 MB data and 8 MB program RAM. The peripherals have been de- signed for the specific needs of a spacecraft, such as serial interfaces compatible to RS232, PacketWire and TTC-B-01, counters/timers for extended duration and a CRC calculation unit accelerating the CCSDS TM/TC protocol. The 0.5 um Atmel manu- facturing technology (MG2RT) provides latch-up and total dose immunity; SEU fault immunity is implemented by using SEU hardened Flip-Flops and EDAC protection of internal and external memories. The maximum clock frequency of 20 MHz allows a processing power of 3 MIPS. Engineering samples are available. For SW develop- ment, various SW packages for the 8051 architecture are on the market. The LEON processor implements a 32-bit SPARC V8 architecture, including all the multiply and divide instructions, complemented by a floating-point unit (FPU). It includes several standard peripherals, such as timers/watchdog, interrupt controller, UARTs, parallel I/Os and a memory controller, allowing to use 8, 16 and 32 bit PROM, SRAM or memory mapped I/O. With on-chip separate instruction and data caches, almost one instruction per clock cycle can be reached in some applications. A 33-MHz 32-bit PCI master/target interface and a PCI arbiter allow operating the device in a plug-in card (for SW development on PC etc.), or to consider using it as a PCI master controller in an on-board system. Advanced SEU fault tolerance is in- troduced by design, using triple modular redundancy (TMR) flip-flops for all registers and EDAC protection for all memories. The device will be manufactured in a radia- tion hard Atmel 0.25 um technology, targeting 100 MHz processor clock frequency. The non fault-tolerant LEON processor VHDL model is available as free source code, and the SPARC architecture is a well-known industry standard. Therefore, know-how, software tools and operating systems are widely available.

  13. A microprocessor based high speed packet switch for satellite communications

    NASA Technical Reports Server (NTRS)

    Arozullah, M.; Crist, S. C.

    1980-01-01

    The architectures of a single processor, a three processor, and a multiple processor system are described. The hardware circuits, and software routines required for implementing the three and multiple processor designs are presented. A bit-slice microprocessor was designed and microprogrammed. Maximum throughput was calculated for all three designs. Queue theoretic models for these three designs were developed and utilized to obtain analytical expressions for the average waiting times, overall average response times and average queue sizes. From these expressions, graphs were obtained showing the effect on the system performance of a number of design parameters.

  14. Cognitive Medical Wireless Testbed System (COMWITS)

    DTIC Science & Technology

    2016-11-01

    Number: ...... ...... Sub Contractors (DD882) Names of other research staff Inventions (DD882) Scientific Progress This testbed merges two ARO grants...bit 64 bit CPU Intel Xeon Processor E5-1650v3 (6C, 3.5 GHz, Turbo, HT , 15M, 140W) Intel Core i7-3770 (3.4 GHz Quad Core, 77W) Dual Intel Xeon

  15. Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP

    NASA Astrophysics Data System (ADS)

    Brooks, Geoffrey W.

    1996-03-01

    Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.

  16. Control mechanism of double-rotator-structure ternary optical computer

    NASA Astrophysics Data System (ADS)

    Kai, SONG; Liping, YAN

    2017-03-01

    Double-rotator-structure ternary optical processor (DRSTOP) has two characteristics, namely, giant data-bits parallel computing and reconfigurable processor, which can handle thousands of data bits in parallel, and can run much faster than computers and other optical computer systems so far. In order to put DRSTOP into practical application, this paper established a series of methods, namely, task classification method, data-bits allocation method, control information generation method, control information formatting and sending method, and decoded results obtaining method and so on. These methods form the control mechanism of DRSTOP. This control mechanism makes DRSTOP become an automated computing platform. Compared with the traditional calculation tools, DRSTOP computing platform can ease the contradiction between high energy consumption and big data computing due to greatly reducing the cost of communications and I/O. Finally, the paper designed a set of experiments for DRSTOP control mechanism to verify its feasibility and correctness. Experimental results showed that the control mechanism is correct, feasible and efficient.

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Szabo, Levente; Koniorczyk, Matyas; Adam, Peter

    We consider the entanglement manipulation capabilities of the universal covariant quantum cloner or quantum processor circuit for quantum bits. We investigate its use for cloning a member of a bipartite or a genuine tripartite entangled state of quantum bits. We find that for bipartite pure entangled states a nontrivial behavior of concurrence appears, while for GHZ entangled states a possibility of the partial extraction of bipartite entanglement can be achieved.

  18. Fundamental physics issues of multilevel logic in developing a parallel processor.

    NASA Astrophysics Data System (ADS)

    Bandyopadhyay, Anirban; Miki, Kazushi

    2007-06-01

    In the last century, On and Off physical switches, were equated with two decisions 0 and 1 to express every information in terms of binary digits and physically realize it in terms of switches connected in a circuit. Apart from memory-density increase significantly, more possible choices in particular space enables pattern-logic a reality, and manipulation of pattern would allow controlling logic, generating a new kind of processor. Neumann's computer is based on sequential logic, processing bits one by one. But as pattern-logic is generated on a surface, viewing whole pattern at a time is a truly parallel processing. Following Neumann's and Shannons fundamental thermodynamical approaches we have built compatible model based on series of single molecule based multibit logic systems of 4-12 bits in an UHV-STM. On their monolayer multilevel communication and pattern formation is experimentally verified. Furthermore, the developed intelligent monolayer is trained by Artificial Neural Network. Therefore fundamental weak interactions for the building of truly parallel processor are explored here physically and theoretically.

  19. Direct match data flow machine apparatus and process for data driven computing

    DOEpatents

    Davidson, G.S.; Grafe, V.G.

    1997-08-12

    A data flow computer and method of computing are disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.

  20. Data flow machine for data driven computing

    DOEpatents

    Davidson, George S.; Grafe, Victor G.

    1995-01-01

    A data flow computer which of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.

  1. Direct match data flow machine apparatus and process for data driven computing

    DOEpatents

    Davidson, George S.; Grafe, Victor Gerald

    1997-01-01

    A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.

  2. Reconfigurable pipelined processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saccardi, R.J.

    1989-09-19

    This patent describes a reconfigurable pipelined processor for processing data. It comprises: a plurality of memory devices for storing bits of data; a plurality of arithmetic units for performing arithmetic functions with the data; cross bar means for connecting the memory devices with the arithmetic units for transferring data therebetween; at least one counter connected with the cross bar means for providing a source of addresses to the memory devices; at least one variable tick delay device connected with each of the memory devices and arithmetic units; and means for providing control bits to the variable tick delay device formore » variably controlling the input and output operations thereof to selectively delay the memory devices and arithmetic units to align the data for processing in a selected sequence.« less

  3. Method and system for selecting data sampling phase for self timed interface logic

    DOEpatents

    Hoke, Joseph Michael; Ferraiolo, Frank D.; Lo, Tin-Chee; Yarolin, John Michael

    2005-01-04

    An exemplary embodiment of the present invention is a method for transmitting data among processors over a plurality of parallel data lines and a clock signal line. A receiver processor receives both data and a clock signal from a sender processor. At the receiver processor a bit of the data is phased aligned with the transmitted clock signal. The phase aligning includes selecting a data phase from a plurality of data phases in a delay chain and then adjusting the selected data phase to compensate for a round-off error. Additional embodiments include a system and storage medium for transmitting data among processors over a plurality of parallel data lines and a clock signal line.

  4. Cross-Layer Resilience Exploration

    DTIC Science & Technology

    2015-03-31

    complex 563 server-class systems) and any arbitrary fault model (permanent, transient, multi-bit, etc.) System Design Analysis Using flip- flop ...level fault injection, we rank the vulnerability of each flip- flop in the processor in terms of its likelihood to propagate faults [3]. This allows the...hardened flip- flops , which are flip- flops designed to uphold the bit representation of their output circuit even under particle strikes [1, 6, 10

  5. Concurrent error detecting codes for arithmetic processors

    NASA Technical Reports Server (NTRS)

    Lim, R. S.

    1979-01-01

    A method of concurrent error detection for arithmetic processors is described. Low-cost residue codes with check-length l and checkbase m = 2 to the l power - 1 are described for checking arithmetic operations of addition, subtraction, multiplication, division complement, shift, and rotate. Of the three number representations, the signed-magnitude representation is preferred for residue checking. Two methods of residue generation are described: the standard method of using modulo m adders and the method of using a self-testing residue tree. A simple single-bit parity-check code is described for checking the logical operations of XOR, OR, and AND, and also the arithmetic operations of complement, shift, and rotate. For checking complement, shift, and rotate, the single-bit parity-check code is simpler to implement than the residue codes.

  6. QCDOC: A 10-teraflops scale computer for lattice QCD

    NASA Astrophysics Data System (ADS)

    Chen, D.; Christ, N. H.; Cristian, C.; Dong, Z.; Gara, A.; Garg, K.; Joo, B.; Kim, C.; Levkova, L.; Liao, X.; Mawhinney, R. D.; Ohta, S.; Wettig, T.

    2001-03-01

    The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from "QCD On a Chip".

  7. Computer-Aided Fabrication of Integrated Circuits

    DTIC Science & Technology

    1989-09-30

    baseline CMOS process. One result of this effort was the identification of several residual bugs in the PATRAN graphics processor . The vendor promises...virtual memory. The internal Nubus architecture uses a 32-bit LISP processor running at 10 megahertz (100 ns clock period). An ethernet controller is...For different patterns, we need different masks for the photo step, and for dif- ferent micro -structures of the wafers, we need different etching

  8. Design and evaluation of an architecture for a digital signal processor for instrumentation applications

    NASA Astrophysics Data System (ADS)

    Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

    1990-03-01

    The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.

  9. Long sequence correlation coprocessor

    NASA Astrophysics Data System (ADS)

    Gage, Douglas W.

    1994-09-01

    A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.

  10. Multiple Embedded Processors for Fault-Tolerant Computing

    NASA Technical Reports Server (NTRS)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.

  11. An inexpensive digital tape recorder suitable for neurophysiological signals.

    PubMed

    Lamb, T D

    1985-10-01

    Modifications are described which convert an inexpensive 'Digital Audio Processor' (Sony PCM-701ES), together with a video cassette recorder, into a high performance digital tape recorder, with two analog channels of 16 bit resolution and DC-20 kHz bandwidth. A further modification is described which optionally provides four additional 1-bit digital channels by sacrificing the least significant four bits of one analog channel. If required two additional high quality analog channels may be obtained by use of one of the new video cassette recorders (such as the Sony SL-HF100) which incorporate a pair of FM tracks.

  12. Low Cost Design of an Advanced Encryption Standard (AES) Processor Using a New Common-Subexpression-Elimination Algorithm

    NASA Astrophysics Data System (ADS)

    Chen, Ming-Chih; Hsiao, Shen-Fu

    In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.

  13. CoNNeCT Baseband Processor Module

    NASA Technical Reports Server (NTRS)

    Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.

    2011-01-01

    A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.

  14. Second Report of the Multirate Processor (MRP) for Digital Voice Communications.

    DTIC Science & Technology

    1982-09-30

    machine are: * two arithmetic logic units (ALUs)-one for data processing, and the other for address generation, * two memorys -6144 words (70 bits per word...of program memory , and 6094 words (16 bits per word) of data memory , q * input/output through modem and teletype, -15 .9 S-;. KANG AND FRANSEN Table...provides a measure of intelligibility and allows one to evaluate the discriminability of six distinctive features: voicing, nasality, sustention

  15. Video Bandwidth Compression System.

    DTIC Science & Technology

    1980-08-01

    scaling function, located between the inverse DPCM and inverse transform , on the decoder matrix multiplier chips. 1"V1 T.. ---- i.13 SECURITY...Bit Unpacker and Inverse DPCM Slave Sync Board 15 e. Inverse DPCM Loop Boards 15 f. Inverse Transform Board 16 g. Composite Video Output Board 16...36 a. Display Refresh Memory 36 (1) Memory Section 37 (2) Timing and Control 39 b. Bit Unpacker and Inverse DPCM 40 c. Inverse Transform Processor 43

  16. Accelerating scientific computations with mixed precision algorithms

    NASA Astrophysics Data System (ADS)

    Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire

    2009-12-01

    On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented. Program summaryProgram title: ITER-REF Catalogue identifier: AECO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 7211 No. of bytes in distributed program, including test data, etc.: 41 862 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: desktop, server Operating system: Unix/Linux RAM: 512 Mbytes Classification: 4.8 External routines: BLAS (optional) Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes

  17. Command system output bit verification

    NASA Technical Reports Server (NTRS)

    Odd, C. W.; Abbate, S. F.

    1981-01-01

    An automatic test was developed to test the ability of the deep space station (DSS) command subsystem and exciter to generate and radiate, from the exciter, the correct idle bit sequence for a given flight project or to store and radiate received command data elements and files without alteration. This test, called the command system output bit verification test, is an extension of the command system performance test (SPT) and can be selected as an SPT option. The test compares the bit stream radiated from the DSS exciter with reference sequences generated by the SPT software program. The command subsystem and exciter are verified when the bit stream and reference sequences are identical. It is a key element of the acceptance testing conducted on the command processor assembly (CPA) operational program (DMC-0584-OP-G) prior to its transfer from development to operations.

  18. Real-time software receiver

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L. (Inventor); Kintner, Jr., Paul M. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor)

    2007-01-01

    A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.

  19. Real-time software receiver

    NASA Technical Reports Server (NTRS)

    Psiaki, Mark L. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor); Kintner, Jr., Paul M. (Inventor)

    2006-01-01

    A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.

  20. FPGA-Based, Self-Checking, Fault-Tolerant Computers

    NASA Technical Reports Server (NTRS)

    Some, Raphael; Rennels, David

    2004-01-01

    A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.

  1. The special radiation-hardened processors for new highly informative experiments in space

    NASA Astrophysics Data System (ADS)

    Serdin, O. V.; Antonov, A. A.; Dubrovsky, A. G.; Novogilov, E. A.; Zuev, A. L.

    2017-01-01

    The article provides a detailed description of the series of special radiation-hardened microprocessor developed by SRISA for use in space technology. The microprocessors have 32-bit and 64-bit KOMDIV architecture with embedded SpaceWire, RapidIO, Ethernet and MIL-STD-1553B interfaces. These devices are used in space telescope GAMMA-400 data acquisition system, and may also be applied to other experiments in space (such as observatory “Millimetron” etc.).

  2. Design of a dataway processor for a parallel image signal processing system

    NASA Astrophysics Data System (ADS)

    Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

    1995-04-01

    Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.

  3. Data recording and playback on video tape--a multi-channel analog interface for a digital audio processor system.

    PubMed

    Blaettler, M; Bruegger, A; Forster, I C; Lehareinger, Y

    1988-03-01

    The design of an analog interface to a digital audio signal processor (DASP)-video cassette recorder (VCR) system is described. The complete system represents a low-cost alternative to both FM instrumentation tape recorders and multi-channel chart recorders. The interface or DASP input-output unit described in this paper enables the recording and playback of up to 12 analog channels with a maximum of 12 bit resolution and a bandwidth of 2 kHz per channel. Internal control and timing in the recording component of the interface is performed using ROMs which can be reprogrammed to suit different analog-to-digital converter hardware. Improvement in the bandwidth specifications is possible by connecting channels in parallel. A parallel 16 bit data output port is provided for direct transfer of the digitized data to a computer.

  4. MIL-STD-1553B Marconi LSI chip set in a remote terminal application

    NASA Astrophysics Data System (ADS)

    Dimarino, A.

    1982-11-01

    Marconi Avionics is utilizing the MIL-STD-1553B LSI Chip Set in the SCADC Air Data Computer application to perform all of the required remote terminal MIL-STD-1553B protocol functions. Basic components of the RTU are the dual redundant chip set, CT3231 Transceivers, 256 x 16 RAM and a Z8002 microprocessor. Basic transfers are to/from the RAM command of the bus controller or Z8002 processor. During transfers from the processor to the RAM, the chip set busy bit is set for a period not exceeding 250 microseconds. When the transfer is complete, the busy bit is released and transfers to the data bus occur on command. The LSI Chip Set word count lines are used to locate each data word in the local memory and 4 mode codes are used in the application: reset remote terminal, transmit status word, transmitter shut-down, and override transmitter shutdown.

  5. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

    2008-01-01

    High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less

  6. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    NASA Astrophysics Data System (ADS)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.

  7. Update on radiation-hardened microcomputers for robotics and teleoperated systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sias, F.R. Jr.; Tulenko, J.S.

    1993-12-31

    Since many programs sponsored by the Department of Defense are being canceled, it is important to select carefully radiation-hardened microprocessors for projects that will mature (or will require continued support) several years in the future. At the present time there are seven candidate 32-bit processors that should be considered for long-range planning for high-performance radiation-hardened computer systems. For Department of Energy applications it is also important to consider efforts at standardization that require the use of the VxWorks operating system and hardware based on the VMEbus. Of the seven processors, one has been delivered and is operating and other systemsmore » are scheduled to be delivered late in 1993 or early in 1994. At the present time the Honeywell-developed RH32, the Harris RH-3000 and the Harris RHC-3000 are leading contenders for meeting DOE requirements for a radiation-hardened advanced 32-bit microprocessor. These are all either compatible with or are derivatives of the MIPS R3000 Reduced Instruction Set Computer. It is anticipated that as few as two of the seven radiation-hardened processors will be supported by the space program in the long run.« less

  8. Digital system for structural dynamics simulation

    NASA Technical Reports Server (NTRS)

    Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.

    1982-01-01

    State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.

  9. A 32-bit NMOS microprocessor with a large register file

    NASA Astrophysics Data System (ADS)

    Sherburne, R. W., Jr.; Katevenis, M. G. H.; Patterson, D. A.; Sequin, C. H.

    1984-10-01

    Two scaled versions of a 32-bit NMOS reduced instruction set computer CPU, called RISC II, have been implemented on two different processing lines using the simple Mead and Conway layout rules with lambda values of 2 and 1.5 microns (corresponding to drawn gate lengths of 4 and 3 microns), respectively. The design utilizes a small set of simple instructions in conjunction with a large register file in order to provide high performance. This approach has resulted in two surprisingly powerful single-chip processors.

  10. 120-MHz BiCMOS superscalar RISC processor

    NASA Astrophysics Data System (ADS)

    Tanaka, Shigeya; Hotta, Takashi; Murabayashi, Fumio; Yamada, Hiromichi; Yoshida, Shoji; Shimamura, Kotaro; Katsura, Koyo; Bandoh, Tadaaki; Ikeda, Koichi; Matsubara, Kenji

    1994-04-01

    A superscalar RISC processor contains 2.8 million transistors in a die size of 16.2 mm x 16.5 mm, and utilizes 3.3 V/0.5 micron BiCMOS technology. In order to take advantage of superscalar performance without incurring penalties from a slower clock or a longer pipeline, a tag bit is implemented in the instruction cache to indicate dependency between two instructions. A performance gain of up to 37% is obtained with only a 3.5% area overhead from our superscalar design.

  11. Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor

    NASA Technical Reports Server (NTRS)

    Moore, J. Strother

    1992-01-01

    Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.

  12. Multi-channel time-reversal receivers for multi and 1-bit implementations

    DOEpatents

    Candy, James V.; Chambers, David H.; Guidry, Brian L.; Poggio, Andrew J.; Robbins, Christopher L.

    2008-12-09

    A communication system for transmitting a signal through a channel medium comprising digitizing the signal, time-reversing the digitized signal, and transmitting the signal through the channel medium. In one embodiment a transmitter is adapted to transmit the signal, a multiplicity of receivers are adapted to receive the signal, a digitizer digitizes the signal, and a time-reversal signal processor is adapted to time-reverse the digitized signal. An embodiment of the present invention includes multi bit implementations. Another embodiment of the present invention includes 1-bit implementations. Another embodiment of the present invention includes a multiplicity of receivers used in the step of transmitting the signal through the channel medium.

  13. Parallel processor for real-time structural control

    NASA Astrophysics Data System (ADS)

    Tise, Bert L.

    1993-07-01

    A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.

  14. Large-Constraint-Length, Fast Viterbi Decoder

    NASA Technical Reports Server (NTRS)

    Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.

    1990-01-01

    Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.

  15. Implementation of an FIR Band Pass Filter Using a Bit-Slice Processor.

    DTIC Science & Technology

    1987-06-01

    R14 S IMPLEMENTATION OF AN FIR BND PASS FILTER USING A 1 /2 S" IT-SLICE PROCESSOR(U) NAVAL POSTGRADUATE SCOOL N UTEREY CA D W PURDY JUN 97...WILRSIFIED F/I 12/6mmhhmmhhmhhhhl EIIIIIIEIIIII IIIIIIEEIIIEI IIIIIIIIIIIIIl IIIIIIIIIIIIIu EIIIIIIIIIIIII - U󈧖 ~1I.25 1 .41.6 -qwr- -qw qw wV vw- .W sw...Ae 10 .w w ’ 1 - w* lo % % q* NAVAL POSTGRADUATE SCHOOL 0 Monterey, California 0 oi a ’: L t ",, I-’ ; OCT 0 �." THESIS IMPLEMENTATION OF AN FIR

  16. SIGPROC: Pulsar Signal Processing Programs

    NASA Astrophysics Data System (ADS)

    Lorimer, D. R.

    2011-07-01

    SIGPROC is a package designed to standardize the initial analysis of the many types of fast-sampled pulsar data. Currently recognized machines are the Wide Band Arecibo Pulsar Processor (WAPP), the Penn State Pulsar Machine (PSPM), the Arecibo Observatory Fourier Transform Machine (AOFTM), the Berkeley Pulsar Processors (BPP), the Parkes/Jodrell 1-bit filterbanks (SCAMP) and the filterbank at the Ooty radio telescope (OOTY). The SIGPROC tools should help users look at their data quickly, without the need to write (yet) another routine to read data or worry about big/little endian compatibility (byte swapping is handled automatically).

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.

    Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whethermore » global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.« less

  18. Hardware accuracy counters for application precision and quality feedback

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    de Paula Rosa Piga, Leonardo; Majumdar, Abhinandan; Paul, Indrani

    Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be outputmore » to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.« less

  19. A GaAs vector processor based on parallel RISC microprocessors

    NASA Astrophysics Data System (ADS)

    Misko, Tim A.; Rasset, Terry L.

    A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.

  20. Parallel processor for real-time structural control

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tise, B.L.

    1992-01-01

    A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less

  1. The Transportable Measurements Facility (TMF) System Description.

    DTIC Science & Technology

    1980-05-23

    sites using different antennas, antenna/site characterization, ATCRBS-mode and DABS-mode processor evaluation, and DABS-based ATC and ATARS system...conditions met. 34 TABLE 3 TMF DATA RECORDED DURING EXPERIMENTS By Word Type 1) By Parameter Word No. of Bits Experiment Number 8 Physical Location Number of

  2. Servicing a globally broadcast interrupt signal in a multi-threaded computer

    DOEpatents

    Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.; Satterfield, David L.

    2015-12-29

    Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whether global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.

  3. A High-Throughput Processor for Flight Control Research Using Small UAVs

    NASA Technical Reports Server (NTRS)

    Klenke, Robert H.; Sleeman, W. C., IV; Motter, Mark A.

    2006-01-01

    There are numerous autopilot systems that are commercially available for small (<100 lbs) UAVs. However, they all share several key disadvantages for conducting aerodynamic research, chief amongst which is the fact that most utilize older, slower, 8- or 16-bit microcontroller technologies. This paper describes the development and testing of a flight control system (FCS) for small UAV s based on a modern, high throughput, embedded processor. In addition, this FCS platform contains user-configurable hardware resources in the form of a Field Programmable Gate Array (FPGA) that can be used to implement custom, application-specific hardware. This hardware can be used to off-load routine tasks such as sensor data collection, from the FCS processor thereby further increasing the computational throughput of the system.

  4. ACE: Automatic Centroid Extractor for real time target tracking

    NASA Technical Reports Server (NTRS)

    Cameron, K.; Whitaker, S.; Canaris, J.

    1990-01-01

    A high performance video image processor has been implemented which is capable of grouping contiguous pixels from a raster scan image into groups and then calculating centroid information for each object in a frame. The algorithm employed to group pixels is very efficient and is guaranteed to work properly for all convex shapes as well as most concave shapes. Processing speeds are adequate for real time processing of video images having a pixel rate of up to 20 million pixels per second. Pixels may be up to 8 bits wide. The processor is designed to interface directly to a transputer serial link communications channel with no additional hardware. The full custom VLSI processor was implemented in a 1.6 mu m CMOS process and measures 7200 mu m on a side.

  5. Solving the Cauchy-Riemann equations on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.

  6. Negative base encoding in optical linear algebra processors

    NASA Technical Reports Server (NTRS)

    Perlee, C.; Casasent, D.

    1986-01-01

    In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.

  7. Optical Neural Classification Of Binary Patterns

    NASA Astrophysics Data System (ADS)

    Gustafson, Steven C.; Little, Gordon R.

    1988-05-01

    Binary pattern classification that may be implemented using optical hardware and neural network algorithms is considered. Pattern classification problems that have no concise description (as in classifying handwritten characters) or no concise computation (as in NP-complete problems) are expected to be particularly amenable to this approach. For example, optical processors that efficiently classify binary patterns in accordance with their Boolean function complexity might be designed. As a candidate for such a design, an optical neural network model is discussed that is designed for binary pattern classification and that consists of an optical resonator with a dynamic multiplex-recorded reflection hologram and a phase conjugate mirror with thresholding and gain. In this model, learning or training examples of binary patterns may be recorded on the hologram such that one bit in each pattern marks the pattern class. Any input pattern, including one with an unknown class or marker bit, will be modified by a large number of parallel interactions with the reflection hologram and nonlinear mirror. After perhaps several seconds and 100 billion interactions, a steady-state pattern may develop with a marker bit that represents a minimum-Boolean-complexity classification of the input pattern. Computer simulations are presented that illustrate progress in understanding the behavior of this model and in developing a processor design that could have commanding and enduring performance advantages compared to current pattern classification techniques.

  8. Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer

    NASA Technical Reports Server (NTRS)

    Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw

    2000-01-01

    Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.

  9. Onboard Experiment Data Support Facility

    NASA Technical Reports Server (NTRS)

    1976-01-01

    An onboard array structure has been devised for end to end processing of data from multiple spaceborne sensors. The array constitutes sets of programmable pipeline processors whose elements perform each assigned function in 0.25 microseconds. This space shuttle computer system can handle data rates from a few bits to over 100 megabits per second.

  10. Interface Circuits for Self-Checking Microprocessors

    NASA Technical Reports Server (NTRS)

    Rennels, D. A.; Chandramouli, R.

    1986-01-01

    Fault-tolerant-microcomputer concept based on enhancing "simple" computer with redundancy and self-checking logic circuits detect hardware faults. Interface and checking logic and redundant processors confer on 16-bit microcomputer ability to check itself for hardware faults. Checking circuitry also checks itself. Concept of self-checking complementary pairs (SCCP's) employed throughout ICL unit.

  11. ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Batista, A. J. N.; Sousa, J.; Varandas, C. A. F.

    2006-10-15

    The efficient vertical stabilization (VS) of plasmas in tokamaks requires a fast reaction of the VS controller, for example, after detection of edge localized modes (ELM). For controlling the effects of very large ELMs a new digital control hardware, based on the Advanced Telecommunications Computing Architecture trade mark sign (ATCA), is being developed aiming to reduce the VS digital control loop cycle (down to an optimal value of 10 {mu}s) and improve the algorithm performance. The system has 1 ATCA trade mark sign processor module and up to 12 ATCA trade mark sign control modules, each one with 32 analogmore » input channels (12 bit resolution), 4 analog output channels (12 bit resolution), and 8 digital input/output channels. The Aurora trade mark sign and PCI Express trade mark sign communication protocols will be used for data transport, between modules, with expected latencies below 2 {mu}s. Control algorithms are implemented on a ix86 based processor with 6 Gflops and on field programmable gate arrays with 80 GMACS, interconnected by serial gigabit links in a full mesh topology.« less

  12. Embedded System Implementation on FPGA System With μCLinux OS

    NASA Astrophysics Data System (ADS)

    Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna

    2011-02-01

    Embedded systems are taking on more complicated tasks as the processors involved become more powerful. The embedded systems have been widely used in many areas such as in industries, automotives, medical imaging, communications, speech recognition and computer vision. The complexity requirements in hardware and software nowadays need a flexibility system for further enhancement in any design without adding new hardware. Therefore, any changes in the design system will affect the processor that need to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using Field Programmable Gate Array (FPGA). A softcore processor, NIOS II 32-bit RISC, which is the microprocessor core was utilized in FPGA system together with the embedded operating system(OS), μClinux. In this paper, an example of web server is explained and demonstrated

  13. Design and Test of a 65nm CMOS Front-End with Zero Dead Time for Next Generation Pixel Detectors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaioni, L.; Braga, D.; Christian, D.

    This work is concerned with the experimental characterization of a synchronous analog processor with zero dead time developed in a 65 nm CMOS technology, conceived for pixel detectors at the HL-LHC experiment upgrades. It includes a low noise, fast charge sensitive amplifier with detector leakage compensation circuit, and a compact, single ended comparator able to correctly process hits belonging to two consecutive bunch crossing periods. A 2-bit Flash ADC is exploited for digital conversion immediately after the preamplifier. A description of the circuits integrated in the front-end processor and the initial characterization results are provided

  14. Eight-Channel Digital Signal Processor and Universal Trigger Module

    NASA Astrophysics Data System (ADS)

    Skulski, Wojtek; Wolfs, Frank

    2003-04-01

    A 10-bit, 8-channel, 40 megasamples per second digital signal processor and waveform digitizer DDC-8 (nicknamed Universal Trigger Module) is presented. The digitizer features 8 analog inputs, 1 analog output for a reconstructed analog waveform, 16 NIM logic inputs, 8 NIM logic outputs, and a pool of 16 TTL logic lines which can be individually configured as either inputs or outputs. The first application of this device is to enhance the present trigger electronics for PHOBOS at RHIC. The status of the development and the first results are presented. Possible applications of the new device are discussed. Supported by the NSF grant PHY-0072204.

  15. Hybrid quantum processors: molecular ensembles as quantum memory for solid state circuits.

    PubMed

    Rabl, P; DeMille, D; Doyle, J M; Lukin, M D; Schoelkopf, R J; Zoller, P

    2006-07-21

    We investigate a hybrid quantum circuit where ensembles of cold polar molecules serve as long-lived quantum memories and optical interfaces for solid state quantum processors. The quantum memory realized by collective spin states (ensemble qubit) is coupled to a high-Q stripline cavity via microwave Raman processes. We show that, for convenient trap-surface distances of a few microm, strong coupling between the cavity and ensemble qubit can be achieved. We discuss basic quantum information protocols, including a swap from the cavity photon bus to the molecular quantum memory, and a deterministic two qubit gate. Finally, we investigate coherence properties of molecular ensemble quantum bits.

  16. Flexible Peripheral Component Interconnect Input/Output Card

    NASA Technical Reports Server (NTRS)

    Bigelow, Kirk K.; Jerry, Albert L.; Baricio, Alisha G.; Cummings, Jon K.

    2010-01-01

    The Flexible Peripheral Component Interconnect (PCI) Input/Output (I/O) Card is an innovative circuit board that provides functionality to interface between a variety of devices. It supports user-defined interrupts for interface synchronization, tracks system faults and failures, and includes checksum and parity evaluation of interface data. The card supports up to 16 channels of high-speed, half-duplex, low-voltage digital signaling (LVDS) serial data, and can interface combinations of serial and parallel devices. Placement of a processor within the field programmable gate array (FPGA) controls an embedded application with links to host memory over its PCI bus. The FPGA also provides protocol stacking and quick digital signal processor (DSP) functions to improve host performance. Hardware timers, counters, state machines, and other glue logic support interface communications. The Flexible PCI I/O Card provides an interface for a variety of dissimilar computer systems, featuring direct memory access functionality. The card has the following attributes: 8/16/32-bit, 33-MHz PCI r2.2 compliance, Configurable for universal 3.3V/5V interface slots, PCI interface based on PLX Technology's PCI9056 ASIC, General-use 512K 16 SDRAM memory, General-use 1M 16 Flash memory, FPGA with 3K to 56K logical cells with embedded 27K to 198K bits RAM, I/O interface: 32-channel LVDS differential transceivers configured in eight, 4-bit banks; signaling rates to 200 MHz per channel, Common SCSI-3, 68-pin interface connector.

  17. Introducing difference recurrence relations for faster semi-global alignment of long sequences.

    PubMed

    Suzuki, Hajime; Kasahara, Masahiro

    2018-02-19

    The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .

  18. Hardware description ADSP-21020 40-bit floating point DSP as designed in a remotely controlled digital CW Doppler radar

    NASA Astrophysics Data System (ADS)

    Morrison, R. E.; Robinson, S. H.

    A continuous wave Doppler radar system has been designed which is portable, easily deployed, and remotely controlled. The heart of this system is a DSP/control board using Analog Devices ADSP-21020 40-bit floating point digital signal processor (DSP) microprocessor. Two 18-bit audio A/D converters provide digital input to the DSP/controller board for near real time target detection. Program memory for the DSP is dual ported with an Intel 87C51 microcontroller allowing DSP code to be up-loaded or down-loaded from a central controlling computer. The 87C51 provides overall system control for the remote radar and includes a time-of-day/day-of-year real time clock, system identification (ID) switches, and input/output (I/O) expansion by an Intel 82C55 I/O expander.

  19. A microprocessor based on a two-dimensional semiconductor.

    PubMed

    Wachter, Stefan; Polyushkin, Dmitry K; Bethge, Ole; Mueller, Thomas

    2017-04-11

    The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor-molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.

  20. A microprocessor based on a two-dimensional semiconductor

    NASA Astrophysics Data System (ADS)

    Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas

    2017-04-01

    The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor--molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.

  1. Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm

    DOE PAGES

    Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; ...

    2018-02-12

    Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. Here, we use a superconducting-qubit-based processor to apply the QSE approach to the H 2 molecule, extracting both groundmore » and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.« less

  2. Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm

    NASA Astrophysics Data System (ADS)

    Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; Blok, M. S.; Kimchi-Schwartz, M. E.; McClean, J. R.; Carter, J.; de Jong, W. A.; Siddiqi, I.

    2018-02-01

    Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. We use a superconducting-qubit-based processor to apply the QSE approach to the H2 molecule, extracting both ground and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.

  3. Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Colless, J. I.; Ramasesh, V. V.; Dahlen, D.

    Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. Here, we use a superconducting-qubit-based processor to apply the QSE approach to the H 2 molecule, extracting both groundmore » and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.« less

  4. Implementation of an ADI method on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1987-01-01

    The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  5. FPGA wavelet processor design using language for instruction-set architectures (LISA)

    NASA Astrophysics Data System (ADS)

    Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

    2007-04-01

    The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.

  6. On VLSI Design of Rank-Order Filtering using DCRAM Architecture

    PubMed Central

    Lin, Meng-Chun; Dung, Lan-Rong

    2009-01-01

    This paper addresses on VLSI design of rank-order filtering (ROF) with a maskable memory for real-time speech and image processing applications. Based on a generic bit-sliced ROF algorithm, the proposed design uses a special-defined memory, called the dual-cell random-access memory (DCRAM), to realize major operations of ROF: threshold decomposition and polarization. Using the memory-oriented architecture, the proposed ROF processor can benefit from high flexibility, low cost and high speed. The DCRAM can perform the bit-sliced read, partial write, and pipelined processing. The bit-sliced read and partial write are driven by maskable registers. With recursive execution of the bit-slicing read and partial write, the DCRAM can effectively realize ROF in terms of cost and speed. The proposed design has been implemented using TSMC 0.18 μm 1P6M technology. As shown in the result of physical implementation, the core size is 356.1 × 427.7μm2 and the VLSI implementation of ROF can operate at 256 MHz for 1.8V supply. PMID:19865599

  7. A low power biomedical signal processor ASIC based on hardware software codesign.

    PubMed

    Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

    2009-01-01

    A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.

  8. Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors

    NASA Astrophysics Data System (ADS)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.

  9. COED Transactions, Vol. IX, No. 6, June 1977. An Introductory Course in Microprocessors and Microcomputers.

    ERIC Educational Resources Information Center

    Marcovitz, Alan B., Ed.

    This paper describes an introductory course in microprocessors and microcomputers implemented at Grossmont College. The current state-of-the-art in the microprocessor field is discussed, with special emphasis on the 8-bit MOS single-chip processors which are the most commonly used devices. Objectives and guidelines for the course are presented,…

  10. High Resolution Imaging Testbed Utilizing Sodium Laser Guide Star Adaptive Optics: The Real Time Wavefront Reconstructor Computer

    DTIC Science & Technology

    2008-07-31

    Unlike the Lyrtech, each DSP on a Bittware board offers 3 MB of on-chip memory and 3 GFLOPs of 32-bit peak processing power. Based on the performance...Each NVIDIA 8800 Ultra features 576 GFLOPS on 128 612-MHz single-precision floating-point SIMD processors, arranged in 16 clusters of eight. Each

  11. Mark IVA microprocessor support

    NASA Technical Reports Server (NTRS)

    Burford, A. L.

    1982-01-01

    The requirements and plans for the maintenance support of microprocessor-based controllers in the Deep Space Network Mark IVA System are discussed. Additional new interfaces and 16-bit processors have introduced problems not present in the Mark III System. The need for continuous training of maintenance personnel to maintain a level of expertise consistent with the sophistication of the required tools is also emphasized.

  12. Fault-Tolerant Computing: An Overview

    DTIC Science & Technology

    1991-06-01

    Addison Wesley:, Reading, MA) 1984. [8] J. Wakerly , Error Detecting Codes, Self-Checking Circuits and Applications , (Elsevier North Holland, Inc.- New York... applicable to bit-sliced organi- zations of hardware. In the first time step, the normal computation is performed on the operands and the results...for error detection and fault tolerance in parallel processor systems while perform- ing specific computation-intensive applications [111. Contrary to

  13. SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Analysis of the precision parameters of an optoelectronic vector-matrix processor of digital information

    NASA Astrophysics Data System (ADS)

    Odinokov, S. B.; Petrov, A. V.

    1995-10-01

    Mathematical models of components of a vector-matrix optoelectronic multiplier are considered. Perturbing factors influencing a real optoelectronic system — noise and errors of radiation sources and detectors, nonlinearity of an analogue—digital converter, nonideal optical systems — are taken into account. Analytic expressions are obtained for relating the precision of such a multiplier to the probability of an error amounting to one bit, to the parameters describing the quality of the multiplier components, and to the quality of the optical system of the processor. Various methods of increasing the dynamic range of a multiplier are considered at the technical systems level.

  14. Integrated optical circuits for numerical computation

    NASA Technical Reports Server (NTRS)

    Verber, C. M.; Kenan, R. P.

    1983-01-01

    The development of integrated optical circuits (IOC) for numerical-computation applications is reviewed, with a focus on the use of systolic architectures. The basic architecture criteria for optical processors are shown to be the same as those proposed by Kung (1982) for VLSI design, and the advantages of IOCs over bulk techniques are indicated. The operation and fabrication of electrooptic grating structures are outlined, and the application of IOCs of this type to an existing 32-bit, 32-Mbit/sec digital correlator, a proposed matrix multiplier, and a proposed pipeline processor for polynomial evaluation is discussed. The problems arising from the inherent nonlinearity of electrooptic gratings are considered. Diagrams and drawings of the application concepts are provided.

  15. Experience with custom processors in space flight applications

    NASA Technical Reports Server (NTRS)

    Fraeman, M. E.; Hayes, J. R.; Lohr, D. A.; Ballard, B. W.; Williams, R. L.; Henshaw, R. M.

    1991-01-01

    The Applied Physics Laboratory (APL) has developed a magnetometer instrument for a swedish satellite named Freja with launch scheduled for August 1992 on a Chinese Long March rocket. The magnetometer controller utilized a custom microprocessor designed at APL with the Genesil silicon compiler. The processor evolved from our experience with an older bit-slice design and two prior single chip efforts. The architecture of our microprocessor greatly lowered software development costs because it was optimized to provide an interactive and extensible programming environment hosted by the target hardware. Radiation tolerance of the microprocessor was also tested and was adequate for Freja's mission -- 20 kRad(Si) total dose and very infrequent latch-up and single event upset events.

  16. Research in the design of high-performance reconfigurable systems

    NASA Technical Reports Server (NTRS)

    Slotnick, D. L.; Mcewan, S. D.; Spry, A. J.

    1984-01-01

    An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives.

  17. Embedded neural recording with TinyOS-based wireless-enabled processor modules.

    PubMed

    Farshchi, Shahin; Pesterev, Aleksey; Nuyujukian, Paul; Guenterberg, Eric; Mody, Istvan; Judy, Jack W

    2010-04-01

    To create a wireless neural recording system that can benefit from the continuous advancements being made in embedded microcontroller and communications technologies, an embedded-system-based architecture for wireless neural recording has been designed, fabricated, and tested. The system consists of commercial-off-the-shelf wireless-enabled processor modules (motes) for communicating the neural signals, and a back-end database server and client application for archiving and browsing the neural signals. A neural-signal-acquisition application has been developed to enable the mote to either acquire neural signals at a rate of 4000 12-bit samples per second, or detect and transmit spike heights and widths sampled at a rate of 16670 12-bit samples per second on a single channel. The motes acquire neural signals via a custom low-noise neural-signal amplifier with adjustable gain and high-pass corner frequency that has been designed, and fabricated in a 1.5-microm CMOS process. In addition to browsing acquired neural data, the client application enables the user to remotely toggle modes of operation (real-time or spike-only), as well as amplifier gain and high-pass corner frequency.

  18. Ultra-Reliable Digital Avionics (URDA) processor

    NASA Astrophysics Data System (ADS)

    Branstetter, Reagan; Ruszczyk, William; Miville, Frank

    1994-10-01

    Texas Instruments Incorporated (TI) developed the URDA processor design under contract with the U.S. Air Force Wright Laboratory and the U.S. Army Night Vision and Electro-Sensors Directorate. TI's approach couples advanced packaging solutions with advanced integrated circuit (IC) technology to provide a high-performance (200 MIPS/800 MFLOPS) modular avionics processor module for a wide range of avionics applications. TI's processor design integrates two Ada-programmable, URDA basic processor modules (BPM's) with a JIAWG-compatible PiBus and TMBus on a single F-22 common integrated processor-compatible form-factor SEM-E avionics card. A separate, high-speed (25-MWord/second 32-bit word) input/output bus is provided for sensor data. Each BPM provides a peak throughput of 100 MIPS scalar concurrent with 400-MFLOPS vector processing in a removable multichip module (MCM) mounted to a liquid-flowthrough (LFT) core and interfacing to a processor interface module printed wiring board (PWB). Commercial RISC technology coupled with TI's advanced bipolar complementary metal oxide semiconductor (BiCMOS) application specific integrated circuit (ASIC) and silicon-on-silicon packaging technologies are used to achieve the high performance in a miniaturized package. A Mips R4000-family reduced instruction set computer (RISC) processor and a TI 100-MHz BiCMOS vector coprocessor (VCP) ASIC provide, respectively, the 100 MIPS of a scalar processor throughput and 400 MFLOPS of vector processing throughput for each BPM. The TI Aladdim ASIC chipset was developed on the TI Aladdin Program under contract with the U.S. Army Communications and Electronics Command and was sponsored by the Advanced Research Projects Agency with technical direction from the U.S. Army Night Vision and Electro-Sensors Directorate.

  19. STAR: FPGA-based software defined satellite transponder

    NASA Astrophysics Data System (ADS)

    Davalle, Daniele; Cassettari, Riccardo; Saponara, Sergio; Fanucci, Luca; Cucchi, Luca; Bigongiari, Franco; Errico, Walter

    2013-05-01

    This paper presents STAR, a flexible Telemetry, Tracking & Command (TT&C) transponder for Earth Observation (EO) small satellites, developed in collaboration with INTECS and SITAEL companies. With respect to state-of-the-art EO transponders, STAR includes the possibility of scientific data transfer thanks to the 40 Mbps downlink data-rate. This feature represents an important optimization in terms of hardware mass, which is important for EO small satellites. Furthermore, in-flight re-configurability of communication parameters via telecommand is important for in-orbit link optimization, which is especially useful for low orbit satellites where visibility can be as short as few hundreds of seconds. STAR exploits the principles of digital radio to minimize the analog section of the transceiver. 70MHz intermediate frequency (IF) is the interface with an external S/X band radio-frequency front-end. The system is composed of a dedicated configurable high-speed digital signal processing part, the Signal Processor (SP), described in technology-independent VHDL working with a clock frequency of 184.32MHz and a low speed control part, the Control Processor (CP), based on the 32-bit Gaisler LEON3 processor clocked at 32 MHz, with SpaceWire and CAN interfaces. The quantization parameters were fine-tailored to reach a trade-off between hardware complexity and implementation loss which is less than 0.5 dB at BER = 10-5 for the RX chain. The IF ports require 8-bit precision. The system prototype is fitted on the Xilinx Virtex 6 VLX75T-FF484 FPGA of which a space-qualified version has been announced. The total device occupation is 82 %.

  20. QCA Gray Code Converter Circuits Using LTEx Methodology

    NASA Astrophysics Data System (ADS)

    Mukherjee, Chiradeep; Panda, Saradindu; Mukhopadhyay, Asish Kumar; Maji, Bansibadan

    2018-07-01

    The Quantum-dot Cellular Automata (QCA) is the prominent paradigm of nanotechnology considered to continue the computation at deep sub-micron regime. The QCA realizations of several multilevel circuit of arithmetic logic unit have been introduced in the recent years. However, as high fan-in Binary to Gray (B2G) and Gray to Binary (G2B) Converters exist in the processor based architecture, no attention has been paid towards the QCA instantiation of the Gray Code Converters which are anticipated to be used in 8-bit, 16-bit, 32-bit or even more bit addressable machines of Gray Code Addressing schemes. In this work the two-input Layered T module is presented to exploit the operation of an Exclusive-OR Gate (namely LTEx module) as an elemental block. The "defect-tolerant analysis" of the two-input LTEx module has been analyzed to establish the scalability and reproducibility of the LTEx module in the complex circuits. The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes. Moreover this work formulates the QCA design metrics such as O-Cost, Effective area, Delay and Cost α for the n-bit converter layouts.

  1. QCA Gray Code Converter Circuits Using LTEx Methodology

    NASA Astrophysics Data System (ADS)

    Mukherjee, Chiradeep; Panda, Saradindu; Mukhopadhyay, Asish Kumar; Maji, Bansibadan

    2018-04-01

    The Quantum-dot Cellular Automata (QCA) is the prominent paradigm of nanotechnology considered to continue the computation at deep sub-micron regime. The QCA realizations of several multilevel circuit of arithmetic logic unit have been introduced in the recent years. However, as high fan-in Binary to Gray (B2G) and Gray to Binary (G2B) Converters exist in the processor based architecture, no attention has been paid towards the QCA instantiation of the Gray Code Converters which are anticipated to be used in 8-bit, 16-bit, 32-bit or even more bit addressable machines of Gray Code Addressing schemes. In this work the two-input Layered T module is presented to exploit the operation of an Exclusive-OR Gate (namely LTEx module) as an elemental block. The "defect-tolerant analysis" of the two-input LTEx module has been analyzed to establish the scalability and reproducibility of the LTEx module in the complex circuits. The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes. Moreover this work formulates the QCA design metrics such as O-Cost, Effective area, Delay and Cost α for the n-bit converter layouts.

  2. Advanced Physiological Estimation of Cognitive Status. Part 2

    DTIC Science & Technology

    2011-05-24

    Neurofeedback Algorithms and Gaze Controller EEG Sensor System g.USBamp *, ** • internal 24-bit ADC and digital signal processor • 16 channels (expandable...SUBJECT TERMS EEG eye-tracking mental state estimation machine learning Leonard J. Trejo Pacific Development and Technology LLC 999 Commercial St. Palo...fatigue, overload) Technology Transfer Opportunity Technology from PDT – Methods to acquire various physiological signals ( EEG , EOG, EMG, ECG, etc

  3. MOBS - A modular on-board switching system

    NASA Astrophysics Data System (ADS)

    Berner, W.; Grassmann, W.; Piontek, M.

    The authors describe a multibeam satellite system that is designed for business services and for communications at a high bit rate. The repeater is regenerative with a modular onboard switching system. It acts not only as baseband switch but also as the central node of the network, performing network control and protocol evaluation. The hardware is based on a modular bus/memory architecture with associated processors.

  4. GR712RC- Dual-Core Processor- Product Status

    NASA Astrophysics Data System (ADS)

    Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri

    2012-08-01

    The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.

  5. A high-speed, large-capacity, 'jukebox' optical disk system

    NASA Technical Reports Server (NTRS)

    Ammon, G. J.; Calabria, J. A.; Thomas, D. T.

    1985-01-01

    Two optical disk 'jukebox' mass storage systems which provide access to any data in a store of 10 to the 13th bits (1250G bytes) within six seconds have been developed. The optical disk jukebox system is divided into two units, including a hardware/software controller and a disk drive. The controller provides flexibility and adaptability, through a ROM-based microcode-driven data processor and a ROM-based software-driven control processor. The cartridge storage module contains 125 optical disks housed in protective cartridges. Attention is given to a conceptual view of the disk drive unit, the NASA optical disk system, the NASA database management system configuration, the NASA optical disk system interface, and an open systems interconnect reference model.

  6. High-performance multiprocessor architecture for a 3-D lattice gas model

    NASA Technical Reports Server (NTRS)

    Lee, F.; Flynn, M.; Morf, M.

    1991-01-01

    The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.

  7. Radio Astronomy Software Defined Receiver Project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vacaliuc, Bogdan; Leech, Marcus; Oxley, Paul

    The paper describes a Radio Astronomy Software Defined Receiver (RASDR) that is currently under development. RASDR is targeted for use by amateurs and small institutions where cost is a primary consideration. The receiver will operate from HF thru 2.8 GHz. Front-end components such as preamps, block down-converters and pre-select bandpass filters are outside the scope of this development and will be provided by the user. The receiver includes RF amplifiers and attenuators, synthesized LOs, quadrature down converters, dual 8 bit ADCs and a Signal Processor that provides firmware processing of the digital bit stream. RASDR will interface to a usermore » s PC via a USB or higher speed Ethernet LAN connection. The PC will run software that provides processing of the bit stream, a graphical user interface, as well as data analysis and storage. Software should support MAC OS, Windows and Linux platforms and will focus on such radio astronomy applications as total power measurements, pulsar detection, and spectral line studies.« less

  8. A microprocessor based on a two-dimensional semiconductor

    PubMed Central

    Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas

    2017-01-01

    The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III–V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor—molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material. PMID:28398336

  9. Design and implementation of a medium speed communications interface and protocol for a low cost, refreshed display computer

    NASA Technical Reports Server (NTRS)

    Phyne, J. R.; Nelson, M. D.

    1975-01-01

    The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.

  10. High precision computing with charge domain devices and a pseudo-spectral method therefor

    NASA Technical Reports Server (NTRS)

    Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor); Fijany, Amir (Inventor); Zak, Michail (Inventor)

    1997-01-01

    The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In another aspect of the invention, such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. In a further aspect of the invention, moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton-equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation.

  11. A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

    DTIC Science & Technology

    1989-02-01

    frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a

  12. Microdot - A Four-Bit Microcontroller Designed for Distributed Low-End Computing in Satellites

    NASA Astrophysics Data System (ADS)

    2002-03-01

    Many satellites are an integrated collection of sensors and actuators that require dedicated real-time control. For single processor systems, additional sensors require an increase in computing power and speed to provide the multi-tasking capability needed to service each sensor. Faster processors cost more and consume more power, which taxes a satellite's power resources and may lead to shorter satellite lifetimes. An alternative design approach is a distributed network of small and low power microcontrollers designed for space that handle the computing requirements of each individual sensor and actuator. The design of microdot, a four-bit microcontroller for distributed low-end computing, is presented. The design is based on previous research completed at the Space Electronics Branch, Air Force Research Laboratory (AFRL/VSSE) at Kirtland AFB, NM, and the Air Force Institute of Technology at Wright-Patterson AFB, OH. The Microdot has 29 instructions and a 1K x 4 instruction memory. The distributed computing architecture is based on the Philips Semiconductor I2C Serial Bus Protocol. A prototype was implemented and tested using an Altera Field Programmable Gate Array (FPGA). The prototype was operable to 9.1 MHz. The design was targeted for fabrication in a radiation-hardened-by-design gate-array cell library for the TSMC 0.35 micrometer CMOS process.

  13. A customizable system for real-time image processing using the Blackfin DSProcessor and the MicroC/OS-II real-time kernel

    NASA Astrophysics Data System (ADS)

    Coffey, Stephen; Connell, Joseph

    2005-06-01

    This paper presents a development platform for real-time image processing based on the ADSP-BF533 Blackfin processor and the MicroC/OS-II real-time operating system (RTOS). MicroC/OS-II is a completely portable, ROMable, pre-emptive, real-time kernel. The Blackfin Digital Signal Processors (DSPs), incorporating the Analog Devices/Intel Micro Signal Architecture (MSA), are a broad family of 16-bit fixed-point products with a dual Multiply Accumulate (MAC) core. In addition, they have a rich instruction set with variable instruction length and both DSP and MCU functionality thus making them ideal for media based applications. Using the MicroC/OS-II for task scheduling and management, the proposed system can capture and process raw RGB data from any standard 8-bit greyscale image sensor in soft real-time and then display the processed result using a simple PC graphical user interface (GUI). Additionally, the GUI allows configuration of the image capture rate and the system and core DSP clock rates thereby allowing connectivity to a selection of image sensors and memory devices. The GUI also allows selection from a set of image processing algorithms based in the embedded operating system.

  14. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  15. Design of an anti-Rician-fading modem for mobile satellite communication systems

    NASA Technical Reports Server (NTRS)

    Kojima, Toshiharu; Ishizu, Fumio; Miyake, Makoto; Murakami, Keishi; Fujino, Tadashi

    1995-01-01

    To design a demodulator applicable to mobile satellite communication systems using differential phase shift keying modulation, we have developed key technologies including an anti-Rician-fading demodulation scheme, an initial acquisition scheme, automatic gain control (AGC), automatic frequency control (AFC), and bit timing recovery (BTR). Using these technologies, we have developed one-chip digital signal processor (DSP) modem for mobile terminal, which is compact, of light weight, and of low power consumption. Results of performance test show that the developed DSP modem achieves good performance in terms of bit error ratio in mobile satellite communication environment, i.e., Rician fading channel. It is also shown that the initial acquisition scheme acquires received signal rapidly even if the carrier-to-noise power ratio (CNR) of the received signal is considerably low.

  16. The design of infrared information collection circuit based on embedded technology

    NASA Astrophysics Data System (ADS)

    Liu, Haoting; Zhang, Yicong

    2013-07-01

    S3C2410 processor is a 16/32 bit RISC embedded processor which based on ARM920T core and AMNA bus, and mainly for handheld devices, and high cost, low-power applications. This design introduces a design plan of the PIR sensor system, circuit and its assembling, debugging. The Application Circuit of the passive PIR alarm uses the invisibility of the infrared radiation well into the alarm system, and in order to achieve the anti-theft alarm and security purposes. When the body goes into the range of PIR sensor detection, sensors will detect heat sources and then the sensor will output a weak signal. The Signal should be amplified, compared and delayed; finally light emitting diodes emit light, playing the role of a police alarm.

  17. Frequency domain laser velocimeter signal processor: A new signal processing scheme

    NASA Technical Reports Server (NTRS)

    Meyers, James F.; Clemmons, James I., Jr.

    1987-01-01

    A new scheme for processing signals from laser velocimeter systems is described. The technique utilizes the capabilities of advanced digital electronics to yield a smart instrument that is able to configure itself, based on the characteristics of the input signals, for optimum measurement accuracy. The signal processor is composed of a high-speed 2-bit transient recorder for signal capture and a combination of adaptive digital filters with energy and/or zero crossing detection signal processing. The system is designed to accept signals with frequencies up to 100 MHz with standard deviations up to 20 percent of the average signal frequency. Results from comparative simulation studies indicate measurement accuracies 2.5 times better than with a high-speed burst counter, from signals with as few as 150 photons per burst.

  18. Advanced satellite communication system

    NASA Technical Reports Server (NTRS)

    Staples, Edward J.; Lie, Sen

    1992-01-01

    The objective of this research program was to develop an innovative advanced satellite receiver/demodulator utilizing surface acoustic wave (SAW) chirp transform processor and coherent BPSK demodulation. The algorithm of this SAW chirp Fourier transformer is of the Convolve - Multiply - Convolve (CMC) type, utilizing off-the-shelf reflective array compressor (RAC) chirp filters. This satellite receiver, if fully developed, was intended to be used as an on-board multichannel communications repeater. The Advanced Communications Receiver consists of four units: (1) CMC processor, (2) single sideband modulator, (3) demodulator, and (4) chirp waveform generator and individual channel processors. The input signal is composed of multiple user transmission frequencies operating independently from remotely located ground terminals. This signal is Fourier transformed by the CMC Processor into a unique time slot for each user frequency. The CMC processor is driven by a waveform generator through a single sideband (SSB) modulator. The output of the coherent demodulator is composed of positive and negative pulses, which are the envelopes of the chirp transform processor output. These pulses correspond to the data symbols. Following the demodulator, a logic circuit reconstructs the pulses into data, which are subsequently differentially decoded to form the transmitted data. The coherent demodulation and detection of BPSK signals derived from a CMC chirp transform processor were experimentally demonstrated and bit error rate (BER) testing was performed. To assess the feasibility of such advanced receiver, the results were compared with the theoretical analysis and plotted for an average BER as a function of signal-to-noise ratio. Another goal of this SBIR program was the development of a commercial product. The commercial product developed was an arbitrary waveform generator. The successful sales have begun with the delivery of the first arbitrary waveform generator.

  19. A new version of the CADNA library for estimating round-off error propagation in Fortran programs

    NASA Astrophysics Data System (ADS)

    Jézéquel, Fabienne; Chesneaux, Jean-Marie; Lamotte, Jean-Luc

    2010-11-01

    The CADNA library enables one to estimate, using a probabilistic approach, round-off error propagation in any simulation program. CADNA provides new numerical types, the so-called stochastic types, on which round-off errors can be estimated. Furthermore CADNA contains the definition of arithmetic and relational operators which are overloaded for stochastic variables and the definition of mathematical functions which can be used with stochastic arguments. On 64-bit processors, depending on the rounding mode chosen, the mathematical library associated with the GNU Fortran compiler may provide incorrect results or generate severe bugs. Therefore the CADNA library has been improved to enable the numerical validation of programs on 64-bit processors. New version program summaryProgram title: CADNA Catalogue identifier: AEAT_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEAT_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 28 488 No. of bytes in distributed program, including test data, etc.: 463 778 Distribution format: tar.gz Programming language: Fortran NOTE: A C++ version of this program is available in the Library as AEGQ_v1_0 Computer: PC running LINUX with an i686 or an ia64 processor, UNIX workstations including SUN, IBM Operating system: LINUX, UNIX Classification: 6.5 Catalogue identifier of previous version: AEAT_v1_0 Journal reference of previous version: Comput. Phys. Commun. 178 (2008) 933 Does the new version supersede the previous version?: Yes Nature of problem: A simulation program which uses floating-point arithmetic generates round-off errors, due to the rounding performed at each assignment and at each arithmetic operation. Round-off error propagation may invalidate the result of a program. The CADNA library enables one to estimate round-off error propagation in any simulation program and to detect all numerical instabilities that may occur at run time. Solution method: The CADNA library [1-3] implements Discrete Stochastic Arithmetic [4,5] which is based on a probabilistic model of round-off errors. The program is run several times with a random rounding mode generating different results each time. From this set of results, CADNA estimates the number of exact significant digits in the result that would have been computed with standard floating-point arithmetic. Reasons for new version: On 64-bit processors, the mathematical library associated with the GNU Fortran compiler may provide incorrect results or generate severe bugs with rounding towards -∞ and +∞, which the random rounding mode is based on. Therefore a particular definition of mathematical functions for stochastic arguments has been included in the CADNA library to enable its use with the GNU Fortran compiler on 64-bit processors. Summary of revisions: If CADNA is used on a 64-bit processor with the GNU Fortran compiler, mathematical functions are computed with rounding to the nearest, otherwise they are computed with the random rounding mode. It must be pointed out that the knowledge of the accuracy of the stochastic argument of a mathematical function is never lost. Restrictions: CADNA requires a Fortran 90 (or newer) compiler. In the program to be linked with the CADNA library, round-off errors on complex variables cannot be estimated. Furthermore array functions such as product or sum must not be used. Only the arithmetic operators and the abs, min, max and sqrt functions can be used for arrays. Additional comments: In the library archive, users are advised to read the INSTALL file first. The doc directory contains a user guide named ug.cadna.pdf which shows how to control the numerical accuracy of a program using CADNA, provides installation instructions and describes test runs. The source code, which is located in the src directory, consists of one assembly language file (cadna_rounding.s) and eighteen Fortran language files. cadna_rounding.s is a symbolic link to the assembly file corresponding to the processor and the Fortran compiler used. This assembly file contains routines which are frequently called in the CADNA Fortran files to change the rounding mode. The Fortran language files contain the definition of the stochastic types on which the control of accuracy can be performed, CADNA specific functions (for instance to enable or disable the detection of numerical instabilities), the definition of arithmetic and relational operators which are overloaded for stochastic variables and the definition of mathematical functions which can be used with stochastic arguments. The examples directory contains seven test runs which illustrate the use of the CADNA library and the benefits of Discrete Stochastic Arithmetic. Running time: The version of a code which uses CADNA runs at least three times slower than its floating-point version. This cost depends on the computer architecture and can be higher if the detection of numerical instabilities is enabled. In this case, the cost may be related to the number of instabilities detected.

  20. A Digital Motion Control System for Large Telescopes

    NASA Astrophysics Data System (ADS)

    Hunter, T. R.; Wilson, R. W.; Kimberk, R.; Leiker, P. S.

    2001-05-01

    We have designed and programmed a digital motion control system for large telescopes, in particular, the 6-meter antennas of the Submillimeter Array on Mauna Kea. The system consists of a single robust, high-reliability microcontroller board which implements a two-axis velocity servo while monitoring and responding to critical safety parameters. Excellent tracking performance has been achieved with this system (0.3 arcsecond RMS at sidereal rate). The 24x24 centimeter four-layer printed circuit board contains a multitude of hardware devices: 40 digital inputs (for limit switches and fault indicators), 32 digital outputs (to enable/disable motor amplifiers and brakes), a quad 22-bit ADC (to read the motor tachometers), four 16-bit DACs (that provide torque signals to the motor amplifiers), a 32-LED status panel, a serial port to the LynxOS PowerPC antenna computer (RS422/460kbps), a serial port to the Palm Vx handpaddle (RS232/115kbps), and serial links to the low-resolution absolute encoders on the azimuth and elevation axes. Each section of the board employs independent ground planes and power supplies, with optical isolation on all I/O channels. The processor is an Intel 80C196KC 16-bit microcontroller running at 20MHz on an 8-bit bus. This processor executes an interrupt-driven, scheduler-based software system written in C and assembled into an EPROM with user-accessible variables stored in NVSRAM. Under normal operation, velocity update requests arrive at 100Hz from the position-loop servo process running independently on the antenna computer. A variety of telescope safety checks are performed at 279Hz including routine servicing of a 6 millisecond watchdog timer. Additional ADCs onboard the microcontroller monitor the winding temperature and current in the brushless three-phase drive motors. The PID servo gains can be dynamically changed in software. Calibration factors and software filters can be applied to the tachometer readings prior to the application of the servo gains in the torque computations. The Palm pilot handpaddle displays the complete status of the telescope and allows full local control of the drives in an intuitive, touchscreen user interface which is especially useful during reconfigurations of the antenna array.

  1. Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.

    1982-04-01

    This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.

  2. An experimental distributed microprocessor implementation with a shared memory communications and control medium

    NASA Technical Reports Server (NTRS)

    Mejzak, R. S.

    1980-01-01

    The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DET) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis which are independent, 8 bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.

  3. Low-voltage analog front-end processor design for ISFET-based sensor and H+ sensing applications

    NASA Astrophysics Data System (ADS)

    Chung, Wen-Yaw; Yang, Chung-Huang; Peng, Kang-Chu; Yeh, M. H.

    2003-04-01

    This paper presents a modular-based low-voltage analog-front-end processor design in a 0.5mm double-poly double-metal CMOS technology for Ion Sensitive Field Effect Transistor (ISFET)-based sensor and H+ sensing applications. To meet the potentiometric response of the ISFET that is proportional to various H+ concentrations, the constant-voltage and constant current (CVCS) testing configuration has been used. Low-voltage design skills such as bulk-driven input pair, folded-cascode amplifier, bootstrap switch control circuits have been designed and integrated for 1.5V supply and nearly rail-to-rail analog to digital signal processing. Core modules consist of an 8-bit two-step analog-digital converter and bulk-driven pre-amplifiers have been developed in this research. The experimental results show that the proposed circuitry has an acceptable linearity to 0.1 pH-H+ sensing conversions with the buffer solution in the range of pH2 to pH12. The processor has a potential usage in battery-operated and portable healthcare devices and environmental monitoring applications.

  4. G-cueing microcontroller (a microprocessor application in simulators)

    NASA Technical Reports Server (NTRS)

    Horattas, C. G.

    1980-01-01

    A g cueing microcontroller is described which consists of a tandem pair of microprocessors, dedicated to the task of simulating pilot sensed cues caused by gravity effects. This task includes execution of a g cueing model which drives actuators that alter the configuration of the pilot's seat. The g cueing microcontroller receives acceleration commands from the aerodynamics model in the main computer and creates the stimuli that produce physical acceleration effects of the aircraft seat on the pilots anatomy. One of the two microprocessors is a fixed instruction processor that performs all control and interface functions. The other, a specially designed bipolar bit slice microprocessor, is a microprogrammable processor dedicated to all arithmetic operations. The two processors communicate with each other by a shared memory. The g cueing microcontroller contains its own dedicated I/O conversion modules for interface with the seat actuators and controls, and a DMA controller for interfacing with the simulation computer. Any application which can be microcoded within the available memory, the available real time and the available I/O channels, could be implemented in the same controller.

  5. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  6. Readout and DAQ for Pixel Detectors

    NASA Astrophysics Data System (ADS)

    Platkevic, Michal

    2010-01-01

    Data readout and acquisition control of pixel detectors demand the transfer of significantly a large amounts of bits between the detector and the computer. For this purpose dedicated interfaces are used which are designed with focus on features like speed, small dimensions or flexibility of use such as digital signal processors, field-programmable gate arrays (FPGA) and USB communication ports. This work summarizes the readout and DAQ system built for state-of-the-art pixel detectors of the Medipix family.

  7. Implementation of the DAST ARW II control laws using an 8086 microprocessor and an 8087 floating-point coprocessor. [drones for aeroelasticity research

    NASA Technical Reports Server (NTRS)

    Kelly, G. L.; Berthold, G.; Abbott, L.

    1982-01-01

    A 5 MHZ single-board microprocessor system which incorporates an 8086 CPU and an 8087 Numeric Data Processor is used to implement the control laws for the NASA Drones for Aerodynamic and Structural Testing, Aeroelastic Research Wing II. The control laws program was executed in 7.02 msec, with initialization consuming 2.65 msec and the control law loop 4.38 msec. The software emulator execution times for these two tasks were 36.67 and 61.18, respectively, for a total of 97.68 msec. The space, weight and cost reductions achieved in the present, aircraft control application of this combination of a 16-bit microprocessor with an 80-bit floating point coprocessor may be obtainable in other real time control applications.

  8. Supercomputing on massively parallel bit-serial architectures

    NASA Technical Reports Server (NTRS)

    Iobst, Ken

    1985-01-01

    Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.

  9. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    NASA Astrophysics Data System (ADS)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2015-05-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  10. A novel parallel architecture for local histogram equalization

    NASA Astrophysics Data System (ADS)

    Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan

    2005-07-01

    Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.

  11. Realtime multiprocessor for mobile ad hoc networks

    NASA Astrophysics Data System (ADS)

    Jungeblut, T.; Grünewald, M.; Porrmann, M.; Rückert, U.

    2008-05-01

    This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.

  12. Performance of a flight qualified, thermoelectrically temperature controlled QCM sensor with power supply, thermal controller and signal processor

    NASA Technical Reports Server (NTRS)

    Wallace, D. A.

    1980-01-01

    A thermoelectrically temperature controlled quartz crystal microbalance (QCM) system was developed for the measurement of ion thrustor generated mercury contamination on spacecraft. Meaningful flux rate measurements dictated an accurately held sensing crystal temperature despite spacecraft surface temperature variations from -35 C to +60 C over the flight temperature range. An electronic control unit was developed with magentic amplifier transformer secondary power supply, thermal control electronics, crystal temperature analog conditioning and a multiplexed 16 bit frequency encoder.

  13. Investigating the Naval Logistics Role in Humanitarian Assistance Activities

    DTIC Science & Technology

    2015-03-01

    transportation means. E. BASE CASE RESULTS The computations were executed on a MacBook Pro , 3 GHz Intel Core i7-4578U processor with 8 GB. The...MacBook Pro was partitioned to also contain a Windows 7, 64-bit operating system. The computations were run in the Windows 7 operating system using the...it impacts the types of metamodels that can be developed as a result of data farming (Lucas et al., 2015). Using a metamodel, one can closely

  14. Universal sensor interface module (USIM)

    NASA Astrophysics Data System (ADS)

    King, Don; Torres, A.; Wynn, John

    1999-01-01

    A universal sensor interface model (USIM) is being developed by the Raytheon-TI Systems Company for use with fields of unattended distributed sensors. In its production configuration, the USIM will be a multichip module consisting of a set of common modules. The common module USIM set consists of (1) a sensor adapter interface (SAI) module, (2) digital signal processor (DSP) and associated memory module, and (3) a RF transceiver model. The multispectral sensor interface is designed around a low-power A/D converted, whose input/output interface consists of: -8 buffered, sampled inputs from various devices including environmental, acoustic seismic and magnetic sensors. The eight sensor inputs are each high-impedance, low- capacitance, differential amplifiers. The inputs are ideally suited for interface with discrete or MEMS sensors, since the differential input will allow direct connection with high-impedance bridge sensors and capacitance voltage sources. Each amplifier is connected to a 22-bit (Delta) (Sigma) A/D converter to enable simultaneous samples. The low power (Delta) (Sigma) converter provides 22-bit resolution at sample frequencies up to 142 hertz (used for magnetic sensors) and 16-bit resolution at frequencies up to 1168 hertz (used for acoustic and seismic sensors). The video interface module is based around the TMS320C5410 DSP. It can provide sensor array addressing, video data input, data calibration and correction. The processor module is based upon a MPC555. It will be used for mode control, synchronization of complex sensors, sensor signal processing, array processing, target classification and tracking. Many functions of the A/D, DSP and transceiver can be powered down by using variable clock speeds under software command or chip power switches. They can be returned to intermediate or full operation by DSP command. Power management may be based on the USIM's internal timer, command from the USIM transceiver, or by sleep mode processing management. The low power detection mode is implemented by monitoring any of the sensor analog outputs at lower sample rates for detection over a software controllable threshold.

  15. The use of imprecise processing to improve accuracy in weather & climate prediction

    NASA Astrophysics Data System (ADS)

    Düben, Peter D.; McNamara, Hugh; Palmer, T. N.

    2014-08-01

    The use of stochastic processing hardware and low precision arithmetic in atmospheric models is investigated. Stochastic processors allow hardware-induced faults in calculations, sacrificing bit-reproducibility and precision in exchange for improvements in performance and potentially accuracy of forecasts, due to a reduction in power consumption that could allow higher resolution. A similar trade-off is achieved using low precision arithmetic, with improvements in computation and communication speed and savings in storage and memory requirements. As high-performance computing becomes more massively parallel and power intensive, these two approaches may be important stepping stones in the pursuit of global cloud-resolving atmospheric modelling. The impact of both hardware induced faults and low precision arithmetic is tested using the Lorenz '96 model and the dynamical core of a global atmosphere model. In the Lorenz '96 model there is a natural scale separation; the spectral discretisation used in the dynamical core also allows large and small scale dynamics to be treated separately within the code. Such scale separation allows the impact of lower-accuracy arithmetic to be restricted to components close to the truncation scales and hence close to the necessarily inexact parametrised representations of unresolved processes. By contrast, the larger scales are calculated using high precision deterministic arithmetic. Hardware faults from stochastic processors are emulated using a bit-flip model with different fault rates. Our simulations show that both approaches to inexact calculations do not substantially affect the large scale behaviour, provided they are restricted to act only on smaller scales. By contrast, results from the Lorenz '96 simulations are superior when small scales are calculated on an emulated stochastic processor than when those small scales are parametrised. This suggests that inexact calculations at the small scale could reduce computation and power costs without adversely affecting the quality of the simulations. This would allow higher resolution models to be run at the same computational cost.

  16. Stroboscope Controller for Imaging Helicopter Rotors

    NASA Technical Reports Server (NTRS)

    Jensen, Scott; Marmie, John; Mai, Nghia

    2004-01-01

    A versatile electronic timing-and-control unit, denoted a rotorcraft strobe controller, has been developed for use in controlling stroboscopes, lasers, video cameras, and other instruments for capturing still images of rotating machine parts especially helicopter rotors. This unit is designed to be compatible with a variety of sources of input shaftangle or timing signals and to be capable of generating a variety of output signals suitable for triggering instruments characterized by different input-signal specifications. It is also designed to be flexible and reconfigurable in that it can be modified and updated through changes in its control software, without need to change its hardware. Figure 1 is a block diagram of the rotorcraft strobe controller. The control processor is a high-density complementary metal oxide semiconductor, singlechip 8-bit microcontroller. It is connected to a 32K x 8 nonvolatile static random-access memory (RAM) module. Also connected to the control processor is a 32K 8 electrically programmable read-only-memory (EPROM) module, which is used to store the control software. Digital logic support circuitry is implemented in a field-programmable gate array (FPGA). A 240 x 128-dot, 40- character 16-line liquid-crystal display (LCD) module serves as a graphical user interface; the user provides input through a 16-key keypad mounted next to the LCD. A 12-bit digital-to-analog converter (DAC) generates a 0-to-10-V ramp output signal used as part of a rotor-blade monitoring system, while the control processor generates all the appropriate strobing signals. Optocouplers are used to isolate all input and output digital signals, and optoisolators are used to isolate all analog signals. The unit is designed to fit inside a 19-in. (.48-cm) rack-mount enclosure. Electronic components are mounted on a custom printed-circuit board (see Figure 2). Two power-conversion modules on the printedcircuit board convert AC power to +5 VDC and 15 VDC, respectively.

  17. Implementation of High Speed Distributed Data Acquisition System

    NASA Astrophysics Data System (ADS)

    Raju, Anju P.; Sekhar, Ambika

    2012-09-01

    This paper introduces a high speed distributed data acquisition system based on a field programmable gate array (FPGA). The aim is to develop a "distributed" data acquisition interface. The development of instruments such as personal computers and engineering workstations based on "standard" platforms is the motivation behind this effort. Using standard platforms as the controlling unit allows independence in hardware from a particular vendor and hardware platform. The distributed approach also has advantages from a functional point of view: acquisition resources become available to multiple instruments; the acquisition front-end can be physically remote from the rest of the instrument. High speed data acquisition system transmits data faster to a remote computer system through Ethernet interface. The data is acquired through 16 analog input channels. The input data commands are multiplexed and digitized and then the data is stored in 1K buffer for each input channel. The main control unit in this design is the 16 bit processor implemented in the FPGA. This 16 bit processor is used to set up and initialize the data source and the Ethernet controller, as well as control the flow of data from the memory element to the NIC. Using this processor we can initialize and control the different configuration registers in the Ethernet controller in a easy manner. Then these data packets are sending to the remote PC through the Ethernet interface. The main advantages of the using FPGA as standard platform are its flexibility, low power consumption, short design duration, fast time to market, programmability and high density. The main advantages of using Ethernet controller AX88796 over others are its non PCI interface, the presence of embedded SRAM where transmit and reception buffers are located and high-performance SRAM-like interface. The paper introduces the implementation of the distributed data acquisition using FPGA by VHDL. The main advantages of this system are high accuracy, high speed, real time monitoring.

  18. Digital camera with apparatus for authentication of images produced from an image file

    NASA Technical Reports Server (NTRS)

    Friedman, Gary L. (Inventor)

    1993-01-01

    A digital camera equipped with a processor for authentication of images produced from an image file taken by the digital camera is provided. The digital camera processor has embedded therein a private key unique to it, and the camera housing has a public key that is so uniquely based upon the private key that digital data encrypted with the private key by the processor may be decrypted using the public key. The digital camera processor comprises means for calculating a hash of the image file using a predetermined algorithm, and second means for encrypting the image hash with the private key, thereby producing a digital signature. The image file and the digital signature are stored in suitable recording means so they will be available together. Apparatus for authenticating at any time the image file as being free of any alteration uses the public key for decrypting the digital signature, thereby deriving a secure image hash identical to the image hash produced by the digital camera and used to produce the digital signature. The apparatus calculates from the image file an image hash using the same algorithm as before. By comparing this last image hash with the secure image hash, authenticity of the image file is determined if they match, since even one bit change in the image hash will cause the image hash to be totally different from the secure hash.

  19. Sequence information signal processor for local and global string comparisons

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1997-01-01

    A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.

  20. Josephson 4 K-bit cache memory design for a prototype signal processor. I - General overview

    NASA Astrophysics Data System (ADS)

    Henkels, W. H.; Geppert, L. M.; Kadlec, J.; Epperlein, P. W.; Beha, H.

    1985-09-01

    In the early stages of thg Josephson computer project conducted at an American computer company, it was recognized that a very fast cache memory was needed to complement Josephson logic. A subnanosecond access time memory was implemented experimentally on the basis of a 2.5-micron Pb-alloy technology. It was then decided to switch over to a Nb-base-electrode technology with the objective to alleviate problems with the long-term reliability and aging of Pb-based junctions. The present paper provides a general overview of the status of a 4 x 1 K-bit Josephson cache design employing a 2.5-micron Nb-edge-junction technology. Attention is given to the fabrication process and its implications, aspects of circuit design methodology, an overview of system environment and chip components, design changes and status, and various difficulties and uncertainties.

  1. Finite element computation on nearest neighbor connected machines

    NASA Technical Reports Server (NTRS)

    Mcaulay, A. D.

    1984-01-01

    Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.

  2. Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi

    DOE PAGES

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...

    2015-05-22

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluatemore » the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).« less

  3. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.

  4. High-Precision Timing of Several Millisecond Pulsars

    NASA Astrophysics Data System (ADS)

    Ferdman, R. D.; Stairs, I. H.; Backer, D. C.; Ramachandran, R.; Demorest, P.; Nice, D. J.; Lyne, A. G.; Kramer, M.; Lorimer, D.; McLaughlin, M.; Manchester, D.; Camilo, F.; D'Amico, N.; Possenti, A.; Burgay, M.; Joshi, B. C.; Freire, P. C.

    2004-12-01

    The highest precision pulsar timing is achieved by reproducing as accurately as possible the pulse profile as emitted by the pulsar, in high signal-to-noise observations. The best profile reconstruction can be accomplished with several-bit voltage sampling and coherent removal of the dispersion suffered by pulsar signals as they traverse the interstellar medium. The Arecibo Signal Processor (ASP) and its counterpart the Green Bank Astronomical Signal Processor (GASP) are flexible, state-of-the-art wide-bandwidth observing systems, built primarily for high-precision long-term timing of millisecond and binary pulsars. ASP and GASP are in use at the 300-m Arecibo telescope in Puerto Rico and the 100-m Green Bank Telescope in Green Bank, West Virginia, respectively, taking advantage of the enormous sensitivities of these telescopes. These instruments result in high-precision science through 4 and 8-bit sampling and perform coherent dedispersion on the incoming data stream in real or near-real time. This is done using a network of personal computers, over an observing bandwidth of 64 to 128 MHz, in each of two polarizations. We present preliminary results of timing and polarimetric observations with ASP/GASP for several pulsars, including the recently-discovered relativistic double-pulsar binary J0737-3039. These data are compared to simultaneous observations with other pulsar instruments, such as the new "spigot card" spectrometer on the GBT and the Princeton Mark IV instrument at Arecibo, the precursor timing system to ASP. We also briefly discuss several upcoming observations with ASP/GASP.

  5. VASP-4096: a very high performance programmable device for digital media processing applications

    NASA Astrophysics Data System (ADS)

    Krikelis, Argy

    2001-03-01

    Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovations in the VASP-4096 is the integration of thousands of processing units in a single chip that are capable of support software programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in VASP-4096 implements the ASProCore architecture, which is a proprietary implementation of SIMD processing, operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate), and a 64- bit 66 MHz PCI interface. VASP-4096, compared with other processors architectures that support media processing, offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be re- used in future chip generations.

  6. Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS

    NASA Astrophysics Data System (ADS)

    Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.

    2003-04-01

    Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content places heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents a ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.

  7. Performance of the unique-word-reverse-modulation type demodulator for mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Dohi, Tomohiro; Nitta, Kazumasa; Ueda, Takashi

    1993-01-01

    This paper proposes a new type of coherent demodulator, the unique-word (UW)-reverse-modulation type demodulator, for burst signal controlled by voice operated transmitter (VOX) in mobile satellite communication channels. The demodulator has three individual circuits: a pre-detection signal combiner, a pre-detection UW detector, and a UW-reverse-modulation type demodulator. The pre-detection signal combiner combines signal sequences received by two antennas and improves bit energy-to-noise power density ratio (E(sub b)/N(sub 0)) 2.5 dB to yield 10(exp -3) average bit error rate (BER) when carrier power-to-multipath power ratio (CMR) is 15 dB. The pre-detection UW detector improves UW detection probability when the frequency offset is large. The UW-reverse-modulation type demodulator realizes a maximum pull-in frequency of 3.9 kHz, the pull-in time is 2.4 seconds and frequency error is less than 20 Hz. The performances of this demodulator are confirmed through computer simulations and its effect is clarified in real-time experiments at a bit rate of 16.8 kbps using a digital signal processor (DSP).

  8. Design and test of a regenerative satellite transmultiplexer

    NASA Astrophysics Data System (ADS)

    Hung, Kenny King-Ming

    1993-05-01

    In a multiple access scheme for regenerative satellite communications, the bulk frequency division multiple access (FDMA) uplink signal is demodulated on board the satellite and then remodulated for time division multiplexing (TDM) downlink transmission. Conversion from frequency to time division multiplex format requires that the uplink signal be frequency demultiplexed and each individual carrier be subsequently demodulated. For thin-route application which consists of a large number of channels with fixed data rate, multicarrier demodulation can be accomplished efficiently by a digital transmultiplexer (TMUX) using a fast Fourier transform processor followed by a bank of per-channel processors. A time domain description of the TMUX algorithm is derived which elucidates how the TMUX functions. The per-channel processor performs timing and carrier recovery for optimum and coherent data detection. Timing recovery is necessarily achieved asynchronously by a filter coefficient interpolation. Carrier recovery is performed using an all-digital phase-locked loop. The combination of both timing and carrier loops is investigated for a multi-user system. The performance of the overall system is assessed over a multi-user, additive white Gaussian noise channel for a bit energy to noise power spectral density ratio down to zero dB.

  9. Real-Time Data Processing in the muon system of the D0 detector.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neeti Parashar et al.

    2001-07-03

    This paper presents a real-time application of the 16-bit fixed point Digital Signal Processors (DSPs), in the Muon System of the D0 detector located at the Fermilab Tevatron, presently the world's highest-energy hadron collider. As part of the Upgrade for a run beginning in the year 2000, the system is required to process data at an input event rate of 10 KHz without incurring significant deadtime in readout. The ADSP21csp01 processor has high I/O bandwidth, single cycle instruction execution and fast task switching support to provide efficient multisignal processing. The processor's internal memory consists of 4K words of Program Memorymore » and 4K words of Data Memory. In addition there is an external memory of 32K words for general event buffering and 16K words of Dual port Memory for input data queuing. This DSP fulfills the requirement of the Muon subdetector systems for data readout. All error handling, buffering, formatting and transferring of the data to the various trigger levels of the data acquisition system is done in software. The algorithms developed for the system complete these tasks in about 20 {micro}s per event.« less

  10. Experiences modeling ocean circulation problems on a 30 node commodity cluster with 3840 GPU processor cores.

    NASA Astrophysics Data System (ADS)

    Hill, C.

    2008-12-01

    Low cost graphic cards today use many, relatively simple, compute cores to deliver support for memory bandwidth of more than 100GB/s and theoretical floating point performance of more than 500 GFlop/s. Right now this performance is, however, only accessible to highly parallel algorithm implementations that, (i) can use a hundred or more, 32-bit floating point, concurrently executing cores, (ii) can work with graphics memory that resides on the graphics card side of the graphics bus and (iii) can be partially expressed in a language that can be compiled by a graphics programming tool. In this talk we describe our experiences implementing a complete, but relatively simple, time dependent shallow-water equations simulation targeting a cluster of 30 computers each hosting one graphics card. The implementation takes into account the considerations (i), (ii) and (iii) listed previously. We code our algorithm as a series of numerical kernels. Each kernel is designed to be executed by multiple threads of a single process. Kernels are passed memory blocks to compute over which can be persistent blocks of memory on a graphics card. Each kernel is individually implemented using the NVidia CUDA language but driven from a higher level supervisory code that is almost identical to a standard model driver. The supervisory code controls the overall simulation timestepping, but is written to minimize data transfer between main memory and graphics memory (a massive performance bottle-neck on current systems). Using the recipe outlined we can boost the performance of our cluster by nearly an order of magnitude, relative to the same algorithm executing only on the cluster CPU's. Achieving this performance boost requires that many threads are available to each graphics processor for execution within each numerical kernel and that the simulations working set of data can fit into the graphics card memory. As we describe, this puts interesting upper and lower bounds on the problem sizes for which this technology is currently most useful. However, many interesting problems fit within this envelope. Looking forward, we extrapolate our experience to estimate full-scale ocean model performance and applicability. Finally we describe preliminary hybrid mixed 32-bit and 64-bit experiments with graphics cards that support 64-bit arithmetic, albeit at a lower performance.

  11. Accelerating functional verification of an integrated circuit

    DOEpatents

    Deindl, Michael; Ruedinger, Jeffrey Joseph; Zoellin, Christian G.

    2015-10-27

    Illustrative embodiments include a method, system, and computer program product for accelerating functional verification in simulation testing of an integrated circuit (IC). Using a processor and a memory, a serial operation is replaced with a direct register access operation, wherein the serial operation is configured to perform bit shifting operation using a register in a simulation of the IC. The serial operation is blocked from manipulating the register in the simulation of the IC. Using the register in the simulation of the IC, the direct register access operation is performed in place of the serial operation.

  12. Speeding up parallel processing

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.

  13. A multicomputer simulation of the Galileo spacecraft command and data subsystem

    NASA Technical Reports Server (NTRS)

    Zipse, John E.; Yeung, Raymond Y.; Zimmerman, Barbara A.; Morillo, Ronald; Olster, Daniel B.; Flower, Jon W.; Mizuo, Thomas

    1991-01-01

    A detailed simulation of the command and data subsystem of the Galileo spacecraft on a distributed memory multicomputer is described. The simulation is based on an ensemble of Inmos Transputers for simulating, to the bit level, the execution of instruction sequences for the six RCA 1802 microcomputers and the intricate bus traffic between them and other components of the spacecraft. Expressions were developed to estimate the performance of the simulator on a distributed system given the processor clock speed, memory access time, and communication characteristics.

  14. Designing a Virtual-Memory Implementation Using the Motorola MC68010 16- Bit Microprocessor with Multi-Processor Capability Interfaced to the VMEbus

    DTIC Science & Technology

    1990-06-01

    RAM and ROM output enable signals. Figure C.7 shows the logic for the interrupt priority level (IPLO* through IPL2 *) and the interrupt acknowledge...IACK681* signal is sent to the DUART when a level one interrupt acknowledge is output by the CPU. The logic for the IACK681* and the IPLO* through IPL2 ...signals are actually implemented with an EPLD. Listing D.4 in Appendix D presents the Abel description of the IACK681* and IPLO* through IPL2

  15. A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

    NASA Astrophysics Data System (ADS)

    Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

    1986-06-01

    In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response is facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling combined up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing opera-tions are presented.

  16. CoNNeCT Baseband Processor Module Boot Code SoftWare (BCSW)

    NASA Technical Reports Server (NTRS)

    Yamamoto, Clifford K.; Orozco, David S.; Byrne, D. J.; Allen, Steven J.; Sahasrabudhe, Adit; Lang, Minh

    2012-01-01

    This software provides essential startup and initialization routines for the CoNNeCT baseband processor module (BPM) hardware upon power-up. A command and data handling (C&DH) interface is provided via 1553 and diagnostic serial interfaces to invoke operational, reconfiguration, and test commands within the code. The BCSW has features unique to the hardware it is responsible for managing. In this case, the CoNNeCT BPM is configured with an updated CPU (Atmel AT697 SPARC processor) and a unique set of memory and I/O peripherals that require customized software to operate. These features include configuration of new AT697 registers, interfacing to a new HouseKeeper with a flash controller interface, a new dual Xilinx configuration/scrub interface, and an updated 1553 remote terminal (RT) core. The BCSW is intended to provide a "safe" mode for the BPM when initially powered on or when an unexpected trap occurs, causing the processor to reset. The BCSW allows the 1553 bus controller in the spacecraft or payload controller to operate the BPM over 1553 to upload code; upload Xilinx bit files; perform rudimentary tests; read, write, and copy the non-volatile flash memory; and configure the Xilinx interface. Commands also exist over 1553 to cause the CPU to jump or call a specified address to begin execution of user-supplied code. This may be in the form of a real-time operating system, test routine, or specific application code to run on the BPM.

  17. Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA

    NASA Astrophysics Data System (ADS)

    Sahib Omran, Safaa; Fouad Jumma, Laith

    2018-05-01

    Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.

  18. Efficient system interrupt concept design at the microprogramming level

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fakharzadeh, M.M.

    1989-01-01

    Over the past decade the demand for high speed super microcomputers has been tremendously increased. To satisfy this demand many high speed 32-bit microcomputers have been designed. However, the currently available 32-bit systems do not provide an adequate solution to many highly demanding problems such as in multitasking, and in interrupt driven applications, which both require context switching. Systems for these purposes usually incorporate sophisticated software. In order to be efficient, a high end microprocessor based system must satisfy stringent software demands. Although these microprocessors use the latest technology in the fabrication design and run at a very high speed,more » they still suffer from insufficient hardware support for such applications. All too often, this lack also is the premier cause of execution inefficiency. In this dissertation a micro-programmable control unit and operation unit is considered in an advanced design. An automaton controller is designed for high speed micro-level interrupt handling. Different stack models are designed for the single task and multitasking environment. The stacks are used for storage of various components of the processor during the interrupt calls, procedure calls, and task switching. A universal (as an example seven port) register file is designed for high speed parameter passing, and intertask communication in the multitasking environment. In addition, the register file provides a direct path between ALU and the peripheral data which is important in real-time control applications. The overall system is a highly parallel architecture, with no pipeline and internal cache memory, which allows the designer to be able to predict the processor's behavior during the critical times.« less

  19. Malleable architecture generator for FPGA computing

    NASA Astrophysics Data System (ADS)

    Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang

    1996-10-01

    The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology- specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.

  20. ISFET-based sensor signal processor chip design for environment monitoring applications

    NASA Astrophysics Data System (ADS)

    Chung, Wen-Yaw; Yang, Chung-Huang; Wang, Ming-Ga

    2004-12-01

    In recent years Ion-Sensitive Field Effect Transistor (ISFET) based transducers create valuable applications in physiological data acquisition and environment monitoring. This paper presents a mixed-mode ASIC design for potentiometric ISFET-based bio-chemical sensor applications including H+ sensing and hand-held pH meter. For battery power consideration, the proposed system consists of low voltage (3V) analog front-end readout circuits and digital processor has been developed and fabricated in a 0.5mm double-poly double-metal CMOS technology. To assure that the correct pH value can be measured, the two-point calibration circuitry based on the response of standard pH4 and pH7 buffer solution has been implemented by using algorithmic state machine hardware algorithms. The measurement accuracy of the chip is 10 bits and the measured range between pH 2 to pH 12 compared to ideal values is within the accuracy of 0.1pH. For homeland environmental applications, the system provide rapid, easy to use, and cost-effective on-site testing on the quality of water, such as drinking water, ground water and river water. The processor has a potential usage in battery-operated and portable devices in environmental monitoring applications compared to commercial hand-held pH meter.

  1. Design specification of an acousto-optic spectrum analyzer that could be used as an auxiliary receiver for CANEWS

    NASA Astrophysics Data System (ADS)

    Studenny, John; Johnstone, Eric

    1991-01-01

    The acousto-optic spectrum analyzer has undergone a theoretical design review and a basic parameter tradeoff analysis has been performed. The main conclusion is that for the given scenario of a 55 dB dynamic range and for a one-second temporal resolution, a 3.9 MHz resolution is a reasonable compromise with respect to current technology. Additional configurations are suggested. Noise testing of the signal detection processor algorithm was conducted. Additive white Gaussian noise was introduced to pure data. As expected, the tradeoff was between algorithm sensitivity and false alarms. No additional algorithm improvements could be made. The algorithm was observed to be robust, provided that the noise floor was set at a proper level. The digitization scheme was mainly driven by hardware constraints. To implement an analog to digital conversion scheme that linearly covers a 55 dB dynamic range would require a minimum of 17 bits. The general consensus was that 17 bits would be untenable for very large scale integration.

  2. Application experience with the NASA aircraft interrogation and display system - A ground-support equipment for digital flight systems

    NASA Technical Reports Server (NTRS)

    Glover, R. D.

    1983-01-01

    The NASA Dryden Flight Research Facility has developed a microprocessor-based, user-programmable, general-purpose aircraft interrogation and display system (AIDS). The hardware and software of this ground-support equipment have been designed to permit diverse applications in support of aircraft digital flight-control systems and simulation facilities. AIDS is often employed to provide engineering-units display of internal digital system parameters during development and qualification testing. Such visibility into the system under test has proved to be a key element in the final qualification testing of aircraft digital flight-control systems. Three first-generation 8-bit units are now in service in support of several research aircraft projects, and user acceptance has been high. A second-generation design, extended AIDS (XAIDS), incorporating multiple 16-bit processors, is now being developed to support the forward swept wing aircraft project (X-29A). This paper outlines the AIDS concept, summarizes AIDS operational experience, and describes the planned XAIDS design and mechanization.

  3. "Observation Obscurer" - Time Series Viewer, Editor and Processor

    NASA Astrophysics Data System (ADS)

    Andronov, I. L.

    The program is described, which contains a set of subroutines suitable for East viewing and interactive filtering and processing of regularly and irregularly spaced time series. Being a 32-bit DOS application, it may be used as a default fast viewer/editor of time series in any compute shell ("commander") or in Windows. It allows to view the data in the "time" or "phase" mode, to remove ("obscure") or filter outstanding bad points; to make scale transformations and smoothing using few methods (e.g. mean with phase binning, determination of the statistically opti- mal number of phase bins; "running parabola" (Andronov, 1997, As. Ap. Suppl, 125, 207) fit and to make time series analysis using some methods, e.g. correlation, autocorrelation and histogram analysis: determination of extrema etc. Some features have been developed specially for variable star observers, e.g. the barycentric correction, the creation and fast analysis of "OC" diagrams etc. The manual for "hot keys" is presented. The computer code was compiled with a 32-bit Free Pascal (www.freepascal.org).

  4. Gigaflop architecture, a hardware perspective

    NASA Technical Reports Server (NTRS)

    Feierbach, G. F.

    1978-01-01

    Any super computer built in the early 1980s will use components that are available by fall 1978. The architecture of such a system cannot depart radically from current super computers if the software experience painfully acquired from these computers in the 70's is to apply. Given the above constraints, 10 billion floating point operations per second (BFLOPS) are attainable and a problem memory of 512 million (64 bit) words could be supported by the technology of the time. In contrast to this, industry is likely to respond with commercially available machines with a performance of less than 150 MFLOPS. This is due to self-imposed constraints on the manufacturers to provide upward compatible architectures (same instruction set) and systems which can be sold in significant volumes. Since this computing speed is inadequate to meet the demands of computational fluid dynamics, a special processor is required. Issues which are felt to be significant in the pursuit of maximum compute capability in this special processor are discussed.

  5. Digital receiver study and implementation

    NASA Technical Reports Server (NTRS)

    Fogle, D. A.; Lee, G. M.; Massey, J. C.

    1972-01-01

    Computer software was developed which makes it possible to use any general purpose computer with A/D conversion capability as a PSK receiver for low data rate telemetry processing. Carrier tracking, bit synchronization, and matched filter detection are all performed digitally. To aid in the implementation of optimum computer processors, a study of general digital processing techniques was performed which emphasized various techniques for digitizing general analog systems. In particular, the phase-locked loop was extensively analyzed as a typical non-linear communication element. Bayesian estimation techniques for PSK demodulation were studied. A hardware implementation of the digital Costas loop was developed.

  6. A DSP equipped digitizer for online analysis of nuclear detector signals

    NASA Astrophysics Data System (ADS)

    Pasquali, G.; Ciaranfi, R.; Bardelli, L.; Bini, M.; Boiano, A.; Giannelli, F.; Ordine, A.; Poggi, G.

    2007-01-01

    In the framework of the NUCL-EX collaboration, a DSP equipped fast digitizer has been implemented and it has now reached the production stage. Each sampling channel is implemented on a separate daughter-board to be plugged on a VME mother-board. Each channel features a 12-bit, 125 MSamples/s ADC and a Digital Signal Processor (DSP) for online analysis of detector signals. A few algorithms have been written and successfully tested on detectors of different types (scintillators, solid-state, gas-filled), implementing pulse shape discrimination, constant fraction timing, semi-Gaussian shaping, gated integration.

  7. Sensor Authentication: Embedded Processor Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Svoboda, John

    2012-09-25

    Described is the c code running on the embedded Microchip 32bit PIC32MX575F256H located on the INL developed noise analysis circuit board. The code performs the following functions: Controls the noise analysis circuit board preamplifier voltage gains of 1, 10, 100, 000 Initializes the analog to digital conversion hardware, input channel selection, Fast Fourier Transform (FFT) function, USB communications interface, and internal memory allocations Initiates high resolution 4096 point 200 kHz data acquisition Computes complex 2048 point FFT and FFT magnitude. Services Host command set Transfers raw data to Host Transfers FFT result to host Communication error checking

  8. Advanced flight computer. Special study

    NASA Technical Reports Server (NTRS)

    Coo, Dennis

    1995-01-01

    This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.

  9. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    NASA Astrophysics Data System (ADS)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.

  10. Design and implementation of projects with Xilinx Zynq FPGA: a practical case

    NASA Astrophysics Data System (ADS)

    Travaglini, R.; D'Antone, I.; Meneghini, S.; Rignanese, L.; Zuffa, M.

    The main advantage when using FPGAs with embedded processors is the availability of additional several high-performance resources in the same physical device. Moreover, the FPGA programmability allows for connect custom peripherals. Xilinx have designed a programmable device named Zynq-7000 (simply called Zynq in the following), which integrates programmable logic (identical to the other Xilinx "serie 7" devices) with a System on Chip (SOC) based on two embedded ARM processors. Since both parts are deeply connected, the designers benefit from performance of hardware SOC and flexibility of programmability as well. In this paper a design developed by the Electronic Design Department at the Bologna Division of INFN will be presented as a practical case of project based on Zynq device. It is developed by using a commercial board called ZedBoard hosting a FMC mezzanine with a 12-bit 500 MS/s ADC. The Zynq FPGA on the ZedBoard receives digital outputs from the ADC and send them to the acquisition PC, after proper formatting, through a Gigabit Ethernet link. The major focus of the paper will be about the methodology to develop a Zynq-based design with the Xilinx Vivado software, enlightening how to configure the SOC and connect it with the programmable logic. Firmware design techniques will be presented: in particular both VHDL and IP core based strategies will be discussed. Further, the procedure to develop software for the embedded processor will be presented. Finally, some debugging tools, like the embedded Logic Analyzer, will be shown. Advantages and disadvantages with respect to adopting FPGA without embedded processors will be discussed.

  11. A bunch to bucket phase detector for the RHIC LLRF upgrade platform

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, K.S.; Harvey, M.; Hayes, T.

    2011-03-28

    As part of the overall development effort for the RHIC LLRF Upgrade Platform [1,2,3], a generic four channel 16 bit Analog-to-Digital Converter (ADC) daughter module was developed to provide high speed, wide dynamic range digitizing and processing of signals from DC to several hundred megahertz. The first operational use of this card was to implement the bunch to bucket phase detector for the RHIC LLRF beam control feedback loops. This paper will describe the design and performance features of this daughter module as a bunch to bucket phase detector, and also provide an overview of its place within the overallmore » LLRF platform architecture as a high performance digitizer and signal processing module suitable to a variety of applications. In modern digital control and signal processing systems, ADCs provide the interface between the analog and digital signal domains. Once digitized, signals are then typically processed using algorithms implemented in field programmable gate array (FPGA) logic, general purpose processors (GPPs), digital signal processors (DSPs) or a combination of these. For the recently developed and commissioned RHIC LLRF Upgrade Platform, we've developed a four channel ADC daughter module based on the Linear Technology LTC2209 16 bit, 160 MSPS ADC and the Xilinx V5FX70T FPGA. The module is designed to be relatively generic in application, and with minimal analog filtering on board, is capable of processing signals from DC to 500 MHz or more. The module's first application was to implement the bunch to bucket phase detector (BTB-PD) for the RHIC LLRF system. The same module also provides DC digitizing of analog processed BPM signals used by the LLRF system for radial feedback.« less

  12. Numerical performance and throughput benchmark for electronic structure calculations in PC-Linux systems with new architectures, updated compilers, and libraries.

    PubMed

    Yu, Jen-Shiang K; Hwang, Jenn-Kang; Tang, Chuan Yi; Yu, Chin-Hui

    2004-01-01

    A number of recently released numerical libraries including Automatically Tuned Linear Algebra Subroutines (ATLAS) library, Intel Math Kernel Library (MKL), GOTO numerical library, and AMD Core Math Library (ACML) for AMD Opteron processors, are linked against the executables of the Gaussian 98 electronic structure calculation package, which is compiled by updated versions of Fortran compilers such as Intel Fortran compiler (ifc/efc) 7.1 and PGI Fortran compiler (pgf77/pgf90) 5.0. The ifc 7.1 delivers about 3% of improvement on 32-bit machines compared to the former version 6.0. Performance improved from pgf77 3.3 to 5.0 is also around 3% when utilizing the original unmodified optimization options of the compiler enclosed in the software. Nevertheless, if extensive compiler tuning options are used, the speed can be further accelerated to about 25%. The performances of these fully optimized numerical libraries are similar. The double-precision floating-point (FP) instruction sets (SSE2) are also functional on AMD Opteron processors operated in 32-bit compilation, and Intel Fortran compiler has performed better optimization. Hardware-level tuning is able to improve memory bandwidth by adjusting the DRAM timing, and the efficiency in the CL2 mode is further accelerated by 2.6% compared to that of the CL2.5 mode. The FP throughput is measured by simultaneous execution of two identical copies of each of the test jobs. Resultant performance impact suggests that IA64 and AMD64 architectures are able to fulfill significantly higher throughput than the IA32, which is consistent with the SpecFPrate2000 benchmarks.

  13. An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

    NASA Astrophysics Data System (ADS)

    Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.

    2017-01-01

    In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).

  14. GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing

    NASA Astrophysics Data System (ADS)

    Johl, John T.; Baker, Nick C.

    1988-10-01

    The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.

  15. A programmable two-qubit quantum processor in silicon

    NASA Astrophysics Data System (ADS)

    Watson, T. F.; Philips, S. G. J.; Kawakami, E.; Ward, D. R.; Scarlino, P.; Veldhorst, M.; Savage, D. E.; Lagally, M. G.; Friesen, Mark; Coppersmith, S. N.; Eriksson, M. A.; Vandersypen, L. M. K.

    2018-03-01

    Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch–Josza algorithm and the Grover search algorithm—canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85–89 per cent and concurrences of 73–82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.

  16. A programmable two-qubit quantum processor in silicon.

    PubMed

    Watson, T F; Philips, S G J; Kawakami, E; Ward, D R; Scarlino, P; Veldhorst, M; Savage, D E; Lagally, M G; Friesen, Mark; Coppersmith, S N; Eriksson, M A; Vandersypen, L M K

    2018-03-29

    Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch-Josza algorithm and the Grover search algorithm-canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85-89 per cent and concurrences of 73-82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.

  17. The Tera Multithreaded Architecture and Unstructured Meshes

    NASA Technical Reports Server (NTRS)

    Bokhari, Shahid H.; Mavriplis, Dimitri J.

    1998-01-01

    The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.

  18. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    NASA Astrophysics Data System (ADS)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.

  19. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    NASA Astrophysics Data System (ADS)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.

  20. Application of an onboard processor to the OAO C spacecraft

    NASA Technical Reports Server (NTRS)

    Stewart, W. N.; Hartenstein, R. G.; Trevathan, C.

    1972-01-01

    The design of a stored program computer for spacecraft use and its application on the fourth Orbiting Astronomical Observatory (OAO) is reported. The computer is a medium scale, parallel machine with a memory capacity of 16384 words of 18 bits each. It possesses a comprehensive instruction repertoire and operates on 45 W of power (including the dc-to-dc converter). The machine operates at a 500-kHz rate and executes an add instruction in 10 microseconds. Its primary functions on OAO C will be auxiliary command storage, spacecraft monitoring and malfunction reporting, data compression and status summary, and possible performance of emergency corrective action for certain anomalous situations.

  1. A computational system for lattice QCD with overlap Dirac quarks

    NASA Astrophysics Data System (ADS)

    Chiu, Ting-Wai; Hsieh, Tung-Han; Huang, Chao-Hsi; Huang, Tsung-Ren

    2003-05-01

    We outline the essential features of a Linux PC cluster which is now being developed at National Taiwan University, and discuss how to optimize its hardware and software for lattice QCD with overlap Dirac quarks. At present, the cluster constitutes of 30 nodes, with each node consisting of one Pentium 4 processor (1.6/2.0 GHz), one Gbyte of PC800 RDRAM, one 40/80 Gbyte hard disk, and a network card. The speed of this system is estimated to be 30 Gflops, and its price/performance ratio is better than $1.0/Mflops for 64-bit (double precision) computations in quenched lattice QCD with overlap Dirac quarks.

  2. Apparatus and method for implementing power saving techniques when processing floating point values

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, Young Moon; Park, Sang Phill

    An apparatus and method are described for reducing power when reading and writing graphics data. For example, one embodiment of an apparatus comprises: a graphics processor unit (GPU) to process graphics data including floating point data; a set of registers, at least one of the registers of the set partitioned to store the floating point data; and encode/decode logic to reduce a number of binary 1 values being read from the at least one register by causing a specified set of bit positions within the floating point data to be read out as 0s rather than 1s.

  3. Hardware/Software Issues for Video Guidance Systems: The Coreco Frame Grabber

    NASA Technical Reports Server (NTRS)

    Bales, John W.

    1996-01-01

    The F64 frame grabber is a high performance video image acquisition and processing board utilizing the TMS320C40 and TMS34020 processors. The hardware is designed for the ISA 16 bit bus and supports multiple digital or analog cameras. It has an acquisition rate of 40 million pixels per second, with a variable sampling frequency of 510 kHz to MO MHz. The board has a 4MB frame buffer memory expandable to 32 MB, and has a simultaneous acquisition and processing capability. It supports both VGA and RGB displays, and accepts all analog and digital video input standards.

  4. Optimizing TLB entries for mixed page size storage in contiguous memory

    DOEpatents

    Chen, Dong; Gara, Alan; Giampapa, Mark E.; Heidelberger, Philip; Kriegel, Jon K.; Ohmacht, Martin; Steinmacher-Burow, Burkhard

    2013-04-30

    A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.

  5. Memory Network For Distributed Data Processors

    NASA Technical Reports Server (NTRS)

    Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George

    1992-01-01

    Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.

  6. Experiments with a small behaviour controlled planetary rover

    NASA Technical Reports Server (NTRS)

    Miller, David P.; Desai, Rajiv S.; Gat, Erann; Ivlev, Robert; Loch, John

    1993-01-01

    A series of experiments that were performed on the Rocky 3 robot is described. Rocky 3 is a small autonomous rover capable of navigating through rough outdoor terrain to a predesignated area, searching that area for soft soil, acquiring a soil sample, and depositing the sample in a container at its home base. The robot is programmed according to a reactive behavior control paradigm using the ALFA programming language. This style of programming produces robust autonomous performance while requiring significantly less computational resources than more traditional mobile robot control systems. The code for Rocky 3 runs on an eight bit processor and uses about ten k of memory.

  7. Chip-integrated optical power limiter based on an all-passive micro-ring resonator

    NASA Astrophysics Data System (ADS)

    Yan, Siqi; Dong, Jianji; Zheng, Aoling; Zhang, Xinliang

    2014-10-01

    Recent progress in silicon nanophotonics has dramatically advanced the possible realization of large-scale on-chip optical interconnects integration. Adopting photons as information carriers can break the performance bottleneck of electronic integrated circuit such as serious thermal losses and poor process rates. However, in integrated photonics circuits, few reported work can impose an upper limit of optical power therefore prevent the optical device from harm caused by high power. In this study, we experimentally demonstrate a feasible integrated scheme based on a single all-passive micro-ring resonator to realize the optical power limitation which has a similar function of current limiting circuit in electronics. Besides, we analyze the performance of optical power limiter at various signal bit rates. The results show that the proposed device can limit the signal power effectively at a bit rate up to 20 Gbit/s without deteriorating the signal. Meanwhile, this ultra-compact silicon device can be completely compatible with the electronic technology (typically complementary metal-oxide semiconductor technology), which may pave the way of very large scale integrated photonic circuits for all-optical information processors and artificial intelligence systems.

  8. Dolphin Sounds-Inspired Covert Underwater Acoustic Communication and Micro-Modem

    PubMed Central

    Qiao, Gang; Liu, Songzuo; Bilal, Muhammad

    2017-01-01

    A novel portable underwater acoustic modem is proposed in this paper for covert communication between divers or underwater unmanned vehicles (UUVs) and divers at a short distance. For the first time, real dolphin calls are used in the modem to realize biologically inspired Covert Underwater Acoustic Communication (CUAC). A variety of dolphin whistles and clicks stored in an SD card inside the modem helps to realize different biomimetic CUAC algorithms based on the specified covert scenario. In this paper, the information is conveyed during the time interval between dolphin clicks. TMS320C6748 and TLV320AIC3106 are the core processors used in our unique modem for fast digital processing and interconnection with other terminals or sensors. Simulation results show that the bit error rate (BER) of the CUAC algorithm is less than 10−5 when the signal to noise ratio is over ‒5 dB. The modem was tested in an underwater pool, and a data rate of 27.1 bits per second at a distance of 10 m was achieved. PMID:29068363

  9. Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers

    NASA Astrophysics Data System (ADS)

    Edwards, Mark; Heward, Jeffrey; Clark, C. W.

    2009-05-01

    At Georgia Southern University we have constructed an 8+1--node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra--cold atoms in general with a particular emphasis on studying bose--bose and bose--fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time--dependent, one--dimensional Gross--Pitaevskii (TDGP) equations. These equations govern the behavior of dual-- species bosonic mixtures. We chose the split--operator/FFT to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 cpu known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64--bit PowerPC Processor Element known as the PPE and eight ``Synergistic Processor Element'' identified as the SPE's. We report on this algorithm and compare its performance to a non--parallel solver as applied to modeling evaporative cooling in dual--species bosonic mixtures.

  10. Image processing using Gallium Arsenide (GaAs) technology

    NASA Technical Reports Server (NTRS)

    Miller, Warner H.

    1989-01-01

    The need to increase the information return from space-borne imaging systems has increased in the past decade. The use of multi-spectral data has resulted in the need for finer spatial resolution and greater spectral coverage. Onboard signal processing will be necessary in order to utilize the available Tracking and Data Relay Satellite System (TDRSS) communication channel at high efficiency. A generally recognized approach to the increased efficiency of channel usage is through data compression techniques. The compression technique implemented is a differential pulse code modulation (DPCM) scheme with a non-uniform quantizer. The need to advance the state-of-the-art of onboard processing was recognized and a GaAs integrated circuit technology was chosen. An Adaptive Programmable Processor (APP) chip set was developed which is based on an 8-bit slice general processor. The reason for choosing the compression technique for the Multi-spectral Linear Array (MLA) instrument is described. Also a description is given of the GaAs integrated circuit chip set which will demonstrate that data compression can be performed onboard in real time at data rate in the order of 500 Mb/s.

  11. Design and implementation of the modified signed digit multiplication routine on a ternary optical computer.

    PubMed

    Xu, Qun; Wang, Xianchao; Xu, Chao

    2017-06-01

    Multiplication with traditional electronic computers is faced with a low calculating accuracy and a long computation time delay. To overcome these problems, the modified signed digit (MSD) multiplication routine is established based on the MSD system and the carry-free adder. Also, its parallel algorithm and optimization techniques are studied in detail. With the help of a ternary optical computer's characteristics, the structured data processor is designed especially for the multiplication routine. Several ternary optical operators are constructed to perform M transformations and summations in parallel, which has accelerated the iterative process of multiplication. In particular, the routine allocates data bits of the ternary optical processor based on digits of multiplication input, so the accuracy of the calculation results can always satisfy the users. Finally, the routine is verified by simulation experiments, and the results are in full compliance with the expectations. Compared with an electronic computer, the MSD multiplication routine is not only good at dealing with large-value data and high-precision arithmetic, but also maintains lower power consumption and fewer calculating delays.

  12. Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    NASA Technical Reports Server (NTRS)

    Fatoohi, Raad A.; Grosch, Chester E.

    1988-01-01

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  13. Multi-processor including data flow accelerator module

    DOEpatents

    Davidson, George S.; Pierce, Paul E.

    1990-01-01

    An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.

  14. The Cortex Transform as an image preprocessor for sparse distributed memory: An initial study

    NASA Technical Reports Server (NTRS)

    Olshausen, Bruno; Watson, Andrew

    1990-01-01

    An experiment is described which was designed to evaluate the use of the Cortex Transform as an image processor for Sparse Distributed Memory (SDM). In the experiment, a set of images were injected with Gaussian noise, preprocessed with the Cortex Transform, and then encoded into bit patterns. The various spatial frequency bands of the Cortex Transform were encoded separately so that they could be evaluated based on their ability to properly cluster patterns belonging to the same class. The results of this study indicate that by simply encoding the low pass band of the Cortex Transform, a very suitable input representation for the SDM can be achieved.

  15. High-Speed Micro Signal Processor.

    DTIC Science & Technology

    1982-06-01

    4) 29 S1424-35 ADOR/QUANTITY FIELD - 2048 TO +409510, NOTE: POSITIVE VALUES 2047 4AP TO NEGKTIVE VALUES SK436 HALT 0 E ENABLE HALT 1 D DISABLE SM37 38...0 to 31 numeric. QUANT - 12 bit Quantity field to be used as the ALU ’Am operand. Value - (- 2048 to +2047)10 numeric. If (ASEL - Q16E) then, one...15AI12 1 9 rS EAO 1 41 1A!)3 I 10 TSEBO I 42 DTII1 I 11 TSEBI I 43 DTTIO T 12 VDI 1 44 )1109 1 13 DTAOO 0 45 DTI08 I 14 DTBOO 0 46 DTI07 I 15 DTBO1 0

  16. Optical reversible programmable Boolean logic unit.

    PubMed

    Chattopadhyay, Tanay

    2012-07-20

    Computing with reversibility is the only way to avoid dissipation of energy associated with bit erase. So, a reversible microprocessor is required for future computing. In this paper, a design of a simple all-optical reversible programmable processor is proposed using a polarizing beam splitter, liquid crystal-phase spatial light modulators, a half-wave plate, and plane mirrors. This circuit can perform 16 logical operations according to three programming inputs. Also, inputs can be easily recovered from the outputs. It is named the "reversible programmable Boolean logic unit (RPBLU)." The logic unit is the basic building block of many complex computational operations. Hence the design is important in sense. Two orthogonally polarized lights are defined here as two logical states, respectively.

  17. The development of a specialized processor for a space-based multispectral earth imager

    NASA Astrophysics Data System (ADS)

    Khedr, Mostafa E.

    2008-10-01

    This work was done in the Department of Computer Engineering, Lvov Polytechnic National University, Lvov, Ukraine, as a thesis entitled "Space Imager Computer System for Raw Video Data Processing" [1]. This work describes the synthesis and practical implementation of a specialized computer system for raw data control and processing onboard a satellite MultiSpectral earth imager. This computer system is intended for satellites with resolution in the range of one meter with 12-bit precession. The design is based mostly on general off-the-shelf components such as (FPGAs) plus custom designed software for interfacing with PC and test equipment. The designed system was successfully manufactured and now fully functioning in orbit.

  18. Learning the Art of Electronics

    NASA Astrophysics Data System (ADS)

    Hayes, Thomas C.; Horowitz, Paul

    2016-03-01

    1. DC circuits; 2. RC circuits; 3. Diode circuits; 4. Transistors I; 5. Transistors II; 6. Operational amplifiers I; 7. Operational amplifiers II: nice positive feedback; 8. Operational amplifiers III; 9. Operational amplifiers IV: nasty positive feedback; 10. Operational amplifiers V: PID motor control loop; 11. Voltage regulators; 12. MOSFET switches; 13. Group audio project; 14. Logic gates; 15. Logic compilers, sequential circuits, flip-flops; 16. Counters; 17. Memory: state machines; 18. Analog to digital: phase-locked loop; 19. Microcontrollers and microprocessors I: processor/controller; 20. I/O, first assembly language; 21. Bit operations; 22. Interrupt: ADC and DAC; 23. Moving pointers, serial buses; 24. Dallas Standalone Micro, SiLabs SPI RAM; 25. Toys in the attic; Appendices; Index.

  19. Design of a modular digital computer system, CDRL no. D001, final design plan

    NASA Technical Reports Server (NTRS)

    Easton, R. A.

    1975-01-01

    The engineering breadboard implementation for the CDRL no. D001 modular digital computer system developed during design of the logic system was documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed 32 bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.

  20. A digitally implemented preambleless demodulator for maritime and mobile data communications

    NASA Astrophysics Data System (ADS)

    Chalmers, Harvey; Shenoy, Ajit; Verahrami, Farhad B.

    The hardware design and software algorithms for a low-bit-rate, low-cost, all-digital preambleless demodulator are described. The demodulator operates under severe high-noise conditions, fast Doppler frequency shifts, large frequency offsets, and multipath fading. Sophisticated algorithms, including a fast Fourier transform (FFT)-based burst acquisition algorithm, a cycle-slip resistant carrier phase tracker, an innovative Doppler tracker, and a fast acquisition symbol synchronizer, were developed and extensively simulated for reliable burst reception. The compact digital signal processor (DSP)-based demodulator hardware uses a unique personal computer test interface for downloading test data files. The demodulator test results demonstrate a near-ideal performance within 0.2 dB of theory.

  1. An Early Quantum Computing Proposal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Stephen Russell; Alexander, Francis Joseph; Barros, Kipton Marcos

    The D-Wave 2X is the third generation of quantum processing created by D-Wave. NASA (with Google and USRA) and Lockheed Martin (with USC), both own D-Wave systems. Los Alamos National Laboratory (LANL) purchased a D-Wave 2X in November 2015. The D-Wave 2X processor contains (nominally) 1152 quantum bits (or qubits) and is designed to specifically perform quantum annealing, which is a well-known method for finding a global minimum of an optimization problem. This methodology is based on direct execution of a quantum evolution in experimental quantum hardware. While this can be a powerful method for solving particular kinds of problems,more » it also means that the D-Wave 2X processor is not a general computing processor and cannot be programmed to perform a wide variety of tasks. It is a highly specialized processor, well beyond what NNSA currently thinks of as an “advanced architecture.”A D-Wave is best described as a quantum optimizer. That is, it uses quantum superposition to find the lowest energy state of a system by repeated doses of power and settling stages. The D-Wave produces multiple solutions to any suitably formulated problem, one of which is the lowest energy state solution (global minimum). Mapping problems onto the D-Wave requires defining an objective function to be minimized and then encoding that function in the Hamiltonian of the D-Wave system. The quantum annealing method is then used to find the lowest energy configuration of the Hamiltonian using the current D-Wave Two, two-level, quantum processor. This is not always an easy thing to do, and the D-Wave Two has significant limitations that restrict problem sizes that can be run and algorithmic choices that can be made. Furthermore, as more people are exploring this technology, it has become clear that it is very difficult to come up with general approaches to optimization that can both utilize the D-Wave and that can do better than highly developed algorithms on conventional computers for specific applications. These are all fundamental challenges that must be overcome for the D-Wave, or similar, quantum computing technology to be broadly applicable.« less

  2. AM06: the Associative Memory chip for the Fast TracKer in the upgraded ATLAS detector

    NASA Astrophysics Data System (ADS)

    Annovi, A.; Beretta, M. M.; Calderini, G.; Crescioli, F.; Frontini, L.; Liberali, V.; Shojaii, S. R.; Stabile, A.

    2017-04-01

    This paper describes the AM06 chip, which is a highly parallel processor for pattern recognition in the ATLAS high energy physics experiment. The AM06 contains memory banks that store data organized in 18 bit words; a group of 8 words is called "pattern". Each AM06 chip can store up to 131 072 patterns. The AM06 is a large chip, designed in 65 nm CMOS, and it combines full-custom memory arrays, standard logic cells and serializer/deserializer IP blocks at 2 Gbit/s for input/output communication. The overall silicon area is 168 mm2 and the chip contains about 421 million transistors. The AM06 receives the detector data for each event accepted by Level-1 trigger, up to 100 kHz, and it performs a track reconstruction based on hit information from channels of the ATLAS silicon detectors. Thanks to the design of a new associative memory cell and to the layout optimization, the AM06 consumption is only about 1 fJ/bit per comparison. The AM06 has been fabricated and successfully tested with a dedicated test system.

  3. A thin-film microprocessor with inkjet print-programmable memory

    NASA Astrophysics Data System (ADS)

    Myny, Kris; Smout, Steve; Rockelé, Maarten; Bhoolokam, Ajay; Ke, Tung Huei; Steudel, Soeren; Cobb, Brian; Gulati, Aashini; Rodriguez, Francisco Gonzalez; Obata, Koji; Marinkovic, Marko; Pham, Duy-Vu; Hoppe, Arne; Gelinck, Gerwin H.; Genoe, Jan; Dehaene, Wim; Heremans, Paul

    2014-12-01

    The Internet of Things is driving extensive efforts to develop intelligent everyday objects. This requires seamless integration of relatively simple electronics, for example through `stick-on' electronics labels. We believe the future evolution of this technology will be governed by Wright's Law, which was first proposed in 1936 and states that the cost of a product decreases with cumulative production. This implies that a generic electronic device that can be tailored for application-specific requirements during downstream integration would be a cornerstone in the development of the Internet of Things. We present an 8-bit thin-film microprocessor with a write-once, read-many (WORM) instruction generator that can be programmed after manufacture via inkjet printing. The processor combines organic p-type and soluble oxide n-type thin-film transistors in a new flavor of the familiar complementary transistor technology with the potential to be manufactured on a very thin polyimide film, enabling low-cost flexible electronics. It operates at 6.5 V and reaches clock frequencies up to 2.1 kHz. An instruction set of 16 code lines, each line providing a 9 bit instruction, is defined by means of inkjet printing of conductive silver inks.

  4. Augmented burst-error correction for UNICON laser memory. [digital memory

    NASA Technical Reports Server (NTRS)

    Lim, R. S.

    1974-01-01

    A single-burst-error correction system is described for data stored in the UNICON laser memory. In the proposed system, a long fire code with code length n greater than 16,768 bits was used as an outer code to augment an existing inner shorter fire code for burst error corrections. The inner fire code is a (80,64) code shortened from the (630,614) code, and it is used to correct a single-burst-error on a per-word basis with burst length b less than or equal to 6. The outer code, with b less than or equal to 12, would be used to correct a single-burst-error on a per-page basis, where a page consists of 512 32-bit words. In the proposed system, the encoding and error detection processes are implemented by hardware. A minicomputer, currently used as a UNICON memory management processor, is used on a time-demanding basis for error correction. Based upon existing error statistics, this combination of an inner code and an outer code would enable the UNICON system to obtain a very low error rate in spite of flaws affecting the recorded data.

  5. Energy challenges in optical access and aggregation networks.

    PubMed

    Kilper, Daniel C; Rastegarfar, Houman

    2016-03-06

    Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as the density and speed increases, the total system energy, thermal density and energy per bit are moving into regimes that become impractical to support-for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).

  6. A configurable and low-power mixed signal SoC for portable ECG monitoring applications.

    PubMed

    Kim, Hyejung; Kim, Sunyoung; Van Helleputte, Nick; Artes, Antonio; Konijnenburg, Mario; Huisken, Jos; Van Hoof, Chris; Yazicioglu, Refet Firat

    2014-04-01

    This paper describes a mixed-signal ECG System-on-Chip (SoC) that is capable of implementing configurable functionality with low-power consumption for portable ECG monitoring applications. A low-voltage and high performance analog front-end extracts 3-channel ECG signals and single channel electrode-tissue-impedance (ETI) measurement with high signal quality. This can be used to evaluate the quality of the ECG measurement and to filter motion artifacts. A custom digital signal processor consisting of 4-way SIMD processor provides the configurability and advanced functionality like motion artifact removal and R peak detection. A built-in 12-bit analog-to-digital converter (ADC) is capable of adaptive sampling achieving a compression ratio of up to 7, and loop buffer integration reduces the power consumption for on-chip memory access. The SoC is implemented in 0.18 μm CMOS process and consumes 32 μ W from a 1.2 V while heart beat detection application is running, and integrated in a wireless ECG monitoring system with Bluetooth protocol. Thanks to the ECG SoC, the overall system power consumption can be reduced significantly.

  7. Earth sensing: from ice to the Internet of Things

    NASA Astrophysics Data System (ADS)

    Martinez, K.

    2017-12-01

    The evolution of technology has led to improvements in our ability to use sensors for earth science research. Radio communications have improved in terms of range and power use. Miniaturisation means we now use 32 bit processors with embedded memory, storage and interfaces. Sensor technology makes it simpler to integrate devices such as accelerometers, compasses, gas and biosensors. Programming languages have developed so that it has become easier to create software for these systems. This combined with the power of the processors has made research into advanced algorithms and communications feasible. The term environmental sensor networks describes these advanced systems which are designed specifically to take sensor measurements in the natural environment. Through a decade of research into sensor networks, deployed mainly in glaciers, many areas of this still emerging technology have been explored. From deploying the first subglacial sensor probes with custom electronics and protocols we learnt tuning to harsh environments and energy management. More recently installing sensor systems in the mountains of Scotland has shown that standards have allowed complete internet and web integration. This talk will discuss the technologies used in a range of recent deployments in Scotland and Iceland focussed on creating new data streams for cryospheric and climate change research.

  8. Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2

    PubMed Central

    Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.

    2014-01-01

    SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745

  9. Digital Intermediate Frequency Receiver Module For Use In Airborne Sar Applications

    DOEpatents

    Tise, Bertice L.; Dubbert, Dale F.

    2005-03-08

    A digital IF receiver (DRX) module directly compatible with advanced radar systems such as synthetic aperture radar (SAR) systems. The DRX can combine a 1 G-Sample/sec 8-bit ADC with high-speed digital signal processor, such as high gate-count FPGA technology or ASICs to realize a wideband IF receiver. DSP operations implemented in the DRX can include quadrature demodulation and multi-rate, variable-bandwidth IF filtering. Pulse-to-pulse (Doppler domain) filtering can also be implemented in the form of a presummer (accumulator) and an azimuth prefilter. An out of band noise source can be employed to provide a dither signal to the ADC, and later be removed by digital signal processing. Both the range and Doppler domain filtering operations can be implemented using a unique pane architecture which allows on-the-fly selection of the filter decimation factor, and hence, the filter bandwidth. The DRX module can include a standard VME-64 interface for control, status, and programming. An interface can provide phase history data to the real-time image formation processors. A third front-panel data port (FPDP) interface can send wide bandwidth, raw phase histories to a real-time phase history recorder for ground processing.

  10. Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture

    PubMed Central

    Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo

    2010-01-01

    The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread well-known and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32 bits NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focus to pre-process the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283

  11. Parallel point-multiplication architecture using combined group operations for high-speed cryptographic applications

    PubMed Central

    Saeedi, Ehsan; Kong, Yinan

    2017-01-01

    In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance (1Area×Time=1AT) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature. PMID:28459831

  12. Parallel point-multiplication architecture using combined group operations for high-speed cryptographic applications.

    PubMed

    Hossain, Md Selim; Saeedi, Ehsan; Kong, Yinan

    2017-01-01

    In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance ([Formula: see text]) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature.

  13. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

    PubMed

    Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

    2010-01-11

    Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.

  14. Parallel design patterns for a low-power, software-defined compressed video encoder

    NASA Astrophysics Data System (ADS)

    Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

    2011-06-01

    Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.

  15. A Versatile Multichannel Digital Signal Processing Module for Microcalorimeter Arrays

    NASA Astrophysics Data System (ADS)

    Tan, H.; Collins, J. W.; Walby, M.; Hennig, W.; Warburton, W. K.; Grudberg, P.

    2012-06-01

    Different techniques have been developed for reading out microcalorimeter sensor arrays: individual outputs for small arrays, and time-division or frequency-division or code-division multiplexing for large arrays. Typically, raw waveform data are first read out from the arrays using one of these techniques and then stored on computer hard drives for offline optimum filtering, leading not only to requirements for large storage space but also limitations on achievable count rate. Thus, a read-out module that is capable of processing microcalorimeter signals in real time will be highly desirable. We have developed multichannel digital signal processing electronics that are capable of on-board, real time processing of microcalorimeter sensor signals from multiplexed or individual pixel arrays. It is a 3U PXI module consisting of a standardized core processor board and a set of daughter boards. Each daughter board is designed to interface a specific type of microcalorimeter array to the core processor. The combination of the standardized core plus this set of easily designed and modified daughter boards results in a versatile data acquisition module that not only can easily expand to future detector systems, but is also low cost. In this paper, we first present the core processor/daughter board architecture, and then report the performance of an 8-channel daughter board, which digitizes individual pixel outputs at 1 MSPS with 16-bit precision. We will also introduce a time-division multiplexing type daughter board, which takes in time-division multiplexing signals through fiber-optic cables and then processes the digital signals to generate energy spectra in real time.

  16. The advanced receiver 2: Telemetry test results in CTA 21

    NASA Technical Reports Server (NTRS)

    Hinedi, S.; Bevan, R.; Marina, M.

    1991-01-01

    Telemetry tests with the Advanced Receiver II (ARX II) in Compatibility Test Area 21 are described. The ARX II was operated in parallel with a Block-III Receiver/baseband processor assembly combination (BLK-III/BPA) and a Block III Receiver/subcarrier demodulation assembly/symbol synchronization assembly combination (BLK-III/SDA/SSA). The telemetry simulator assembly provided the test signal for all three configurations, and the symbol signal to noise ratio as well as the symbol error rates were measured and compared. Furthermore, bit error rates were also measured by the system performance test computer for all three systems. Results indicate that the ARX-II telemetry performance is comparable and sometimes superior to the BLK-III/BPA and BLK-III/SDA/SSA combinations.

  17. Two dimensional recursive digital filters for near real time image processing

    NASA Technical Reports Server (NTRS)

    Olson, D.; Sherrod, E.

    1980-01-01

    A program was designed toward the demonstration of the feasibility of using two dimensional recursive digital filters for subjective image processing applications that require rapid turn around. The concept of the use of a dedicated minicomputer for the processor for this application was demonstrated. The minicomputer used was the HP1000 series E with a RTE 2 disc operating system and 32K words of memory. A Grinnel 256 x 512 x 8 bit display system was used to display the images. Sample images were provided by NASA Goddard on a 800 BPI, 9 track tape. Four 512 x 512 images representing 4 spectral regions of the same scene were provided. These images were filtered with enhancement filters developed during this effort.

  18. Auto covariance computer

    NASA Technical Reports Server (NTRS)

    Hepner, T. E.; Meyers, J. F. (Inventor)

    1985-01-01

    A laser velocimeter covariance processor which calculates the auto covariance and cross covariance functions for a turbulent flow field based on Poisson sampled measurements in time from a laser velocimeter is described. The device will process a block of data that is up to 4096 data points in length and return a 512 point covariance function with 48-bit resolution along with a 512 point histogram of the interarrival times which is used to normalize the covariance function. The device is designed to interface and be controlled by a minicomputer from which the data is received and the results returned. A typical 4096 point computation takes approximately 1.5 seconds to receive the data, compute the covariance function, and return the results to the computer.

  19. Investigation of Modularly Configured Attached Processors with Intelligent Memories

    DTIC Science & Technology

    1994-09-30

    sequence of computations becomes cl = c11 + C[i, i] := Cli, j] + A[i, k) * Blk, j]; al x bti, c12 - c12 + k X bkX2, C13 = C1 3 + alk X b 3 , ... , c =n...Cin + alk X bkn, C2 1 = Cat + a2k x bkl , The sequence of computations becomes cii = a I x bli, C22 = cU2 + a2k x b2, ... Cj a2 1x bij, ca3j = aa x...1 to 1/2 Least significant 1/2 address bits Fig. 1. Block diagram of a memory module PREGVJKE is v I pis P16 16 ItO16 = 4-81T OECD - = AMKSS A s-Byrf

  20. Speech coding at 4800 bps for mobile satellite communications

    NASA Technical Reports Server (NTRS)

    Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei

    1988-01-01

    A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment aimed at providing telephone quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT and T 32-bit floating-point DSP32 digital signal processors (DSP). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wirewrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and consumes about 9 watts of power.

  1. Overview of the DART project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, K.R.; Hansen, F.R.; Napolitano, L.M.

    1992-01-01

    DART (DSP Arrary for Reconfigurable Tasks) is a parallel architecture of two high-performance SDP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processes in DART is programmable in a high-level languate ( C'' or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processor. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability bymore » using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.« less

  2. Overview of the DART project

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, K.R.; Hansen, F.R.; Napolitano, L.M.

    1992-01-01

    DART (DSP Arrary for Reconfigurable Tasks) is a parallel architecture of two high-performance SDP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processes in DART is programmable in a high-level languate (``C`` or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processor. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by usingmore » DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.« less

  3. A Study of Communication Processor Systems

    DTIC Science & Technology

    1979-12-01

    STUDYSCOFONLA (A . TIS R I8TIO S" - EEG R AN T-i R $a t 17. DITR UTO STTEeN in h b’*l.elldi 1Ok 0 ~45ft i9~a R,1OI)7 -C S. APeMr’wl NOTES Uh a DC gt~ & C...MM Fig. 3.3 Pluribus syste m II 24 -4~ I tm ap B u s- ----I FP K K Cmi Cornrnjer MeodUlO Fig. 3.4 Carnegie "iellLfn CmI* 25 0I M r K K P M K KP M K...144 bits incurs a mean transmission delay (o ) of 0.29 milli- seconds. The maximum transmission delays TDF and TDC can be obtained as 15 milliseconds

  4. An integrated system for multichannel neuronal recording with spike/LFP separation, integrated A/D conversion and threshold detection.

    PubMed

    Perelman, Yevgeny; Ginosar, Ran

    2007-01-01

    A mixed-signal front-end processor for multichannel neuronal recording is described. It receives 12 differential-input channels of implanted recording electrodes. A programmable cutoff High Pass Filter (HPF) blocks dc and low-frequency input drift at about 1 Hz. The signals are band-split at about 200 Hz to low-frequency Local Field Potential (LFP) and high-frequency spike data (SPK), which is band limited by a programmable-cutoff LPF, in a range of 8-13 kHz. Amplifier offsets are compensated by 5-bit calibration digital-to-analog converters (DACs). The SPK and LFP channels provide variable amplification rates of up to 5000 and 500, respectively. The analog signals are converted into 10-bit digital form, and streamed out over a serial digital bus at up to 8 Mbps. A threshold filter suppresses inactive portions of the signal and emits only spike segments of programmable length. A prototype has been fabricated on a 0.35-microm CMOS process and tested successfully, demonstrating a 3-microV noise level. Special interface system incorporating an embedded CPU core in a programmable logic device accompanied by real-time software has been developed to allow connectivity to a computer host.

  5. A reconfigurable cryogenic platform for the classical control of quantum processors

    NASA Astrophysics Data System (ADS)

    Homulle, Harald; Visser, Stefan; Patra, Bishnu; Ferrari, Giorgio; Prati, Enrico; Sebastiano, Fabio; Charbon, Edoardo

    2017-04-01

    The implementation of a classical control infrastructure for large-scale quantum computers is challenging due to the need for integration and processing time, which is constrained by coherence time. We propose a cryogenic reconfigurable platform as the heart of the control infrastructure implementing the digital error-correction control loop. The platform is implemented on a field-programmable gate array (FPGA) that supports the functionality required by several qubit technologies and that can operate close to the physical qubits over a temperature range from 4 K to 300 K. This work focuses on the extensive characterization of the electronic platform over this temperature range. All major FPGA building blocks (such as look-up tables (LUTs), carry chains (CARRY4), mixed-mode clock manager (MMCM), phase-locked loop (PLL), block random access memory, and IDELAY2 (programmable delay element)) operate correctly and the logic speed is very stable. The logic speed of LUTs and CARRY4 changes less then 5%, whereas the jitter of MMCM and PLL clock managers is reduced by 20%. The stability is finally demonstrated by operating an integrated 1.2 GSa/s analog-to-digital converter (ADC) with a relatively stable performance over temperature. The ADCs effective number of bits drops from 6 to 4.5 bits when operating at 15 K.

  6. A reconfigurable cryogenic platform for the classical control of quantum processors.

    PubMed

    Homulle, Harald; Visser, Stefan; Patra, Bishnu; Ferrari, Giorgio; Prati, Enrico; Sebastiano, Fabio; Charbon, Edoardo

    2017-04-01

    The implementation of a classical control infrastructure for large-scale quantum computers is challenging due to the need for integration and processing time, which is constrained by coherence time. We propose a cryogenic reconfigurable platform as the heart of the control infrastructure implementing the digital error-correction control loop. The platform is implemented on a field-programmable gate array (FPGA) that supports the functionality required by several qubit technologies and that can operate close to the physical qubits over a temperature range from 4 K to 300 K. This work focuses on the extensive characterization of the electronic platform over this temperature range. All major FPGA building blocks (such as look-up tables (LUTs), carry chains (CARRY4), mixed-mode clock manager (MMCM), phase-locked loop (PLL), block random access memory, and IDELAY2 (programmable delay element)) operate correctly and the logic speed is very stable. The logic speed of LUTs and CARRY4 changes less then 5%, whereas the jitter of MMCM and PLL clock managers is reduced by 20%. The stability is finally demonstrated by operating an integrated 1.2 GSa/s analog-to-digital converter (ADC) with a relatively stable performance over temperature. The ADCs effective number of bits drops from 6 to 4.5 bits when operating at 15 K.

  7. The construction of the NEAs potential motion domains in vicinity of the 1/2 resonance with Earth in parallel programming environment. (Russian Title: Построение в среде параллельных вычислений областей возможных движений АСЗ, находящихся в окрестности резонанса 1/2 с Землей)

    NASA Astrophysics Data System (ADS)

    Bykova, L. E.; Razdymakhina, O. N.

    2011-07-01

    In this paper the results of investigations of near-Earth asteroids (NEA) dynamics in vicinity of the 1/2 resonance with Earth are presented. For each of these asteroids the evolution of probability motion domains is investigated during several thousand years. The investigations are conducted on cluster SKIF Cyberia with digit grid 128 bit. The performance of motion prediction of many real and virtual asteroids on multiple-processor computing system with long digit grid has been shown. The estimate of performance is carried out in comparison with solution on personal computer with digit grid 80 bit

  8. A high-rate PCI-based telemetry processor system

    NASA Astrophysics Data System (ADS)

    Turri, R.

    2002-07-01

    The high performances reached by the Satellite on-board telemetry generation and transmission, as consequently, will impose the design of ground facilities with higher processing capabilities at low cost to allow a good diffusion of these ground station. The equipment normally used are based on complex, proprietary bus and computing architectures that prevent the systems from exploiting the continuous and rapid increasing in computing power available on market. The PCI bus systems now allow processing of high-rate data streams in a standard PC-system. At the same time the Windows NT operating system supports multitasking and symmetric multiprocessing, giving the capability to process high data rate signals. In addition, high-speed networking, 64 bit PCI-bus technologies and the increase in processor power and software, allow creating a system based on COTS products (which in future may be easily and inexpensively upgraded). In the frame of EUCLID RTP 9.8 project, a specific work element was dedicated to develop the architecture of a system able to acquire telemetry data of up to 600 Mbps. Laben S.p.A - a Finmeccanica Company -, entrusted of this work, has designed a PCI-based telemetry system making possible the communication between a satellite down-link and a wide area network at the required rate.

  9. MoNET: media over net gateway processor for next-generation network

    NASA Astrophysics Data System (ADS)

    Elabd, Hammam; Sundar, Rangarajan; Dedes, John

    2001-12-01

    MoNETTM (Media over Net) SX000 product family is designed using a scalable voice, video and packet-processing platform to address applications with channel densities from few voice channels to four OC3 per card. This platform is developed for bridging public circuit-switched network to the next generation packet telephony and data network. The platform consists of a DSP farm, RISC processors and interface modules. DSP farm is required to execute voice compression, image compression and line echo cancellation algorithms for large number of voice, video, fax, and modem or data channels. RISC CPUs are used for performing various packetizations based on RTP, UDP/IP and ATM encapsulations. In addition, RISC CPUs also participate in the DSP farm load management and communication with the host and other MoP devices. The MoNETTM S1000 communications device is designed for voice processing and for bridging TDM to ATM and IP packet networks. The S1000 consists of the DSP farm based on Carmel DSP core and 32-bit RISC CPU, along with Ethernet, Utopia, PCI, and TDM interfaces. In this paper, we will describe the VoIP infrastructure, building blocks of the S500, S1000 and S3000 devices, algorithms executed on these device and associated channel densities, detailed DSP architecture, memory architecture, data flow and scheduling.

  10. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    NASA Astrophysics Data System (ADS)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.

  11. Optical signal processing for a smart vehicle lighting system using a-SiCH technology

    NASA Astrophysics Data System (ADS)

    Vieira, M. A.; Vieira, M.; Vieira, P.; Louro, P.

    2017-05-01

    We propose the use of Visible Light Communication (VLC) for vehicle safety applications, creating a smart vehicle lighting system that combines the functions of illumination and signaling, communications, and positioning. The feasibility of VLC is demonstrated by employing trichromatic Red-Green-Blue (RGB) LEDs as transmitters, since they offer the possibility of Wavelength Division Multiplexing (WDM), which can greatly increase the transmission data rate, when using SiC double p-i-n receivers to encode/decode the information. Trichromatic RGB Light Emitting Diodes (LED)s (RGB-LED) are used together for illumination proposes (headlamps) and individually, each chip, to transmit the driving range distance and data information. An on-off code is used to transmit the data. Free space is the transmission medium. The receivers consist of two stacked amorphous a-H:SiC cells. They combine the simultaneous demultiplexing operation with the photodetection and self-amplification. The proposed coding is based on SiC technology. Multiple Input Multi Output (MIMO) architecture is used. For data transmission, we propose the use of two headlights based on commercially available modulated white RGB-LEDs. For data receiving and decoding we use three a-SiC:H double pin/pin optical processors symmetrically distributed at the vehicle tail Moreover, we present a way to achieve vehicular communication using the parity bits. A representation with a 4 bit original string color message and the transmitted 7 bit string, the encoding and decoding accurate positional information processes and the design of SiC navigation system are discussed and tested. A visible multilateration method estimates the drive distance range by using the decoded information received from several non-collinear transmitters.

  12. Research on NC motion controller based on SOPC technology

    NASA Astrophysics Data System (ADS)

    Jiang, Tingbiao; Meng, Biao

    2006-11-01

    With the rapid development of the digitization and informationization, the application of numerical control technology in the manufacturing industry becomes more and more important. However, the conventional numerical control system usually has some shortcomings such as the poor in system openness, character of real-time, cutability and reconfiguration. In order to solve these problems, this paper investigates the development prospect and advantage of the application in numerical control area with system-on-a-Programmable-Chip (SOPC) technology, and puts forward to a research program approach to the NC controller based on SOPC technology. Utilizing the characteristic of SOPC technology, we integrate high density logic device FPGA, memory SRAM, and embedded processor ARM into a single programmable logic device. We also combine the 32-bit RISC processor with high computing capability of the complicated algorithm with the FPGA device with strong motivable reconfiguration logic control ability. With these steps, we can greatly resolve the defect described in above existing numerical control systems. For the concrete implementation method, we use FPGA chip embedded with ARM hard nuclear processor to construct the control core of the motion controller. We also design the peripheral circuit of the controller according to the requirements of actual control functions, transplant real-time operating system into ARM, design the driver of the peripheral assisted chip, develop the application program to control and configuration of FPGA, design IP core of logic algorithm for various NC motion control to configured it into FPGA. The whole control system uses the concept of modular and structured design to develop hardware and software system. Thus the NC motion controller with the advantage of easily tailoring, highly opening, reconfigurable, and expandable can be implemented.

  13. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

    PubMed Central

    2010-01-01

    Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262

  14. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled Hg transport module to investigate the mercury pollution. In this study, we present our work of transplanting the GNAQPMS model on Intel Xeon Phi processor, Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting Many Integrated Core Architecture (MIC) architecture. Compared with the first generation Knight Corner (KNC), KNL has more new hardware features, that it can be used as unique processor as well as coprocessor with other CPU. According to the Vtune tool, the high overhead modules in GNAQPMS model have been addressed, including CBMZ gas chemistry, advection and convection module, and wet deposition module. These high overhead modules were accelerated by optimizing code and using new techniques of KNL. The following optimized measures was done: 1) Changing the pure MPI parallel mode to hybrid parallel mode with MPI and OpenMP; 2.Vectorizing the code to using the 512-bit wide vector computation unit. 3. Reducing unnecessary memory access and calculation. 4. Reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ. 5. Changing the way of global communication from files writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased both on CPU and KNL platform, the single-node test showed that optimized version has 2.6x speedup on two sockets CPU platform and 3.3x speedup on one socket KNL platform compared with the baseline version code, which means the KNL has 1.29x speedup when compared with 2 sockets CPU platform.

  15. Design and qualification of the SEU/TD Radiation Monitor chip

    NASA Technical Reports Server (NTRS)

    Buehler, Martin G.; Blaes, Brent R.; Soli, George A.; Zamani, Nasser; Hicks, Kenneth A.

    1992-01-01

    This report describes the design, fabrication, and testing of the Single-Event Upset/Total Dose (SEU/TD) Radiation Monitor chip. The Radiation Monitor is scheduled to fly on the Mid-Course Space Experiment Satellite (MSX). The Radiation Monitor chip consists of a custom-designed 4-bit SRAM for heavy ion detection and three MOSFET's for monitoring total dose. In addition the Radiation Monitor chip was tested along with three diagnostic chips: the processor monitor and the reliability and fault chips. These chips revealed the quality of the CMOS fabrication process. The SEU/TD Radiation Monitor chip had an initial functional yield of 94.6 percent. Forty-three (43) SEU SRAM's and 14 Total Dose MOSFET's passed the hermeticity and final electrical tests and were delivered to LL.

  16. Design and initial application of the extended aircraft interrogation and display system: Multiprocessing ground support equipment for digital flight systems

    NASA Technical Reports Server (NTRS)

    Glover, Richard D.

    1987-01-01

    A pipelined, multiprocessor, general-purpose ground support equipment for digital flight systems has been developed and placed in service at the NASA Ames Research Center's Dryden Flight Research Facility. The design is an outgrowth of the earlier aircraft interrogation and display system (AIDS) used in support of several research projects to provide engineering-units display of internal control system parameters during development and qualification testing activities. The new system, incorporating multiple 16-bit processors, is called extended AIDS (XAIDS) and is now supporting the X-29A forward-swept-wing aircraft project. This report describes the design and mechanization of XAIDS and shows the steps whereby a typical user may take advantage of its high throughput and flexible features.

  17. a Linux PC Cluster for Lattice QCD with Exact Chiral Symmetry

    NASA Astrophysics Data System (ADS)

    Chiu, Ting-Wai; Hsieh, Tung-Han; Huang, Chao-Hsi; Huang, Tsung-Ren

    A computational system for lattice QCD with overlap Dirac quarks is described. The platform is a home-made Linux PC cluster, built with off-the-shelf components. At present the system constitutes of 64 nodes, with each node consisting of one Pentium 4 processor (1.6/2.0/2.5 GHz), one Gbyte of PC800/1066 RDRAM, one 40/80/120 Gbyte hard disk, and a network card. The computationally intensive parts of our program are written in SSE2 codes. The speed of our system is estimated to be 70 Gflops, and its price/performance ratio is better than $1.0/Mflops for 64-bit (double precision) computations in quenched QCD. We discuss how to optimize its hardware and software for computing propagators of overlap Dirac quarks.

  18. Development of land based radar polarimeter processor system

    NASA Technical Reports Server (NTRS)

    Kronke, C. W.; Blanchard, A. J.

    1983-01-01

    The processing subsystem of a land based radar polarimeter was designed and constructed. This subsystem is labeled the remote data acquisition and distribution system (RDADS). The radar polarimeter, an experimental remote sensor, incorporates the RDADS to control all operations of the sensor. The RDADS uses industrial standard components including an 8-bit microprocessor based single board computer, analog input/output boards, a dynamic random access memory board, and power supplis. A high-speed digital electronics board was specially designed and constructed to control range-gating for the radar. A complete system of software programs was developed to operate the RDADS. The software uses a powerful real time, multi-tasking, executive package as an operating system. The hardware and software used in the RDADS are detailed. Future system improvements are recommended.

  19. Low-cost real-time infrared scene generation for image projection and signal injection

    NASA Astrophysics Data System (ADS)

    Buford, James A., Jr.; King, David E.; Bowden, Mark H.

    1998-07-01

    As cost becomes an increasingly important factor in the development and testing of Infrared sensors and flight computer/processors, the need for accurate hardware-in-the- loop (HWIL) simulations is critical. In the past, expensive and complex dedicated scene generation hardware was needed to attain the fidelity necessary for accurate testing. Recent technological advances and innovative applications of established technologies are beginning to allow development of cost-effective replacements for dedicated scene generators. These new scene generators are mainly constructed from commercial-off-the-shelf (COTS) hardware and software components. At the U.S. Army Aviation and Missile Command (AMCOM) Missile Research, Development, and Engineering Center (MRDEC), researchers have developed such a dynamic IR scene generator (IRSG) built around COTS hardware and software. The IRSG is used to provide dynamic inputs to an IR scene projector for in-band seeker testing and for direct signal injection into the seeker or processor electronics. AMCOM MRDEC has developed a second generation IRSG, namely IRSG2, using the latest Silicon Graphics Incorporated (SGI) Onyx2 with Infinite Reality graphics. As reported in previous papers, the SGI Onyx Reality Engine 2 is the platform of the original IRSG that is now referred to as IRSG1. IRSG1 has been in operation and used daily for the past three years on several IR projection and signal injection HWIL programs. Using this second generation IRSG, frame rates have increased from 120 Hz to 400 Hz and intensity resolution from 12 bits to 16 bits. The key features of the IRSGs are real time missile frame rates and frame sizes, dynamic missile-to-target(s) viewpoint updated each frame in real-time by a six-degree-of- freedom (6DOF) system under test (SUT) simulation, multiple dynamic objects (e.g. targets, terrain/background, countermeasures, and atmospheric effects), latency compensation, point-to-extended source anti-aliased targets, and sensor modeling effects. This paper provides a comparison between the IRSG1 and IRSG2 systems and focuses on the IRSG software, real time features, and database development tools.

  20. Implementation of a level 1 trigger system using high speed serial (VXS) techniques for the 12GeV high luminosity experimental programs at Thomas Jefferson National Accelerator Facility

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    C. Cuevas, B. Raydo, H. Dong, A. Gupta, F.J. Barbosa, J. Wilson, W.M. Taylor, E. Jastrzembski, D. Abbott

    We will demonstrate a hardware and firmware solution for a complete fully pipelined multi-crate trigger system that takes advantage of the elegant high speed VXS serial extensions for VME. This trigger system includes three sections starting with the front end crate trigger processor (CTP), a global Sub-System Processor (SSP) and a Trigger Supervisor that manages the timing, synchronization and front end event readout. Within a front end crate, trigger information is gathered from each 16 Channel, 12 bit Flash ADC module at 4 nS intervals via the VXS backplane, to a Crate Trigger Processor (CTP). Each Crate Trigger Processor receivesmore » these 500 MB/S VXS links from the 16 FADC-250 modules, aligns skewed data inherent of Aurora protocol, and performs real time crate level trigger algorithms. The algorithm results are encoded using a Reed-Solomon technique and transmission of this Level 1 trigger data is sent to the SSP using a multi-fiber link. The multi-fiber link achieves an aggregate trigger data transfer rate to the global trigger at 8 Gb/s. The SSP receives and decodes Reed-Solomon error correcting transmission from each crate, aligns the data, and performs the global level trigger algorithms. The entire trigger system is synchronous and operates at 250 MHz with the Trigger Supervisor managing not only the front end event readout, but also the distribution of the critical timing clocks, synchronization signals, and the global trigger signals to each front end readout crate. These signals are distributed to the front end crates on a separate fiber link and each crate is synchronized using a unique encoding scheme to guarantee that each front end crate is synchronous with a fixed latency, independent of the distance between each crate. The overall trigger signal latency is <3 uS, and the proposed 12GeV experiments at Jefferson Lab require up to 200KHz Level 1 trigger rate.« less

  1. Selective visual region of interest to enhance medical video conferencing

    NASA Astrophysics Data System (ADS)

    Bonneau, Walt, Jr.; Read, Christopher J.; Shirali, Girish

    1998-06-01

    The continued economic pressure that is being placed upon the healthcare industry creates both challenge and opportunity to develop cost effective healthcare tools. Tools that provide improvements in the quality of medical care at the same time improve the distribution of efficient care will create product demand. Video Conferencing systems are one of the latest product technologies that are evolving their way into healthcare applications. The systems that provide quality Bi- directional video and imaging at the lowest system and communication cost are creating many possible options for the healthcare industry. A method to use only 128k bits/sec. of ISDN bandwidth while providing quality video images in selected regions will be applied to echocardiograms using a low cost video conferencing system operating within a basic rate ISDN line bandwidth. Within a given display area (frame) it has been observed that only selected informational areas of the frame of are of value when viewing for detail and precision within an image. Much in the same manner that a photograph is cropped. If a method to accomplish Region Of Interest (ROI) was applied to video conferencing using H.320 with H.263 (compression) and H.281 (camera control) international standards, medical image quality could be achieved in a cost-effective manner. For example, the cardiologist could be provided with a selectable three to eight end-point viewable ROI polygon that defines the ROI in the image. This is achieved by the video system calculating the selected regional end-points and creating an alpha mask to signify the importance of the ROI to the compression processor. This region is then applied to the compression algorithm in a manner that the majority of the video conferencing processor cycles are focused on the ROI of the image. An occasional update of the non-ROI area is processed to maintain total image coherence. The user could control the non-ROI area updates. Providing encoder side ROI specification is of value. However, the power of this capability is improved if remote access and selection of the ROI is also provided. Using the H.281 camera standard and proposing an additional option to the standard to allow for remote ROI selection would make this possible. When ROI is applied the ability to reach the equivalent of 384K bits/sec ISDN rates may be achieved or exceeded depending upon the size of the selected ROI using 128K bits/sec. This opens additional opportunity to establish international calling and reduced call rates by up to sixty- six percent making reoccurring communication costs attractive. Rates of twenty to thirty quality ROI updates could be achieved. It is however important to understand that this technique is still under development.

  2. Motion and Emotional Behavior Design for Pet Robot Dog

    NASA Astrophysics Data System (ADS)

    Cheng, Chi-Tai; Yang, Yu-Ting; Miao, Shih-Heng; Wong, Ching-Chang

    A pet robot dog with two ears, one mouth, one facial expression plane, and one vision system is designed and implemented so that it can do some emotional behaviors. Three processors (Inter® Pentium® M 1.0 GHz, an 8-bit processer 8051, and embedded soft-core processer NIOS) are used to control the robot. One camera, one power detector, four touch sensors, and one temperature detector are used to obtain the information of the environment. The designed robot with 20 DOF (degrees of freedom) is able to accomplish the walking motion. A behavior system is built on the implemented pet robot so that it is able to choose a suitable behavior for different environmental situation. From the practical test, we can see that the implemented pet robot dog can do some emotional interaction with the human.

  3. Infrared readout electronics; Proceedings of the Meeting, Orlando, FL, Apr. 21, 22, 1992

    NASA Technical Reports Server (NTRS)

    Fossum, Eric R. (Editor)

    1992-01-01

    The present volume on IR readout electronics discusses cryogenic readout using silicon devices, cryogenic readout using III-V and LTS devices, multiplexers for higher temperatures, and focal-plane signal processing electronics. Attention is given to the optimization of cryogenic CMOS processes for sub-10-K applications, cryogenic measurements of aerojet GaAs n-JFETs, inP-based heterostructure device technology for ultracold readout applications, and a three-terminal semiconductor-superconductor transimpedance amplifier. Topics addressed include unfulfilled needs in IR astronomy focal-plane readout electronics, IR readout integrated circuit technology for tactical missile systems, and radiation-hardened 10-bit A/D for FPA signal processing. Also discussed are the implementation of a noise reduction circuit for spaceflight IR spectrometers, a real-time processor for staring receivers, and a fiber-optic link design for INMOS transputers.

  4. A system-level view of optimizing high-channel-count wireless biosignal telemetry.

    PubMed

    Chandler, Rodney J; Gibson, Sarah; Karkare, Vaibhav; Farshchi, Shahin; Marković, Dejan; Judy, Jack W

    2009-01-01

    In this paper we perform a system-level analysis of a wireless biosignal telemetry system. We perform an analysis of each major system component (e.g., analog front end, analog-to-digital converter, digital signal processor, and wireless link), in which we consider physical, algorithmic, and design limitations. Since there are a wide range applications for wireless biosignal telemetry systems, each with their own unique set of requirements for key parameters (e.g., channel count, power dissipation, noise level, number of bits, etc.), our analysis is equally broad. The net result is a set of plots, in which the power dissipation for each component and as the system as a whole, are plotted as a function of the number of channels for different architectural strategies. These results are also compared to existing implementations of complete wireless biosignal telemetry systems.

  5. SAMPA Chip: the New 32 Channels ASIC for the ALICE TPC and MCH Upgrades

    NASA Astrophysics Data System (ADS)

    Adolfsson, J.; Ayala Pabon, A.; Bregant, M.; Britton, C.; Brulin, G.; Carvalho, D.; Chambert, V.; Chinellato, D.; Espagnon, B.; Hernandez Herrera, H. D.; Ljubicic, T.; Mahmood, S. M.; Mjörnmark, U.; Moraes, D.; Munhoz, M. G.; Noël, G.; Oskarsson, A.; Osterman, L.; Pilyar, A.; Read, K.; Ruette, A.; Russo, P.; Sanches, B. C. S.; Severo, L.; Silvermyr, D.; Suire, C.; Tambave, G. J.; Tun-Lanoë, K. M. M.; van Noije, W.; Velure, A.; Vereschagin, S.; Wanlin, E.; Weber, T. O.; Zaporozhets, S.

    2017-04-01

    This paper presents the test results of the second prototype of SAMPA, the ASIC designed for the upgrade of read-out front end electronics of the ALICE Time Projection Chamber (TPC) and Muon Chamber (MCH). SAMPA is made in a 130 nm CMOS technology with 1.25 V nominal voltage supply and provides 32 channels, with selectable input polarity, and three possible combinations of shaping time and sensitivity. Each channel consists of a Charge Sensitive Amplifier, a semi-Gaussian shaper and a 10-bit ADC; a Digital Signal Processor provides digital filtering and compression capability. In the second prototype run both full chip and single test blocks were fabricated, allowing block characterization and full system behaviour studies. Experimental results are here presented showing agreement with requirements for both the blocks and the full chip.

  6. Frame Decoder for Consultative Committee for Space Data Systems (CCSDS)

    NASA Technical Reports Server (NTRS)

    Reyes, Miguel A. De Jesus

    2014-01-01

    GNU Radio is a free and open source development toolkit that provides signal processing to implement software radios. It can be used with low-cost external RF hardware to create software defined radios, or without hardware in a simulation-like environment. GNU Radio applications are primarily written in Python and C++. The Universal Software Radio Peripheral (USRP) is a computer-hosted software radio designed by Ettus Research. The USRP connects to a host computer via high-speed Gigabit Ethernet. Using the open source Universal Hardware Driver (UHD), we can run GNU Radio applications using the USRP. An SDR is a "radio in which some or all physical layer functions are software defined"(IEEE Definition). A radio is any kind of device that wirelessly transmits or receives radio frequency (RF) signals in the radio frequency. An SDR is a radio communication system where components that have been typically implemented in hardware are implemented in software. GNU Radio has a generic packet decoder block that is not optimized for CCSDS frames. Using this generic packet decoder will add bytes to the CCSDS frames and will not permit for bit error correction using Reed-Solomon. The CCSDS frames consist of 256 bytes, including a 32-bit sync marker (0x1ACFFC1D). This frames are generated by the Space Data Processor and GNU Radio will perform the modulation and framing operations, including frame synchronization.

  7. Bayer image parallel decoding based on GPU

    NASA Astrophysics Data System (ADS)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  8. Mongoose: Creation of a Rad-Hard MIPS R3000

    NASA Technical Reports Server (NTRS)

    Lincoln, Dan; Smith, Brian

    1993-01-01

    This paper describes the development of a 32 Bit, full MIPS R3000 code-compatible Rad-Hard CPU, code named Mongoose. Mongoose progressed from contract award, through the design cycle, to operational silicon in 12 months to meet a space mission for NASA. The goal was the creation of a fully static device capable of operation to the maximum Mil-883 derated speed, worst-case post-rad exposure with full operational integrity. This included consideration of features for functional enhancements relating to mission compatibility and removal of commercial practices not supported by Rad-Hard technology. 'Mongoose' developed from an evolution of LSI Logic's MIPS-I embedded processor, LR33000, code named Cobra, to its Rad-Hard 'equivalent', Mongoose. The term 'equivalent' is used to infer that the core of the processor is functionally identical, allowing the same use and optimizations of the MIPS-I Instruction Set software tool suite for compilation, software program trace, etc. This activity was started in September of 1991 under a contract from NASA-Goddard Space Flight Center (GSFC)-Flight Data Systems. The approach affected a teaming of NASA-GSFC for program development, LSI Logic for system and ASIC design coupled with the Rad-Hard process technology, and Harris (GASD) for Rad-Hard microprocessor design expertise. The program culminated with the generation of Rad-Hard Mongoose prototypes one year later.

  9. Mobile high-performance computing (HPC) for synthetic aperture radar signal processing

    NASA Astrophysics Data System (ADS)

    Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen

    2018-04-01

    The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very unlikely to achieve required computing processing power by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Arrays and Graphic Processor cores constrained by power and performance. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). However, these DNN models are typically trained using GPUs with gigabytes of external memories and massively used 32-bit floating point operations. As a result, DNNs do not run efficiently on hardware appropriate for low power or mobile applications. To address this limitation, we proposed for compressing DNN models for ATR suited to deployment on resource constrained hardware. This proposed compression framework utilizes promising DNN compression techniques including pruning and weight quantization while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.

  10. Strong Motion Seismograph Based On MEMS Accelerometer

    NASA Astrophysics Data System (ADS)

    Teng, Y.; Hu, X.

    2013-12-01

    The MEMS strong motion seismograph we developed used the modularization method to design its software and hardware.It can fit various needs in different application situation.The hardware of the instrument is composed of a MEMS accelerometer,a control processor system,a data-storage system,a wired real-time data transmission system by IP network,a wireless data transmission module by 3G broadband,a GPS calibration module and power supply system with a large-volumn lithium battery in it. Among it,the seismograph's sensor adopted a three-axis with 14-bit high resolution and digital output MEMS accelerometer.Its noise level just reach about 99μg/√Hz and ×2g to ×8g dynamically selectable full-scale.Its output data rates from 1.56Hz to 800Hz. Its maximum current consumption is merely 165μA,and the device is so small that it is available in a 3mm×3mm×1mm QFN package. Furthermore,there is access to both low pass filtered data as well as high pass filtered data,which minimizes the data analysis required for earthquake signal detection. So,the data post-processing can be simplified. Controlling process system adopts a 32-bit low power consumption embedded ARM9 processor-S3C2440 and is based on the Linux operation system.The processor's operating clock at 400MHz.The controlling system's main memory is a 64MB SDRAM with a 256MB flash-memory.Besides,an external high-capacity SD card data memory can be easily added.So the system can meet the requirements for data acquisition,data processing,data transmission,data storage,and so on. Both wired and wireless network can satisfy remote real-time monitoring, data transmission,system maintenance,status monitoring or updating software.Linux was embedded and multi-layer designed conception was used.The code, including sensor hardware driver,the data acquisition,earthquake setting out and so on,was written on medium layer.The hardware driver consist of IIC-Bus interface driver, IO driver and asynchronous notification driver. The application program layer mainly concludes: earthquake parameter module, local database managing module, data transmission module, remote monitoring, FTP service and so on. The application layer adopted multi-thread process. The whole strong motion seismograph was encapsulated in a small aluminum box, which size is 80mm×120mm×55mm. The inner battery can work continuesly more than 24 hours. The MEMS accelerograph uses modular design for its software part and hardware part. It has remote software update function and can meet the following needs: a) Auto picking up the earthquake event; saving the data on wave-event files and hours files; It may be used for monitoring strong earthquake, explosion, bridge and house health. b) Auto calculate the earthquake parameters, and transferring those parameters by 3G wireless broadband network. This kind of seismograph has characteristics of low cost, easy installation. They can be concentrated in the urban region or areas need to specially care. We can set up a ground motion parameters quick report sensor network while large earthquake break out. Then high-resolution-fine shake-map can be easily produced for the need of emergency rescue. c) By loading P-wave detection program modules, it can be used for earthquake early warning for large earthquakes; d) Can easily construct a high-density layout seismic monitoring network owning remote control and modern intelligent earthquake sensor.

  11. The entropic cost of quantum generalized measurements

    NASA Astrophysics Data System (ADS)

    Mancino, Luca; Sbroscia, Marco; Roccia, Emanuele; Gianani, Ilaria; Somma, Fabrizia; Mataloni, Paolo; Paternostro, Mauro; Barbieri, Marco

    2018-03-01

    Landauer's principle introduces a symmetry between computational and physical processes: erasure of information, a logically irreversible operation, must be underlain by an irreversible transformation dissipating energy. Monitoring micro- and nano-systems needs to enter into the energetic balance of their control; hence, finding the ultimate limits is instrumental to the development of future thermal machines operating at the quantum level. We report on the experimental investigation of a lower bound to the irreversible entropy associated to generalized quantum measurements on a quantum bit. We adopted a quantum photonics gate to implement a device interpolating from the weakly disturbing to the fully invasive and maximally informative regime. Our experiment prompted us to introduce a bound taking into account both the classical result of the measurement and the outcoming quantum state; unlike previous investigation, our entropic bound is based uniquely on measurable quantities. Our results highlight what insights the information-theoretic approach provides on building blocks of quantum information processors.

  12. A CMOS silicon spin qubit

    PubMed Central

    Maurand, R.; Jehl, X.; Kotekar-Patil, D.; Corna, A.; Bohuslavskyi, H.; Laviéville, R.; Hutin, L.; Barraud, S.; Vinet, M.; Sanquer, M.; De Franceschi, S.

    2016-01-01

    Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal–oxide–semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform. PMID:27882926

  13. SAR processing on the MPP

    NASA Technical Reports Server (NTRS)

    Batcher, K. E.; Eddey, E. E.; Faiss, R. O.; Gilmore, P. A.

    1981-01-01

    The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU) which processes arrays of data; an array control unit which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit which controls the flow of data; and a unique staging memory (SM) which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PE). Two-by-four surarrays of PE's are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory which buffers and permutes data flowing with the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real time processing capability can be realized via a multiple ARU, multiple SM configuration.

  14. Investigation of an advanced fault tolerant integrated avionics system

    NASA Technical Reports Server (NTRS)

    Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.

    1986-01-01

    Presented is an advanced, fault-tolerant multiprocessor avionics architecture as could be employed in an advanced rotorcraft such as LHX. The processor structure is designed to interface with existing digital avionics systems and concepts including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on use of a modular, compact (16-bit) microprocessor card family, results of a preliminary study examining simplex, dual and standby-sparing architectures is presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture including redundancy management requirements and techniques as well as verification and validation needs and methods.

  15. Fast Whole-Engine Stirling Analysis

    NASA Technical Reports Server (NTRS)

    Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

    2005-01-01

    An experimentally validated approach is described for fast axisymmetric Stirling engine simulations. These simulations include the entire displacer interior and demonstrate it is possible to model a complete engine cycle in less than an hour. The focus of this effort was to demonstrate it is possible to produce useful Stirling engine performance results in a time-frame short enough to impact design decisions. The combination of utilizing the latest 64-bit Opteron computer processors, fiber-optical Myrinet communications, dynamic meshing, and across zone partitioning has enabled solution times at least 240 times faster than previous attempts at simulating the axisymmetric Stirling engine. A comparison of the multidimensional results, calibrated one-dimensional results, and known experimental results is shown. This preliminary comparison demonstrates that axisymmetric simulations can be very accurate, but more work remains to improve the simulations through such means as modifying the thermal equilibrium regenerator models, adding fluid-structure interactions, including radiation effects, and incorporating mechanodynamics.

  16. The Level 0 Pixel Trigger system for the ALICE experiment

    NASA Astrophysics Data System (ADS)

    Aglieri Rinella, G.; Kluge, A.; Krivda, M.; ALICE Silicon Pixel Detector project

    2007-01-01

    The ALICE Silicon Pixel Detector contains 1200 readout chips. Fast-OR signals indicate the presence of at least one hit in the 8192 pixel matrix of each chip. The 1200 bits are transmitted every 100 ns on 120 data readout optical links using the G-Link protocol. The Pixel Trigger System extracts and processes them to deliver an input signal to the Level 0 trigger processor targeting a latency of 800 ns. The system is compact, modular and based on FPGA devices. The architecture allows the user to define and implement various trigger algorithms. The system uses advanced 12-channel parallel optical fiber modules operating at 1310 nm as optical receivers and 12 deserializer chips closely packed in small area receiver boards. Alternative solutions with multi-channel G-Link deserializers implemented directly in programmable hardware devices were investigated. The design of the system and the progress of the ALICE Pixel Trigger project are described in this paper.

  17. A CMOS silicon spin qubit

    NASA Astrophysics Data System (ADS)

    Maurand, R.; Jehl, X.; Kotekar-Patil, D.; Corna, A.; Bohuslavskyi, H.; Laviéville, R.; Hutin, L.; Barraud, S.; Vinet, M.; Sanquer, M.; de Franceschi, S.

    2016-11-01

    Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal-oxide-semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform.

  18. Direct digital conversion detector technology

    NASA Astrophysics Data System (ADS)

    Mandl, William J.; Fedors, Richard

    1995-06-01

    Future imaging sensors for the aerospace and commercial video markets will depend on low cost, high speed analog-to-digital (A/D) conversion to efficiently process optical detector signals. Current A/D methods place a heavy burden on system resources, increase noise, and limit the throughput. This paper describes a unique method for incorporating A/D conversion right on the focal plane array. This concept is based on Sigma-Delta sampling, and makes optimum use of the active detector real estate. Combined with modern digital signal processors, such devices will significantly increase data rates off the focal plane. Early conversion to digital format will also decrease the signal susceptibility to noise, lowering the communications bit error rate. Computer modeling of this concept is described, along with results from several simulation runs. A potential application for direct digital conversion is also reviewed. Future uses for this technology could range from scientific instruments to remote sensors, telecommunications gear, medical diagnostic tools, and consumer products.

  19. A system for routing arbitrary directed graphs on SIMD architectures

    NASA Technical Reports Server (NTRS)

    Tomboulian, Sherryl

    1987-01-01

    There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal.

  20. Fabrication of Circuit QED Quantum Processors, Part 2: Advanced Semiconductor Manufacturing Perspectives

    NASA Astrophysics Data System (ADS)

    Michalak, D. J.; Bruno, A.; Caudillo, R.; Elsherbini, A. A.; Falcon, J. A.; Nam, Y. S.; Poletto, S.; Roberts, J.; Thomas, N. K.; Yoscovits, Z. R.; Dicarlo, L.; Clarke, J. S.

    Experimental quantum computing is rapidly approaching the integration of sufficient numbers of quantum bits for interesting applications, but many challenges still remain. These challenges include: realization of an extensible design for large array scale up, sufficient material process control, and discovery of integration schemes compatible with industrial 300 mm fabrication. We present recent developments in extensible circuits with vertical delivery. Toward the goal of developing a high-volume manufacturing process, we will present recent results on a new Josephson junction process that is compatible with current tooling. We will then present the improvements in NbTiN material uniformity that typical 300 mm fabrication tooling can provide. While initial results on few-qubit systems are encouraging, advanced processing control is expected to deliver the improvements in qubit uniformity, coherence time, and control required for larger systems. Research funded by Intel Corporation.

  1. A CMOS silicon spin qubit.

    PubMed

    Maurand, R; Jehl, X; Kotekar-Patil, D; Corna, A; Bohuslavskyi, H; Laviéville, R; Hutin, L; Barraud, S; Vinet, M; Sanquer, M; De Franceschi, S

    2016-11-24

    Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal-oxide-semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform.

  2. Fast Whole-Engine Stirling Analysis

    NASA Technical Reports Server (NTRS)

    Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako

    2007-01-01

    An experimentally validated approach is described for fast axisymmetric Stirling engine simulations. These simulations include the entire displacer interior and demonstrate it is possible to model a complete engine cycle in less than an hour. The focus of this effort was to demonstrate it is possible to produce useful Stirling engine performance results in a time-frame short enough to impact design decisions. The combination of utilizing the latest 64-bit Opteron computer processors, fiber-optical Myrinet communications, dynamic meshing, and across zone partitioning has enabled solution times at least 240 times faster than previous attempts at simulating the axisymmetric Stirling engine. A comparison of the multidimensional results, calibrated one-dimensional results, and known experimental results is shown. This preliminary comparison demonstrates that axisymmetric simulations can be very accurate, but more work remains to improve the simulations through such means as modifying the thermal equilibrium regenerator models, adding fluid-structure interactions, including radiation effects, and incorporating mechanodynamics.

  3. Millisecond timing on PCs and Macs.

    PubMed

    MacInnes, W J; Taylor, T L

    2001-05-01

    A real-time, object-oriented solution for displaying stimuli on Windows 95/98, MacOS and Linux platforms is presented. The program, written in C++, utilizes a special-purpose window class (GLWindow), OpenGL, and 32-bit graphics acceleration; it avoids display timing uncertainty by substituting the new window class for the default window code for each system. We report the outcome of tests for real-time capability across PC and Mac platforms running a variety of operating systems. The test program, which can be used as a shell for programming real-time experiments and testing specific processors, is available at http://www.cs.dal.ca/~macinnwj. We propose to provide researchers with a sense of the usefulness of our program, highlight the ability of many multitasking environments to achieve real time, as well as caution users about systems that may not achieve real time, even under optimal conditions.

  4. Real-time image reconstruction and display system for MRI using a high-speed personal computer.

    PubMed

    Haishi, T; Kose, K

    1998-09-01

    A real-time NMR image reconstruction and display system was developed using a high-speed personal computer and optimized for the 32-bit multitasking Microsoft Windows 95 operating system. The system was operated at various CPU clock frequencies by changing the motherboard clock frequency and the processor/bus frequency ratio. When the Pentium CPU was used at the 200 MHz clock frequency, the reconstruction time for one 128 x 128 pixel image was 48 ms and that for the image display on the enlarged 256 x 256 pixel window was about 8 ms. NMR imaging experiments were performed with three fast imaging sequences (FLASH, multishot EPI, and one-shot EPI) to demonstrate the ability of the real-time system. It was concluded that in most cases, high-speed PC would be the best choice for the image reconstruction and display system for real-time MRI. Copyright 1998 Academic Press.

  5. ALMA Correlator Real-Time Data Processor

    NASA Astrophysics Data System (ADS)

    Pisano, J.; Amestica, R.; Perez, J.

    2005-10-01

    The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.

  6. Variable-focus microscopy and UV surface dissolution imaging as complementary techniques in intrinsic dissolution rate determination.

    PubMed

    Ward, Adam; Walton, Karl; Box, Karl; Østergaard, Jesper; Gillie, Lisa J; Conway, Barbara R; Asare-Addo, Kofi

    2017-09-15

    This work reports a novel approach to the assessment of the surface properties of compacts used in Surface Dissolution Imaging (SDI). SDI is useful for determining intrinsic dissolution rate (IDR), an important parameter in early stage drug development. Surface topography, post-compaction and post-SDI run, have been measured using a non-contact, optical, three-dimensional microscope based on focus variation, the Alicona Infinite Focus Microscope, with the aim of correlating the IDRs to the surface properties. Ibuprofen (IBU) was used as a model poorly-soluble drug. DSC and XRD were used to monitor possible polymorphic changes that may have occurred post-compaction and post-SDI run. IBUs IDR decreased from 0.033mg/min/cm 2 to 0.022mg/min/cm 2 from 10 to 20min, respectively, during the experiment. XRD and DSC showed no form changes during the SDI run. The surface topography images showed that a distinct imprint was embossed on the surfaces of some compacts which could affect IDRs. Surface parameter values were associated with the SDI experiments which showed strong correlations with the IDR values. The variable-focus microscope can be used as a complimentary tool in the determination of IDR values from the SDI. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  7. AN OPTIMIZED 64X64 POINT TWO-DIMENSIONAL FAST FOURIER TRANSFORM

    NASA Technical Reports Server (NTRS)

    Miko, J.

    1994-01-01

    Scientists at Goddard have developed an efficient and powerful program-- An Optimized 64x64 Point Two-Dimensional Fast Fourier Transform-- which combines the performance of real and complex valued one-dimensional Fast Fourier Transforms (FFT's) to execute a two-dimensional FFT and its power spectrum coefficients. These coefficients can be used in many applications, including spectrum analysis, convolution, digital filtering, image processing, and data compression. The program's efficiency results from its technique of expanding all arithmetic operations within one 64-point FFT; its high processing rate results from its operation on a high-speed digital signal processor. For non-real-time analysis, the program requires as input an ASCII data file of 64x64 (4096) real valued data points. As output, this analysis produces an ASCII data file of 64x64 power spectrum coefficients. To generate these coefficients, the program employs a row-column decomposition technique. First, it performs a radix-4 one-dimensional FFT on each row of input, producing complex valued results. Then, it performs a one-dimensional FFT on each column of these results to produce complex valued two-dimensional FFT results. Finally, the program sums the squares of the real and imaginary values to generate the power spectrum coefficients. The program requires a Banshee accelerator board with 128K bytes of memory from Atlanta Signal Processors (404/892-7265) installed on an IBM PC/AT compatible computer (DOS ver. 3.0 or higher) with at least one 16-bit expansion slot. For real-time operation, an ASPI daughter board is also needed. The real-time configuration reads 16-bit integer input data directly into the accelerator board, operating on 64x64 point frames of data. The program's memory management also allows accumulation of the coefficient results. The real-time processing rate to calculate and accumulate the 64x64 power spectrum output coefficients is less than 17.0 mSec. Documentation is included in the price of the program. Source code is written in C, 8086 Assembly, and Texas Instruments TMS320C30 Assembly Languages. This program is available on a 5.25 inch 360K MS-DOS format diskette. IBM and IBM PC are registered trademarks of International Business Machines. MS-DOS is a registered trademark of Microsoft Corporation.

  8. A multi-channel low-power system-on-chip for single-unit recording and narrowband wireless transmission of neural signal.

    PubMed

    Bonfanti, A; Ceravolo, M; Zambra, G; Gusmeroli, R; Spinelli, A S; Lacaita, A L; Angotzi, G N; Baranauskas, G; Fadiga, L

    2010-01-01

    This paper reports a multi-channel neural recording system-on-chip (SoC) with digital data compression and wireless telemetry. The circuit consists of a 16 amplifiers, an analog time division multiplexer, an 8-bit SAR AD converter, a digital signal processor (DSP) and a wireless narrowband 400-MHz binary FSK transmitter. Even though only 16 amplifiers are present in our current die version, the whole system is designed to work with 64 channels demonstrating the feasibility of a digital processing and narrowband wireless transmission of 64 neural recording channels. A digital data compression, based on the detection of action potentials and storage of correspondent waveforms, allows the use of a 1.25-Mbit/s binary FSK wireless transmission. This moderate bit-rate and a low frequency deviation, Manchester-coded modulation are crucial for exploiting a narrowband wireless link and an efficient embeddable antenna. The chip is realized in a 0.35- εm CMOS process with a power consumption of 105 εW per channel (269 εW per channel with an extended transmission range of 4 m) and an area of 3.1 × 2.7 mm(2). The transmitted signal is captured by a digital TV tuner and demodulated by a wideband phase-locked loop (PLL), and then sent to a PC via an FPGA module. The system has been tested for electrical specifications and its functionality verified in in-vivo neural recording experiments.

  9. Deploying a quantum annealing processor to detect tree cover in aerial imagery of California

    PubMed Central

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Mukhopadhyay, Supratik; Nemani, Ramakrishna R.

    2017-01-01

    Quantum annealing is an experimental and potentially breakthrough computational technology for handling hard optimization problems, including problems of computer vision. We present a case study in training a production-scale classifier of tree cover in remote sensing imagery, using early-generation quantum annealing hardware built by D-wave Systems, Inc. Beginning within a known boosting framework, we train decision stumps on texture features and vegetation indices extracted from four-band, one-meter-resolution aerial imagery from the state of California. We then impose a regulated quadratic training objective to select an optimal voting subset from among these stumps. The votes of the subset define the classifier. For optimization, the logical variables in the objective function map to quantum bits in the hardware device, while quadratic couplings encode as the strength of physical interactions between the quantum bits. Hardware design limits the number of couplings between these basic physical entities to five or six. To account for this limitation in mapping large problems to the hardware architecture, we propose a truncation and rescaling of the training objective through a trainable metaparameter. The boosting process on our basic 108- and 508-variable problems, thus constituted, returns classifiers that incorporate a diverse range of color- and texture-based metrics and discriminate tree cover with accuracies as high as 92% in validation and 90% on a test scene encompassing the open space preserves and dense suburban build of Mill Valley, CA. PMID:28241028

  10. QWIP technology for both military and civilian applications

    NASA Astrophysics Data System (ADS)

    Gunapala, Sarath D.; Kukkonen, Carl A.; Sirangelo, Mark N.; McQuiston, Barbara K.; Chehayeb, Riad; Kaufmann, M.

    2001-10-01

    Advanced thermal imaging infrared cameras have been a cost effective and reliable method to obtain the temperature of objects. Quantum Well Infrared Photodetector (QWIP) based thermal imaging systems have advanced the state-of-the-art and are the most sensitive commercially available thermal systems. QWIP Technologies LLC, under exclusive agreement with Caltech University, is currently manufacturing the QWIP-ChipTM, a 320 X 256 element, bound-to-quasibound QWIP FPA. The camera performance falls within the long-wave IR band, spectrally peaked at 8.5 μm. The camera is equipped with a 32-bit floating-point digital signal processor combined with multi- tasking software, delivering a digital acquisition resolution of 12-bits using nominal power consumption of less than 50 Watts. With a variety of video interface options, remote control capability via an RS-232 connection, and an integrated control driver circuit to support motorized zoom and focus- compatible lenses, this camera design has excellent application in both the military and commercial sector. In the area of remote sensing, high-performance QWIP systems can be used for high-resolution, target recognition as part of a new system of airborne platforms (including UAVs). Such systems also have direct application in law enforcement, surveillance, industrial monitoring and road hazard detection systems. This presentation will cover the current performance of the commercial QWIP cameras, conceptual platform systems and advanced image processing for use in both military remote sensing and civilian applications currently being developed in road hazard monitoring.

  11. A Spacecraft Housekeeping System-on-Chip in a Radiation Hardened Structured ASIC

    NASA Technical Reports Server (NTRS)

    Suarez, George; DuMonthier, Jeffrey J.; Sheikh, Salman S.; Powell, Wesley A.; King, Robyn L.

    2012-01-01

    Housekeeping systems are essential to health monitoring of spacecraft and instruments. Typically, sensors are distributed across various sub-systems and data is collected using components such as analog-to-digital converters, analog multiplexers and amplifiers. In most cases programmable devices are used to implement the data acquisition control and storage, and the interface to higher level systems. Such discrete implementations require additional size, weight, power and interconnect complexity versus an integrated circuit solution, as well as the qualification of multiple parts. Although commercial devices are readily available, they are not suitable for space applications due the radiation tolerance and qualification requirements. The Housekeeping System-o n-A-Chip (HKSOC) is a low power, radiation hardened integrated solution suitable for spacecraft and instrument control and data collection. A prototype has been designed and includes a wide variety of functions including a 16-channel analog front-end for driving and reading sensors, analog-to-digital and digital-to-analog converters, on-chip temperature sensor, power supply current sense circuits, general purpose comparators and amplifiers, a 32-bit processor, digital I/O, pulse-width modulation (PWM) generators, timers and I2C master and slave serial interfaces. In addition, the device can operate in a bypass mode where the processor is disabled and external logic is used to control the analog and mixed signal functions. The device is suitable for stand-alone or distributed systems where multiple chips can be deployed across different sub-systems as intelligent nodes with computing and processing capabilities.

  12. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daily, Jeffrey A.

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates permore » second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.« less

  13. GSFC Cutting Edge Avionics Technologies for Spacecraft

    NASA Technical Reports Server (NTRS)

    Luers, Philip J.; Culver, Harry L.; Plante, Jeannette

    1998-01-01

    With the launch of NASA's first fiber optic bus on SAMPEX in 1992, GSFC has ushered in an era of new technology development and insertion into flight programs. Predating such programs the Lewis and Clark missions and the New Millenium Program, GSFC has spearheaded the drive to use cutting edge technologies on spacecraft for three reasons: to enable next generation Space and Earth Science, to shorten spacecraft development schedules, and to reduce the cost of NASA missions. The technologies developed have addressed three focus areas: standard interface components, high performance processing, and high-density packaging techniques enabling lower cost systems. To realize the benefits of standard interface components GSFC has developed and utilized radiation hardened/tolerant devices such as PCI target ASICs, Parallel Fiber Optic Data Bus terminals, MIL-STD-1773 and AS1773 transceivers, and Essential Services Node. High performance processing has been the focus of the Mongoose I and Mongoose V rad-hard 32-bit processor programs as well as the SMEX-Lite Computation Hub. High-density packaging techniques have resulted in 3-D stack DRAM packages and Chip-On-Board processes. Lower cost systems have been demonstrated by judiciously using all of our technology developments to enable "plug and play" scalable architectures. The paper will present a survey of development and insertion experiences for the above technologies, as well as future plans to enable more "better, faster, cheaper" spacecraft. Details of ongoing GSFC programs such as Ultra-Low Power electronics, Rad-Hard FPGAs, PCI master ASICs, and Next Generation Mongoose processors.

  14. Superconducting quantum circuits at the surface code threshold for fault tolerance.

    PubMed

    Barends, R; Kelly, J; Megrant, A; Veitia, A; Sank, D; Jeffrey, E; White, T C; Mutus, J; Fowler, A G; Campbell, B; Chen, Y; Chen, Z; Chiaro, B; Dunsworth, A; Neill, C; O'Malley, P; Roushan, P; Vainsencher, A; Wenner, J; Korotkov, A N; Cleland, A N; Martinis, John M

    2014-04-24

    A quantum computer can solve hard problems, such as prime factoring, database searching and quantum simulation, at the cost of needing to protect fragile quantum states from error. Quantum error correction provides this protection by distributing a logical state among many physical quantum bits (qubits) by means of quantum entanglement. Superconductivity is a useful phenomenon in this regard, because it allows the construction of large quantum circuits and is compatible with microfabrication. For superconducting qubits, the surface code approach to quantum computing is a natural choice for error correction, because it uses only nearest-neighbour coupling and rapidly cycled entangling gates. The gate fidelity requirements are modest: the per-step fidelity threshold is only about 99 per cent. Here we demonstrate a universal set of logic gates in a superconducting multi-qubit processor, achieving an average single-qubit gate fidelity of 99.92 per cent and a two-qubit gate fidelity of up to 99.4 per cent. This places Josephson quantum computing at the fault-tolerance threshold for surface code error correction. Our quantum processor is a first step towards the surface code, using five qubits arranged in a linear array with nearest-neighbour coupling. As a further demonstration, we construct a five-qubit Greenberger-Horne-Zeilinger state using the complete circuit and full set of gates. The results demonstrate that Josephson quantum computing is a high-fidelity technology, with a clear path to scaling up to large-scale, fault-tolerant quantum circuits.

  15. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGES

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates permore » second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.« less

  16. Obtaining identical results with double precision global accuracy on different numbers of processors in parallel particle Monte Carlo simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cleveland, Mathew A., E-mail: cleveland7@llnl.gov; Brunner, Thomas A.; Gentile, Nicholas A.

    2013-10-15

    We describe and compare different approaches for achieving numerical reproducibility in photon Monte Carlo simulations. Reproducibility is desirable for code verification, testing, and debugging. Parallelism creates a unique problem for achieving reproducibility in Monte Carlo simulations because it changes the order in which values are summed. This is a numerical problem because double precision arithmetic is not associative. Parallel Monte Carlo, both domain replicated and decomposed simulations, will run their particles in a different order during different runs of the same simulation because the non-reproducibility of communication between processors. In addition, runs of the same simulation using different domain decompositionsmore » will also result in particles being simulated in a different order. In [1], a way of eliminating non-associative accumulations using integer tallies was described. This approach successfully achieves reproducibility at the cost of lost accuracy by rounding double precision numbers to fewer significant digits. This integer approach, and other extended and reduced precision reproducibility techniques, are described and compared in this work. Increased precision alone is not enough to ensure reproducibility of photon Monte Carlo simulations. Non-arbitrary precision approaches require a varying degree of rounding to achieve reproducibility. For the problems investigated in this work double precision global accuracy was achievable by using 100 bits of precision or greater on all unordered sums which where subsequently rounded to double precision at the end of every time-step.« less

  17. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less

  18. Current Controller for Multi-level Front-end Converter and Its Digital Implementation Considerations on Three-level Flying Capacitor Topology

    NASA Astrophysics Data System (ADS)

    Tekwani, P. N.; Shah, M. T.

    2017-10-01

    This paper presents behaviour analysis and digital implementation of current error space phasor based hysteresis controller applied to three-phase three-level flying capacitor converter as front-end topology. The controller is self-adaptive in nature, and takes the converter from three-level to two-level mode of operation and vice versa, following various trajectories of sector change with the change in reference dc-link voltage demanded by the load. It keeps current error space phasor within the prescribed hexagonal boundary. During the contingencies, the proposed controller takes the converter in over modulation mode to meet the load demand, and once the need is satisfied, controller brings back the converter in normal operating range. Simulation results are presented to validate behaviour of controller to meet the said contingencies. Unity power factor is assured by proposed controller with low current harmonic distortion satisfying limits prescribed in IEEE 519-2014. Proposed controller is implemented using TMS320LF2407 16-bit fixed-point digital signal processor. Detailed analysis of numerical format to avoid overflow of sensed variables in processor, and per-unit model implementation in software are discussed and hardware results are presented at various stages of signal conditioning to validate the experimental setup. Control logic for the generation of reference currents is implemented in TMS320LF2407A using assembly language and experimental results are also presented for the same.

  19. Front-End Board with Cyclone V as a Test High-Resolution Platform for the Auger_Beyond_2015 Front End Electronics

    NASA Astrophysics Data System (ADS)

    Szadkowski, Zbigniew

    2015-06-01

    The surface detector (SD) array of the Pierre Auger Observatory needs an upgrade which allows space for more complex triggers with higher bandwidth and greater dynamic range. To this end this paper presents a front-end board (FEB) with the largest Cyclone V E FPGA 5CEFA9F31I7N. It supports eight channels sampled with max. 250 MSps@14-bit resolution. Considered sampling for the SD is 120 MSps; however, the FEB has been developed with external anti-aliasing filters to retain maximal flexibility. Six channels are targeted at the SD, two are reserved for other experiments like: Auger Engineering Radio Array and additional muon counters. The FEB is an intermediate design plugged into a unified board communicating with a micro-controller at 40 MHz; however, it provides 250 MSPs sampling with an 18-bit dynamic range, is equipped with a virtual NIOS processor and supports 256 MB of SDRAM as well as an implemented spectral trigger based on the discrete cosine transform for detection of very inclined “old” showers. The FEB can also support neural network development for detection of “young” showers, potentially generated by neutrinos. A single FEB was already tested in the Auger surface detector in Malargüe (Argentina) for 120 and 160 MSps. Preliminary tests showed perfect stability of data acquisition for sampling frequency three or four times greater. They allowed optimization of the design before deployment of seven or eight FEBs for several months of continuous tests in the engineering array.

  20. Systolic array IC for genetic computation

    NASA Technical Reports Server (NTRS)

    Anderson, D.

    1991-01-01

    Measuring similarities between large sequences of genetic information is a formidable task requiring enormous amounts of computer time. Geneticists claim that nearly two months of CRAY-2 time are required to run a single comparison of the known database against the new bases that will be found this year, and more than a CRAY-2 year for next year's genetic discoveries, and so on. The DNA IC, designed at HP-ICBD in cooperation with the California Institute of Technology and the Jet Propulsion Laboratory, is being implemented in order to move the task of genetic comparison onto workstations and personal computers, while vastly improving performance. The chip is a systolic (pumped) array comprised of 16 processors, control logic, and global RAM, totaling 400,000 FETS. At 12 MHz, each chip performs 2.7 billion 16 bit operations per second. Using 35 of these chips in series on one PC board (performing nearly 100 billion operations per second), a sequence of 560 bases can be compared against the eventual total genome of 3 billion bases, in minutes--on a personal computer. While the designed purpose of the DNA chip is for genetic research, other disciplines requiring similarity measurements between strings of 7 bit encoded data could make use of this chip as well. Cryptography and speech recognition are two examples. A mix of full custom design and standard cells, in CMOS34, were used to achieve these goals. Innovative test methods were developed to enhance controllability and observability in the array. This paper describes these techniques as well as the chip's functionality. This chip was designed in the 1989-90 timeframe.

  1. SpectraCAM SPM: a camera system with high dynamic range for scientific and medical applications

    NASA Astrophysics Data System (ADS)

    Bhaskaran, S.; Baiko, D.; Lungu, G.; Pilon, M.; VanGorden, S.

    2005-08-01

    A scientific camera system having high dynamic range designed and manufactured by Thermo Electron for scientific and medical applications is presented. The newly developed CID820 image sensor with preamplifier-per-pixel technology is employed in this camera system. The 4 Mega-pixel imaging sensor has a raw dynamic range of 82dB. Each high-transparent pixel is based on a preamplifier-per-pixel architecture and contains two photogates for non-destructive readout of the photon-generated charge (NDRO). Readout is achieved via parallel row processing with on-chip correlated double sampling (CDS). The imager is capable of true random pixel access with a maximum operating speed of 4MHz. The camera controller consists of a custom camera signal processor (CSP) with an integrated 16-bit A/D converter and a PowerPC-based CPU running a Linux embedded operating system. The imager is cooled to -40C via three-stage cooler to minimize dark current. The camera housing is sealed and is designed to maintain the CID820 imager in the evacuated chamber for at least 5 years. Thermo Electron has also developed custom software and firmware to drive the SpectraCAM SPM camera. Included in this firmware package is the new Extreme DRTM algorithm that is designed to extend the effective dynamic range of the camera by several orders of magnitude up to 32-bit dynamic range. The RACID Exposure graphical user interface image analysis software runs on a standard PC that is connected to the camera via Gigabit Ethernet.

  2. Edge smoothing for real-time simulation of a polygon face object system as viewed by a moving observer

    NASA Technical Reports Server (NTRS)

    Lotz, Robert W. (Inventor); Westerman, David J. (Inventor)

    1980-01-01

    The visual system within an aircraft flight simulation system receives flight data and terrain data which is formated into a buffer memory. The image data is forwarded to an image processor which translates the image data into face vertex vectors Vf, defining the position relationship between the vertices of each terrain object and the aircraft. The image processor then rotates, clips, and projects the image data into two-dimensional display vectors (Vd). A display generator receives the Vd faces, and other image data to provide analog inputs to CRT devices which provide the window displays for the simulated aircraft. The video signal to the CRT devices passes through an edge smoothing device which prolongs the rise time (and fall time) of the video data inversely as the slope of the edge being smoothed. An operational amplifier within the edge smoothing device has a plurality of independently selectable feedback capacitors each having a different value. The values of the capacitors form a series which doubles as a power of two. Each feedback capacitor has a fast switch responsive to the corresponding bit of a digital binary control word for selecting (1) or not selecting (0) that capacitor. The control word is determined by the slope of each edge. The resulting actual feedback capacitance for each edge is the sum of all the selected capacitors and is directly proportional to the value of the binary control word. The output rise time (or fall time) is a function of the feedback capacitance, and is controlled by the slope through the binary control word.

  3. AAO2: a general purpose CCD controller for the AAT

    NASA Astrophysics Data System (ADS)

    Waller, Lew; Barton, John; Mayfield, Don; Griesbach, Jason

    2004-09-01

    The Anglo-Australian Observatory has developed a 2nd generation optical CCD controller to replace an earlier controller used now for almost twenty years. The new AAO2 controller builds on the considerable experience gained with the first controller, the new technologies now available and the techniques developed and successfully implemented in AAO's IRIS2 detector controller. The AAO2 controller has been designed to operate a wide variety of detectors and to achieve as near to detector limited performance as possible. It is capable of reading out CCDs with one, two or four output amplifiers, each output having its own video processor and high speed 16-bit ADC. The video processor is a correlated double sampler that may be switched between low noise dual slope integration or high speed clamp and sample modes. Programmable features include low noise DAC biases, horizontal clocks with DAC controllable levels and slopes and vertical clocks with DAC controllable arbitrary waveshapes. The controller uses two DSPs; one for overall control and the other for clock signal generation, which is highly programmable, with downloadable sequences of waveform patterns. The controller incorporates a precision detector temperature controller and provides accurate exposure time control. Telemetry is provided of all DAC generated voltages, many derived voltages, power supply voltages, detector temperature and detector identification. A high speed, full duplex fibre optic interface connects the controller to a host computer. The modular design uses six to ten circuit boards, plugged in to common backplanes. Two backplanes separate noisy digital signals from low noise analog signals.

  4. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    PubMed Central

    2011-01-01

    Background The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. Results A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Conclusions Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance. PMID:21631914

  5. Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation.

    PubMed

    Rognes, Torbjørn

    2011-06-01

    The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest approach to parallelisation with SIMD technology has previously been described by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation. A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375 residue query sequence a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is over six times more rapid than software based on Farrar's 'striped' approach. SWIPE was about 2.5 times faster when the programs used only a single thread. For shorter queries, the increase in speed was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE for the BLOSUM62 matrix. The software is designed for 64 bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License. Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential application of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from improved performance.

  6. Acceleration of linear stationary iterative processes in multiprocessor computers. II

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romm, Ya.E.

    1982-05-01

    For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less

  7. QI2S - Quick Image Interpretation System

    NASA Astrophysics Data System (ADS)

    Naghmouchi, Jamin; Aviely, Peleg; Ginosar, Ran; Ober, Giovanna; Bischoff, Ole; Nadler, Ron; Guiser, David; Citroen, Meira; Freddi, Riccardo; Berekovic, Mladen

    2015-09-01

    The evolution of the Earth Observation mission will be driven by many factors, and the deveploment of new processing paradigms to facilitate data downlink, handling and storage will be a key factor. Next generation EO satellites will generate a great amount of data at a very high data rate, both radar and optical. Real-time onboard processing can be the solution to reduce data downlink and management on ground. Radiometric, geometric, and atmospheric corrections of EO data as well as material/object detection in addition to the well-known needs for image compression and signal processing can be performed directly on board and the aim of QI2S project is to demonstrate this. QI2S, a concept prototype system for novel onboard image processing and image interpretation which has been designed, developed and validated in the framework of an EU FP7 project, targets these needs and makes a significant step towards exceeding current roadmaps of leading space agencies for future payload processors. The QI2S system features multiple chip components of the RC64, a novel rad-hard 64-core signal processing chip, which targets DSP performance of 75 GMACs (16bit), 150 GOPS and 38 single precision GFLOPS while dissipating less than 10 Watts. It integrates advanced DSP cores with a multibank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 3.125 Gbps full duplex high-speed serial links using SpaceFibre and other protocols. The processor is being developed within the European FP7 Framework Program and will be qualified to the highest space standards.

  8. Photon Counting Using Edge-Detection Algorithm

    NASA Technical Reports Server (NTRS)

    Gin, Jonathan W.; Nguyen, Danh H.; Farr, William H.

    2010-01-01

    New applications such as high-datarate, photon-starved, free-space optical communications require photon counting at flux rates into gigaphoton-per-second regimes coupled with subnanosecond timing accuracy. Current single-photon detectors that are capable of handling such operating conditions are designed in an array format and produce output pulses that span multiple sample times. In order to discern one pulse from another and not to overcount the number of incoming photons, a detection algorithm must be applied to the sampled detector output pulses. As flux rates increase, the ability to implement such a detection algorithm becomes difficult within a digital processor that may reside within a field-programmable gate array (FPGA). Systems have been developed and implemented to both characterize gigahertz bandwidth single-photon detectors, as well as process photon count signals at rates into gigaphotons per second in order to implement communications links at SCPPM (serial concatenated pulse position modulation) encoded data rates exceeding 100 megabits per second with efficiencies greater than two bits per detected photon. A hardware edge-detection algorithm and corresponding signal combining and deserialization hardware were developed to meet these requirements at sample rates up to 10 GHz. The photon discriminator deserializer hardware board accepts four inputs, which allows for the ability to take inputs from a quadphoton counting detector, to support requirements for optical tracking with a reduced number of hardware components. The four inputs are hardware leading-edge detected independently. After leading-edge detection, the resultant samples are ORed together prior to deserialization. The deserialization is performed to reduce the rate at which data is passed to a digital signal processor, perhaps residing within an FPGA. The hardware implements four separate analog inputs that are connected through RF connectors. Each analog input is fed to a high-speed 1-bit comparator, which digitizes the input referenced to an adjustable threshold value. This results in four independent serial sample streams of binary 1s and 0s, which are ORed together at rates up to 10 GHz. This single serial stream is then deserialized by a factor of 16 to create 16 signal lines at a rate of 622.5 MHz or lower for input to a high-speed digital processor assembly. The new design and corresponding hardware can be employed with a quad-photon counting detector capable of handling photon rates on the order of multi-gigaphotons per second, whereas prior art was only capable of handling a single input at 1/4 the flux rate. Additionally, the hardware edge-detection algorithm has provided the ability to process 3-10 higher photon flux rates than previously possible by removing the limitation that photoncounting detector output pulses on multiple channels being ORed not overlap. Now, only the leading edges of the pulses are required to not overlap. This new photon counting digitizer hardware architecture supports a universal front end for an optical communications receiver operating at data rates from kilobits to over one gigabit per second to meet increased mission data volume requirements.

  9. Embedded 32-bit Differential Pulse Voltammetry (DPV) Technique for 3-electrode Cell Sensing

    NASA Astrophysics Data System (ADS)

    N, Aqmar N. Z.; Abdullah, W. F. H.; Zain, Z. M.; Rani, S.

    2018-03-01

    This paper addresses the development of differential pulse voltammetry (DPV) embedded algorithm using an ARM cortex processor with new developed potentiostat circuit design for in-situ 3-electrode cell sensing. This project is mainly to design a low cost potentiostat for the researchers in laboratories. It is required to develop an embedded algorithm for analytical technique to be used with the designed potentiostat. DPV is one of the most familiar pulse technique method used with 3-electrode cell sensing in chemical studies. Experiment was conducted on 10mM solution of Ferricyanide using the designed potentiostat and the developed DPV algorithm. As a result, the device can generate an excitation signal of DPV from 0.4V to 1.2V and produced a peaked voltammogram with relatively small error compared to the commercial potentiostat; which is only 6.25% difference in peak potential reading. The design of potentiostat device and its DPV algorithm is verified.

  10. Demonstration of quantum superiority in learning parity with noise with superconducting qubits

    NASA Astrophysics Data System (ADS)

    Ristè, Diego; da Silva, Marcus; Ryan, Colm; Cross, Andrew; Smolin, John; Gambetta, Jay; Chow, Jerry; Johnson, Blake

    A problem in machine learning is to identify the function programmed in an unknown device, or oracle, having only access to its output. In particular, a parity function computes the parity of a subset of a bit register. We implement an oracle executing parity functions in a five-qubit superconducting processor and compare the performance of a classical and a quantum learner. The classical learner reads the output of multiple oracle calls and uses the results to infer the hidden function. In addition to querying the oracle, the quantum learner can apply coherent rotations on the output register before the readout. We show that, given a target success probability, the quantum approach outperforms the classical one in the number of queries needed. Moreover, this gap increases with readout noise and with the size of the qubit register. This result shows that quantum advantage can already emerge in current systems with a few, noisy qubits. We acknowledge support from IARPA under Contract W911NF-10-1-0324.

  11. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    NASA Astrophysics Data System (ADS)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 41/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.

  12. Simple Algorithms for Distributed Leader Election in Anonymous Synchronous Rings and Complete Networks Inspired by Neural Development in Fruit Flies.

    PubMed

    Xu, Lei; Jeavons, Peter

    2015-11-01

    Leader election in anonymous rings and complete networks is a very practical problem in distributed computing. Previous algorithms for this problem are generally designed for a classical message passing model where complex messages are exchanged. However, the need to send and receive complex messages makes such algorithms less practical for some real applications. We present some simple synchronous algorithms for distributed leader election in anonymous rings and complete networks that are inspired by the development of the neural system of the fruit fly. Our leader election algorithms all assume that only one-bit messages are broadcast by nodes in the network and processors are only able to distinguish between silence and the arrival of one or more messages. These restrictions allow implementations to use a simpler message-passing architecture. Even with these harsh restrictions our algorithms are shown to achieve good time and message complexity both analytically and experimentally.

  13. Army/NASA small turboshaft engine digital controls research program

    NASA Technical Reports Server (NTRS)

    Sellers, J. F.; Baez, A. N.

    1981-01-01

    The emphasis of a program to conduct digital controls research for small turboshaft engines is on engine test evaluation of advanced control logic using a flexible microprocessor based digital control system designed specifically for research on advanced control logic. Control software is stored in programmable memory. New control algorithms may be stored in a floppy disk and loaded directly into memory. This feature facilitates comparative evaluation of different advanced control modes. The central processor in the digital control is an Intel 8086 16 bit microprocessor. Control software is programmed in assembly language. Software checkout is accomplished prior to engine test by connecting the digital control to a real time hybrid computer simulation of the engine. The engine currently installed in the facility has a hydromechanical control modified to allow electrohydraulic fuel metering and VG actuation by the digital control. Simulation results are presented which show that the modern control reduces the transient rotor speed droop caused by unanticipated load changes such as cyclic pitch or wind gust transients.

  14. FLASH fly-by-light flight control demonstration results overview

    NASA Astrophysics Data System (ADS)

    Halski, Don J.

    1996-10-01

    The Fly-By-Light Advanced Systems Hardware (FLASH) program developed Fly-By-Light (FBL) and Power-By-Wire (PBW) technologies for military and commercial aircraft. FLASH consists of three tasks. Task 1 developed the fiber optic cable, connectors, testers and installation and maintenance procedures. Task 3 developed advanced smart, rotary thin wing and electro-hydrostatic (EHA) actuators. Task 2, which is the subject of this paper,l focused on integration of fiber optic sensors and data buses with cable plant components from Task 1 and actuators from Task 3 into centralized and distributed flight control systems. Both open loop and piloted hardware-in-the-loop demonstrations were conducted with centralized and distributed flight control architectures incorporating the AS-1773A optical bus, active hand controllers, optical sensors, optimal flight control laws in high speed 32-bit processors, and neural networks for EHA monitoring and fault diagnosis. This paper overviews the systems level testing conducted under the FLASH Flight Control task. Preliminary results are summarized. Companion papers provide additional information.

  15. Millisecond precision psychological research in a world of commodity computers: new hardware, new problems?

    PubMed

    Plant, Richard R; Turner, Garry

    2009-08-01

    Since the publication of Plant, Hammond, and Turner (2004), which highlighted a pressing need for researchers to pay more attention to sources of error in computer-based experiments, the landscape has undoubtedly changed, but not necessarily for the better. Readily available hardware has improved in terms of raw speed; multi core processors abound; graphics cards now have hundreds of megabytes of RAM; main memory is measured in gigabytes; drive space is measured in terabytes; ever larger thin film transistor displays capable of single-digit response times, together with newer Digital Light Processing multimedia projectors, enable much greater graphic complexity; and new 64-bit operating systems, such as Microsoft Vista, are now commonplace. However, have millisecond-accurate presentation and response timing improved, and will they ever be available in commodity computers and peripherals? In the present article, we used a Black Box ToolKit to measure the variability in timing characteristics of hardware used commonly in psychological research.

  16. Measurement of fault latency in a digital avionic miniprocessor

    NASA Technical Reports Server (NTRS)

    Mcgough, J. G.; Swern, F. L.

    1981-01-01

    The results of fault injection experiments utilizing a gate-level emulation of the central processor unit of the Bendix BDX-930 digital computer are presented. The failure detection coverage of comparison-monitoring and a typical avionics CPU self-test program was determined. The specific tasks and experiments included: (1) inject randomly selected gate-level and pin-level faults and emulate six software programs using comparison-monitoring to detect the faults; (2) based upon the derived empirical data develop and validate a model of fault latency that will forecast a software program's detecting ability; (3) given a typical avionics self-test program, inject randomly selected faults at both the gate-level and pin-level and determine the proportion of faults detected; (4) determine why faults were undetected; (5) recommend how the emulation can be extended to multiprocessor systems such as SIFT; and (6) determine the proportion of faults detected by a uniprocessor BIT (built-in-test) irrespective of self-test.

  17. Fast physical-random number generation using laser diode's frequency noise: influence of frequency discriminator

    NASA Astrophysics Data System (ADS)

    Matsumoto, Kouhei; Kasuya, Yuki; Yumoto, Mitsuki; Arai, Hideaki; Sato, Takashi; Sakamoto, Shuichi; Ohkawa, Masashi; Ohdaira, Yasuo

    2018-02-01

    Not so long ago, pseudo random numbers generated by numerical formulae were considered to be adequate for encrypting important data-files, because of the time needed to decode them. With today's ultra high-speed processors, however, this is no longer true. So, in order to thwart ever-more advanced attempts to breach our system's protections, cryptologists have devised a method that is considered to be virtually impossible to decode, and uses what is a limitless number of physical random numbers. This research describes a method, whereby laser diode's frequency noise generate a large quantities of physical random numbers. Using two types of photo detectors (APD and PIN-PD), we tested the abilities of two types of lasers (FP-LD and VCSEL) to generate random numbers. In all instances, an etalon served as frequency discriminator, the examination pass rates were determined using NIST FIPS140-2 test at each bit, and the Random Number Generation (RNG) speed was noted.

  18. Applications Performance on NAS Intel Paragon XP/S - 15#

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

    1994-01-01

    The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S- 15 in February, 1993. The i860 XP microprocessor with an integrated floating point unit and operating in dual -instruction mode gives peak performance of 75 million floating point operations (NIFLOPS) per second for 64 bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we will report on early experience using the Paragon XP/S- 15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3 both assembly-coded and Fortran coded on NAS Paragon XP/S- 15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT Finally, we measured the performance of NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as CM-5, CRAY T3D, IBM SP I, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default an operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 NIB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help applications performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS both assembly coded and Fortran coded. We found that the measured performance of assembly-coded BLAS is much less than what memory bandwidth limitation would predict. The influence of data cache on different sizes of vectors is also investigated using one-dimensional FFTs. c. Impact of processor layout: There are several different ways processors can be laid out within the two-dimensional grid of processors on the Paragon. We have used the FFT example to investigate performance differences based on processors layout.

  19. A parallel algorithm for 2D visco-acoustic frequency-domain full-waveform inversion: application to a dense OBS data set

    NASA Astrophysics Data System (ADS)

    Sourbier, F.; Operto, S.; Virieux, J.

    2006-12-01

    We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.

  20. Distributed processor allocation for launching applications in a massively connected processors complex

    DOEpatents

    Pedretti, Kevin

    2008-11-18

    A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.

  1. Design and implementation of low complexity wake-up receiver for underwater acoustic sensor networks

    NASA Astrophysics Data System (ADS)

    Yue, Ming

    This thesis designs a low-complexity dual Pseudorandom Noise (PN) scheme for identity (ID) detection and coarse frame synchronization. The two PN sequences for a node are identical and are separated by a specified length of gap which serves as the ID of different sensor nodes. The dual PN sequences are short in length but are capable of combating severe underwater acoustic (UWA) multipath fading channels that exhibit time varying impulse responses up to 100 taps. The receiver ID detection is implemented on a microcontroller MSP430F5529 by calculating the correlation between the two segments of the PN sequence with the specified separation gap. When the gap length is matched, the correlator outputs a peak which triggers the wake-up enable. The time index of the correlator peak is used as the coarse synchronization of the data frame. The correlator is implemented by an iterative algorithm that uses only one multiplication and two additions for each sample input regardless of the length of the PN sequence, thus achieving low computational complexity. The real-time processing requirement is also met via direct memory access (DMA) and two circular buffers to accelerate data transfer between the peripherals and the memory. The proposed dual PN detection scheme has been successfully tested by simulated fading channels and real-world measured channels. The results show that, in long multipath channels with more than 60 taps, the proposed scheme achieves high detection rate and low false alarm rate using maximal-length sequences as short as 31 bits to 127 bits, therefore it is suitable as a low-power wake-up receiver. The future research will integrate the wake-up receiver with Digital Signal Processors (DSP) for payload detection.

  2. MPEG-4 ASP SoC receiver with novel image enhancement techniques for DAB networks

    NASA Astrophysics Data System (ADS)

    Barreto, D.; Quintana, A.; García, L.; Callicó, G. M.; Núñez, A.

    2007-05-01

    This paper presents a system for real-time video reception in low-power mobile devices using Digital Audio Broadcast (DAB) technology for transmission. A demo receiver terminal is designed into a FPGA platform using the Advanced Simple Profile (ASP) MPEG-4 standard for video decoding. In order to keep the demanding DAB requirements, the bandwidth of the encoded sequence must be drastically reduced. In this sense, prior to the MPEG-4 coding stage, a pre-processing stage is performed. It is firstly composed by a segmentation phase according to motion and texture based on the Principal Component Analysis (PCA) of the input video sequence, and secondly by a down-sampling phase, which depends on the segmentation results. As a result of the segmentation task, a set of texture and motion maps are obtained. These motion and texture maps are also included into the bit-stream as user data side-information and are therefore known to the receiver. For all bit-rates, the whole encoder/decoder system proposed in this paper exhibits higher image visual quality than the alternative encoding/decoding method, assuming equal image sizes. A complete analysis of both techniques has also been performed to provide the optimum motion and texture maps for the global system, which has been finally validated for a variety of video sequences. Additionally, an optimal HW/SW partition for the MPEG-4 decoder has been studied and implemented over a Programmable Logic Device with an embedded ARM9 processor. Simulation results show that a throughput of 15 QCIF frames per second can be achieved with low area and low power implementation.

  3. An efficient system for reliably transmitting image and video data over low bit rate noisy channels

    NASA Technical Reports Server (NTRS)

    Costello, Daniel J., Jr.; Huang, Y. F.; Stevenson, Robert L.

    1994-01-01

    This research project is intended to develop an efficient system for reliably transmitting image and video data over low bit rate noisy channels. The basic ideas behind the proposed approach are the following: employ statistical-based image modeling to facilitate pre- and post-processing and error detection, use spare redundancy that the source compression did not remove to add robustness, and implement coded modulation to improve bandwidth efficiency and noise rejection. Over the last six months, progress has been made on various aspects of the project. Through our studies of the integrated system, a list-based iterative Trellis decoder has been developed. The decoder accepts feedback from a post-processor which can detect channel errors in the reconstructed image. The error detection is based on the Huber Markov random field image model for the compressed image. The compression scheme used here is that of JPEG (Joint Photographic Experts Group). Experiments were performed and the results are quite encouraging. The principal ideas here are extendable to other compression techniques. In addition, research was also performed on unequal error protection channel coding, subband vector quantization as a means of source coding, and post processing for reducing coding artifacts. Our studies on unequal error protection (UEP) coding for image transmission focused on examining the properties of the UEP capabilities of convolutional codes. The investigation of subband vector quantization employed a wavelet transform with special emphasis on exploiting interband redundancy. The outcome of this investigation included the development of three algorithms for subband vector quantization. The reduction of transform coding artifacts was studied with the aid of a non-Gaussian Markov random field model. This results in improved image decompression. These studies are summarized and the technical papers included in the appendices.

  4. Radiation hardened microprocessor for small payloads

    NASA Technical Reports Server (NTRS)

    Shah, Ravi

    1993-01-01

    The RH-3000 program is developing a rad-hard space qualified 32-bit MIPS R-3000 RISC processor under the Naval Research Lab sponsorship. In addition, under IR&D Harris is developing RHC-3000 for embedded control applications where low cost and radiation tolerance are primary concerns. The development program leverages heavily from commercial development of the MIPS R-3000. The commercial R-3000 has a large installed user base and several foundry partners are currently producing a wide variety of R-3000 derivative products. One of the MIPS derivative products, the LR33000 from LSI Logic, was used as the basis for the design of the RH-3000 chipset. The RH-3000 chipset consists of three core chips and two support chips. The core chips include the CPU, which is the R-3000 integer unit and the FPA/MD chip pair, which performs the R-3010 floating point functions. The two support whips contain all the support functions required for fault tolerance support, real-time support, memory management, timers, and other functions. The Harris development effort had first passed silicon success in June, 1992 with the first rad-hard 32-bit RH-3000 CPU chip. The CPU device is 30 kgates, has a 508 mil by 503 mil die size and is fabricated at Harris Semiconductor on the rad-hard CMOS Silicon on Sapphire (SOS) process. The CPU device successfully passed tesing against 600,000 test vectors derived directly on the LSI/MIPS test suite and has been operational as a single board computer running C code for the past year. In addition, the RH-3000 program has developed the methodology for converting commercially developed designs utilizing logic synthesis techniques based on a combination of VHDK and schematic data bases.

  5. Principles of Faithful Execution in the implementation of trusted objects.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tarman, Thomas David; Campbell, Philip LaRoche; Pierson, Lyndon George

    2003-09-01

    We begin with the following definitions: Definition: A trusted volume is the computing machinery (including communication lines) within which data is assumed to be physically protected from an adversary. A trusted volume provides both integrity and privacy. Definition: Program integrity consists of the protection necessary to enable the detection of changes in the bits comprising a program as specified by the developer, for the entire time that the program is outside a trusted volume. For ease of discussion we consider program integrity to be the aggregation of two elements: instruction integrity (detection of changes in the bits within an instructionmore » or block of instructions), and sequence integrity (detection of changes in the locations of instructions within a program). Definition: Faithful Execution (FE) is a type of software protection that begins when the software leaves the control of the developer and ends within the trusted volume of a target processor. That is, FE provides program integrity, even while the program is in execution. (As we will show below, FE schemes are a function of trusted volume size.) FE is a necessary quality for computing. Without it we cannot trust computations. In the early days of computing FE came for free since the software never left a trusted volume. At that time the execution environment was the same as the development environment. In some circles that environment was referred to as a ''closed shop:'' all of the software that was used there was developed there. When an organization bought a large computer from a vendor the organization would run its own operating system on that computer, use only its own editors, only its own compilers, only its own debuggers, and so on. However, with the continuing maturity of computing technology, FE becomes increasingly difficult to achieve« less

  6. Servomotor and Controller Having Large Dynamic Range

    NASA Technical Reports Server (NTRS)

    Alhorn, Dean C.; Howard, David E.; Smith, Dennis A.; Dutton, Ken; Paulson, M. Scott

    2007-01-01

    A recently developed micro-commanding rotational-position-control system offers advantages of less mechanical complexity, less susceptibility to mechanical resonances, less power demand, less bulk, less weight, and lower cost, relative to prior rotational-position-control systems based on stepping motors and gear drives. This system includes a digital-signal- processor (DSP)-based electronic controller, plus a shaft-angle resolver and a servomotor mounted on the same shaft. Heretofore, micro-stepping has usually been associated with stepping motors, but in this system, the servomotor is micro-commanded in response to rotational-position feedback from the shaft-angle resolver. The shaft-angle resolver is of a four-speed type chosen because it affords four times the resolution of a single-speed resolver. A key innovative aspect of this system is its position-feedback signal- conditioning circuits, which condition the resolver output signal for multiple ranges of rotational speed. In the preferred version of the system, two rotational- speed ranges are included, but any number of ranges could be added to expand the speed range or increase resolution in particular ranges. In the preferred version, the resolver output is conditioned with two resolver-to-digital converters (RDCs). One RDC is used for speeds from 0.00012 to 2.5 rpm; the other RDC is used for speeds of 2.5 to 6,000 rpm. For the lower speed range, the number of discrete steps of RDC output per revolution was set at 262,144 (4 quadrants at 16 bits per quadrant). For the higher speed range, the number of discrete steps per revolution was set at 4,096 (4 quadrants at 10 bits per quadrant).

  7. An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks

    NASA Astrophysics Data System (ADS)

    Mahapatra, Chinmaya; Leung, Victor CM; Stouraitis, Thanos

    2014-12-01

    The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In long-term evolution (LTE) advanced system, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmitting radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems as compared to orthogonal frequency division multiple access (OFDMA) as in LTE systems 3GPP rel.8 (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, the performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx vertex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing and provide performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA for bit error rate (BER) performance versus signal-to-noise ratio (SNR) in wireless channel as well as ROF media. It also gives higher throughput and mitigates the bad effect of peak-to-average-power ratio (PAPR).

  8. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

    PubMed

    Daily, Jeff

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach, however the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  9. Compact Low Power DPU for Plasma Instrument LINA on the Russian Luna-Glob Lander

    NASA Astrophysics Data System (ADS)

    Schmidt, Walter; Riihelä, Pekka; Kallio, Esa

    2013-04-01

    The Swedish Institute for Space Physics in Kiruna is bilding a Lunar Ions and Neutrals Analyzer (LINA) for the Russian Luna-Glob lander mission and its orbiter, to be launched around 2016 [1]. The Finnish Meteorological Institute is responsible for designing and building the central data processing units (DPU) for both instruments. The design details were optimized to serve as demonstrator also for a similar instrument on the Jupiter mission JUICE. To accommodate the originally set short development time and to keep the design between orbiter and Lander as similar as possible, the DPU is built around two re-programmable flash-based FPGAs from Actel. One FPGA contains a public-domain 32-bit processor core identical for both Lander and orbiter. The other FPGA handles all interfaces to the spacecraft system and the detectors, somewhat different for both implementations. Monitoring of analog housekeeping data is implemented as an IP-core from Stellamar inside the interface FPGA, saving mass, volume and especially power while simplifying the radiation protection design. As especially on the Lander the data retention before transfer to the orbiter cannot be guaranteed under all conditions, the DPU includes a Flash-PROM containing several software versions and data storage capability. With the memory management implemented inside the interface FPGA, one of the serial links can also be used as test port to verify the system, load the initial software into the Flash-PROM and to control the detector hardware directly without support by the processor and a ready developed operating system and software. Implementation and performance details will be presented. Reference: [1] http://www.russianspaceweb.com/luna_glob_lander.html.

  10. 32-Bit-Wide Memory Tolerates Failures

    NASA Technical Reports Server (NTRS)

    Buskirk, Glenn A.

    1990-01-01

    Electronic memory system of 32-bit words corrects bit errors caused by some common type of failures - even failure of entire 4-bit-wide random-access-memory (RAM) chip. Detects failure of two such chips, so user warned that ouput of memory may contain errors. Includes eight 4-bit-wide DRAM's configured so each bit of each DRAM assigned to different one of four parallel 8-bit words. Each DRAM contributes only 1 bit to each 8-bit word.

  11. Effect of PDC bit design and confining pressure on bit-balling tendencies while drilling shale using water base mud

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hariharan, P.R.; Azar, J.J.

    1996-09-01

    A good majority of all oilwell drilling occurs in shale and other clay-bearing rocks. In the light of relatively fewer studies conducted, the problem of bit-balling in PDC bits while drilling shale has been addressed with the primary intention of attempting to quantify the degree of balling, as well as to investigate the influence of bit design and confining pressures. A series of full-scale laboratory drilling tests under simulated down hole conditions were conducted utilizing seven different PDC bits in Catoosa shale. Test results have indicated that the non-dimensional parameter R{sub d} [(bit torque).(weight-on-bit)/(bit diameter)] is a good indicator ofmore » the degree of bit-balling and that it correlated well with Specific-Energy. Furthermore, test results have shown bit-profile and bit-hydraulic design to be key parameters of bit design that dictate the tendency of balling in shales under a given set of operating conditions. A bladed bit was noticed to ball less compared to a ribbed or open-faced bit. Likewise, related to bit profile, test results have indicated that the parabolic profile has a lesser tendency to ball compared to round and flat profiles. The tendency of PDC bits to ball was noticed to increase with increasing confining pressures for the set of drilling conditions used.« less

  12. High-performance floating-point image computing workstation for medical applications

    NASA Astrophysics Data System (ADS)

    Mills, Karl S.; Wong, Gilman K.; Kim, Yongmin

    1990-07-01

    The medical imaging field relies increasingly on imaging and graphics techniques in diverse applications with needs similar to (or more stringent than) those of the military, industrial and scientific communities. However, most image processing and graphics systems available for use in medical imaging today are either expensive, specialized, or in most cases both. High performance imaging and graphics workstations which can provide real-time results for a number of applications, while maintaining affordability and flexibility, can facilitate the application of digital image computing techniques in many different areas. This paper describes the hardware and software architecture of a medium-cost floating-point image processing and display subsystem for the NeXT computer, and its applications as a medical imaging workstation. Medical imaging applications of the workstation include use in a Picture Archiving and Communications System (PACS), in multimodal image processing and 3-D graphics workstation for a broad range of imaging modalities, and as an electronic alternator utilizing its multiple monitor display capability and large and fast frame buffer. The subsystem provides a 2048 x 2048 x 32-bit frame buffer (16 Mbytes of image storage) and supports both 8-bit gray scale and 32-bit true color images. When used to display 8-bit gray scale images, up to four different 256-color palettes may be used for each of four 2K x 2K x 8-bit image frames. Three of these image frames can be used simultaneously to provide pixel selectable region of interest display. A 1280 x 1024 pixel screen with 1: 1 aspect ratio can be windowed into the frame buffer for display of any portion of the processed image or images. In addition, the system provides hardware support for integer zoom and an 82-color cursor. This subsystem is implemented on an add-in board occupying a single slot in the NeXT computer. Up to three boards may be added to the NeXT for multiple display capability (e.g., three 1280 x 1024 monitors, each with a 16-Mbyte frame buffer). Each add-in board provides an expansion connector to which an optional image computing coprocessor board may be added. Each coprocessor board supports up to four processors for a peak performance of 160 MFLOPS. The coprocessors can execute programs from external high-speed microcode memory as well as built-in internal microcode routines. The internal microcode routines provide support for 2-D and 3-D graphics operations, matrix and vector arithmetic, and image processing in integer, IEEE single-precision floating point, or IEEE double-precision floating point. In addition to providing a library of C functions which links the NeXT computer to the add-in board and supports its various operational modes, algorithms and medical imaging application programs are being developed and implemented for image display and enhancement. As an extension to the built-in algorithms of the coprocessors, 2-D Fast Fourier Transform (FF1), 2-D Inverse FFF, convolution, warping and other algorithms (e.g., Discrete Cosine Transform) which exploit the parallel architecture of the coprocessor board are being implemented.

  13. SpaceCube Version 1.5

    NASA Technical Reports Server (NTRS)

    Geist, Alessandro; Lin, Michael; Flatley, Tom; Petrick, David

    2013-01-01

    SpaceCube 1.5 is a high-performance and low-power system in a compact form factor. It is a hybrid processing system consisting of CPU (central processing unit), FPGA (field-programmable gate array), and DSP (digital signal processor) processing elements. The primary processing engine is the Virtex- 5 FX100T FPGA, which has two embedded processors. The SpaceCube 1.5 System was a bridge to the SpaceCube 2.0 and SpaceCube 2.0 Mini processing systems. The SpaceCube 1.5 system was the primary avionics in the successful SMART (Small Rocket/Spacecraft Technology) Sounding Rocket mission that was launched in the summer of 2011. For SMART and similar missions, an avionics processor is required that is reconfigurable, has high processing capability, has multi-gigabit interfaces, is low power, and comes in a rugged/compact form factor. The original SpaceCube 1.0 met a number of the criteria, but did not possess the multi-gigabit interfaces that were required and is a higher-cost system. The SpaceCube 1.5 was designed with those mission requirements in mind. The SpaceCube 1.5 features one Xilinx Virtex-5 FX100T FPGA and has excellent size, weight, and power characteristics [4×4×3 in. (approx. = 10×10×8 cm), 3 lb (approx. = 1.4 kg), and 5 to 15 W depending on the application]. The estimated computing power of the two PowerPC 440s in the Virtex-5 FPGA is 1100 DMIPS each. The SpaceCube 1.5 includes two Gigabit Ethernet (1 Gbps) interfaces as well as two SATA-I/II interfaces (1.5 to 3.0 Gbps) for recording to data drives. The SpaceCube 1.5 also features DDR2 SDRAM (double data rate synchronous dynamic random access memory); 4- Gbit Flash for storing application code for the CPU, FPGA, and DSP processing elements; and a Xilinx Platform Flash XL to store FPGA configuration files or application code. The system also incorporates a 12 bit analog to digital converter with the ability to read 32 discrete analog sensor inputs. The SpaceCube 1.5 design also has a built-in accelerometer. In addition, the system has 12 receive and transmit RS- 422 interfaces for legacy support. The SpaceCube 1.5 processor card represents the first NASA Goddard design in a compact form factor featuring the Xilinx Virtex- 5. The SpaceCube 1.5 incorporates backward compatibility with the Space- Cube 1.0 form factor and stackable architecture. It also makes use of low-cost commercial parts, but is designed for operation in harsh environments.

  14. Simulation of continuously logical base cells (CL BC) with advanced functions for analog-to-digital converters and image processors

    NASA Astrophysics Data System (ADS)

    Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.

    2017-10-01

    The paper considers results of design and modeling of continuously logical base cells (CL BC) based on current mirrors (CM) with functions of preliminary analogue and subsequent analogue-digital processing for creating sensor multichannel analog-to-digital converters (SMC ADCs) and image processors (IP). For such with vector or matrix parallel inputs-outputs IP and SMC ADCs it is needed active basic photosensitive cells with an extended electronic circuit, which are considered in paper. Such basic cells and ADCs based on them have a number of advantages: high speed and reliability, simplicity, small power consumption, high integration level for linear and matrix structures. We show design of the CL BC and ADC of photocurrents and their various possible implementations and its simulations. We consider CL BC for methods of selection and rank preprocessing and linear array of ADCs with conversion to binary codes and Gray codes. In contrast to our previous works here we will dwell more on analogue preprocessing schemes for signals of neighboring cells. Let us show how the introduction of simple nodes based on current mirrors extends the range of functions performed by the image processor. Each channel of the structure consists of several digital-analog cells (DC) on 15-35 CMOS. The amount of DC does not exceed the number of digits of the formed code, and for an iteration type, only one cell of DC, complemented by the device of selection and holding (SHD), is required. One channel of ADC with iteration is based on one DC-(G) and SHD, and it has only 35 CMOS transistors. In such ADCs easily parallel code can be realized and also serial-parallel output code. The circuits and simulation results of their design with OrCAD are shown. The supply voltage of the DC is 1.8÷3.3V, the range of an input photocurrent is 0.1÷24μA, the transformation time is 20÷30nS at 6-8 bit binary or Gray codes. The general power consumption of the ADC with iteration is only 50÷100μW, if the maximum input current is 4μA. Such simple structure of linear array of ADCs with low power consumption and supply voltage 3.3V, and at the same time with good dynamic characteristics (frequency of digitization even for 1.5μm CMOS-technologies is 40÷50 MHz, and can be increased up to 10 times) and accuracy characteristics are show. The SMC ADCs based on CL BC and CM opens new prospects for realization of linear and matrix IP and photo-electronic structures with matrix operands, which are necessary for neural networks, digital optoelectronic processors, neural-fuzzy controllers.

  15. An Embedded Sensor Node Microcontroller with Crypto-Processors.

    PubMed

    Panić, Goran; Stecklina, Oliver; Stamenković, Zoran

    2016-04-27

    Wireless sensor network applications range from industrial automation and control, agricultural and environmental protection, to surveillance and medicine. In most applications, data are highly sensitive and must be protected from any type of attack and abuse. Security challenges in wireless sensor networks are mainly defined by the power and computing resources of sensor devices, memory size, quality of radio channels and susceptibility to physical capture. In this article, an embedded sensor node microcontroller designed to support sensor network applications with severe security demands is presented. It features a low power 16-bitprocessor core supported by a number of hardware accelerators designed to perform complex operations required by advanced crypto algorithms. The microcontroller integrates an embedded Flash and an 8-channel 12-bit analog-to-digital converter making it a good solution for low-power sensor nodes. The article discusses the most important security topics in wireless sensor networks and presents the architecture of the proposed hardware solution. Furthermore, it gives details on the chip implementation, verification and hardware evaluation. Finally, the chip power dissipation and performance figures are estimated and analyzed.

  16. The fiber-optic high-speed data bus for a new generation of military aircraft

    NASA Astrophysics Data System (ADS)

    Uhlhorn, Roger W.

    1991-02-01

    The avionic suite for the next generation of military aircraft is being designed with component and module commonality in mind in order to control recurring costs and capitalize on economy of scale. The backbone of the suite fashioned out of these modular building blocks is the fiber-optic bit-serial time-division multiplexed high-speed data bus (HSDB), operating at 50 Mb/s, which provides command and control communications among most of the aircraft subsystems and can be used to provide communications for a fly-by-light flight-control system or for the block transfer of data between mass memories and data processors. The fiber-optic HSDB is examined from the top down, beginning with an overview of the evolution of avionic architectures. A review is given of the standardization activity associated with development of the network, the protocols chosen to implement the desired communication functions, configuration options, and the fiber-optic components used in the bus interfaces or other active nodes of the network. It is believed that the utility of the bus extends beyond aircraft to spacecraft, ships, and land vehicles.

  17. An experimental adaptive radar MTI filter

    NASA Astrophysics Data System (ADS)

    Gong, Y. H.; Cooling, J. E.

    The theoretical and practical features of a self-adaptive filter designed to remove clutter noise from a radar signal are described. The hardware employs an 8-bit microprocessor/fast hardware multiplier combination along with analog-digital and digital-analog interfaces. The software here is implemented in assembler language. It is assumed that there is little overlap between the signal and the noise spectra and that the noise power is much greater than that of the signal. It is noted that one of the most important factors to be considered when designing digital filters is the quantization noise. This works to degrade the steady state performance from that of the ideal (infinite word length) filter. The principal limitation of the filter described here is its low sampling rate (1.72 kHz), due mainly to the time spent on the multiplication routines. The methods discussed here, however, are general and can be applied to both traditional and more complex radar MTI systems, provided that the filter sampling frequency is increased. Dedicated VLSI signal processors are seen as holding considerable promise.

  18. Computer power fathoms the depths: billion-bit data processors illuminate the subsurface. [3-D Seismic techniques

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ross, J.J.

    Some of the same space-age signal technology being used to track events 200 miles above the earth is helping petroleum explorationists track down oil and natural gas two miles and more down into the earth. The breakthroughs, which have come in a technique called three-dimensional seismic work, could change the complexion of exploration for oil and natural gas. Thanks to this 3-D seismic approach, explorationists can make dynamic maps of sites miles beneath the surface. Then explorationists can throw these maps on space-age computer systems and manipulate them every which way - homing in sharply on salt domes, faults, sandsmore » and traps associated with oil and natural gas. ''The 3-D seismic scene has exploded within the last two years,'' says, Peiter Tackenberg, Marathon technical consultant who deals with both domestic and international exploration. The 3-D technique has been around for more than a decade, he notes, but recent achievements in space-age computer hardware and software have unlocked its full potential.« less

  19. New tracking implementation in the Deep Space Network

    NASA Technical Reports Server (NTRS)

    Berner, Jeff B.; Bryant, Scott H.

    2001-01-01

    As part of the Network Simplification Project, the tracking system of the Deep Space Network is being upgraded. This upgrade replaces the discrete logic sequential ranging system with a system that is based on commercial Digital Signal Processor boards. The new implementation allows both sequential and pseudo-noise types of ranging. The other major change is a modernization of the data formatting. Previously, there were several types of interfaces, delivering both intermediate data and processed data (called 'observables'). All of these interfaces were bit-packed blocks, which do not allow for easy expansion, and many of these interfaces required knowledge of the specific hardware implementations. The new interface supports four classes of data: raw (direct from the measuring equipment), derived (the observable data), interferometric (multiple antenna measurements), and filtered (data whose values depend on multiple measurements). All of the measurements are reported at the sky frequency or phase level, so that no knowledge of the actual hardware is required. The data is formatted into Standard Formatted Data Units, as defined by the Consultative Committee for Space Data Systems, so that expansion and cross-center usage is greatly enhanced.

  20. Pseudo-random generator based on Chinese Remainder Theorem

    NASA Astrophysics Data System (ADS)

    Bajard, Jean Claude; Hördegen, Heinrich

    2009-08-01

    Pseudo-Random Generators (PRG) are fundamental in cryptography. Their use occurs at different level in cipher protocols. They need to verify some properties for being qualified as robust. The NIST proposes some criteria and a tests suite which gives informations on the behavior of the PRG. In this work, we present a PRG constructed from the conversion between further residue systems of representation of the elements of GF(2)[X]. In this approach, we use some pairs of co-prime polynomials of degree k and a state vector of 2k bits. The algebraic properties are broken by using different independent pairs during the process. Since this method is reversible, we also can use it as a symmetric crypto-system. We evaluate the cost of a such system, taking into account that some operations are commonly implemented on crypto-processors. We give the results of the different NIST Tests and we explain this choice compare to others found in the literature. We describe the behavior of this PRG and explain how the different rounds are chained for ensuring a fine secure randomness.

  1. An Embedded Sensor Node Microcontroller with Crypto-Processors

    PubMed Central

    Panić, Goran; Stecklina, Oliver; Stamenković, Zoran

    2016-01-01

    Wireless sensor network applications range from industrial automation and control, agricultural and environmental protection, to surveillance and medicine. In most applications, data are highly sensitive and must be protected from any type of attack and abuse. Security challenges in wireless sensor networks are mainly defined by the power and computing resources of sensor devices, memory size, quality of radio channels and susceptibility to physical capture. In this article, an embedded sensor node microcontroller designed to support sensor network applications with severe security demands is presented. It features a low power 16-bitprocessor core supported by a number of hardware accelerators designed to perform complex operations required by advanced crypto algorithms. The microcontroller integrates an embedded Flash and an 8-channel 12-bit analog-to-digital converter making it a good solution for low-power sensor nodes. The article discusses the most important security topics in wireless sensor networks and presents the architecture of the proposed hardware solution. Furthermore, it gives details on the chip implementation, verification and hardware evaluation. Finally, the chip power dissipation and performance figures are estimated and analyzed. PMID:27128925

  2. Space qualification of an automotive microcontroller for the DREAMS-P/H pressure and humidity instrument on board the ExoMars 2016 Schiaparelli lander

    NASA Astrophysics Data System (ADS)

    Nikkanen, T.; Schmidt, W.; Harri, A.-M.; Genzer, M.; Hieta, M.; Haukka, H.; Kemppinen, O.

    2015-10-01

    Finnish Meteorological Institute (FMI) has developed a novel kind of pressure and humidity instrument for the Schiaparelli Mars lander, which is a part of the ExoMars 2016 mission of the European Space Agency (ESA) [1]. The DREAMS-P pressure instrument and DREAMS-H humidity instrument are part of the DREAMS science package on board the lander. DREAMS-P (seen in Fig. 1 and DREAMS-H were evolved from earlier planetary pressure and humidity instrument designs by FMI with a completely redesigned control and data unit. Instead of using the conventional approach of utilizing a space grade processor component, a commercial off the shelf microcontroller was selected for handling the pressure and humidity measurements. The new controller is based on the Freescale MC9S12XEP100 16-bit automotive microcontroller. Coordinated by FMI, a batch of these microcontroller units (MCUs) went through a custom qualification process in order to accept the component for spaceflight on board a Mars lander.

  3. Proper nozzle location, bit profile, and cutter arrangement affect PDC-bit performance significantly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garcia-Gavito, D.; Azar, J.J.

    1994-09-01

    During the past 20 years, the drilling industry has looked to new technology to halt the exponentially increasing costs of drilling oil, gas, and geothermal wells. This technology includes bit design innovations to improve overall drilling performance and reduce drilling costs. These innovations include development of drag bits that use PDC cutters, also called PDC bits, to drill long, continuous intervals of soft to medium-hard formations more economically than conventional three-cone roller-cone bits. The cost advantage is the result of higher rates of penetration (ROP's) and longer bit life obtained with the PDC bits. An experimental study comparing the effectsmore » of polycrystalline-diamond-compact (PDC)-bit design features on the dynamic pressure distribution at the bit/rock interface was conducted on a full-scale drilling rig. Results showed that nozzle location, bit profile, and cutter arrangement are significant factors in PDC-bit performance.« less

  4. Acquisition and Retaining Granular Samples via a Rotating Coring Bit

    NASA Technical Reports Server (NTRS)

    Bar-Cohen, Yoseph; Badescu, Mircea; Sherrit, Stewart

    2013-01-01

    This device takes advantage of the centrifugal forces that are generated when a coring bit is rotated, and a granular sample is entered into the bit while it is spinning, making it adhere to the internal wall of the bit, where it compacts itself into the wall of the bit. The bit can be specially designed to increase the effectiveness of regolith capturing while turning and penetrating the subsurface. The bit teeth can be oriented such that they direct the regolith toward the bit axis during the rotation of the bit. The bit can be designed with an internal flute that directs the regolith upward inside the bit. The use of both the teeth and flute can be implemented in the same bit. The bit can also be designed with an internal spiral into which the various particles wedge. In another implementation, the bit can be designed to collect regolith primarily from a specific depth. For that implementation, the bit can be designed such that when turning one way, the teeth guide the regolith outward of the bit and when turning in the opposite direction, the teeth will guide the regolith inward into the bit internal section. This mechanism can be implemented with or without an internal flute. The device is based on the use of a spinning coring bit (hollow interior) as a means of retaining granular sample, and the acquisition is done by inserting the bit into the subsurface of a regolith, soil, or powder. To demonstrate the concept, a commercial drill and a coring bit were used. The bit was turned and inserted into the soil that was contained in a bucket. While spinning the bit (at speeds of 600 to 700 RPM), the drill was lifted and the soil was retained inside the bit. To prove this point, the drill was turned horizontally, and the acquired soil was still inside the bit. The basic theory behind the process of retaining unconsolidated mass that can be acquired by the centrifugal forces of the bit is determined by noting that in order to stay inside the interior of the bit, the frictional force must be greater than the weight of the sample. The bit can be designed with an internal sleeve to serve as a container for granular samples. This tube-shaped component can be extracted upon completion of the sampling, and the bottom can be capped by placing the bit onto a corklike component. Then, upon removal of the internal tube, the top section can be sealed. The novel features of this device are: center dot A mechanism of acquiring and retaining granular samples using a coring bit without a closed door. center dot An acquisition bit that has internal structure such as a waffle pattern for compartmentalizing or helical internal flute to propel the sample inside the bit and help in acquiring and retaining granular samples. center dot A bit with an internal spiral into which the various particles wedge. center dot A design that provides a method of testing frictional properties of the granular samples and potentially segregating particles based on size and density. A controlled acceleration or deceleration may be used to drop the least-frictional particles or to eventually shear the unconsolidated material near the bit center.

  5. Effects of plastic bits on the condition and behaviour of captive-reared pheasants.

    PubMed

    Butler, D A; Davis, C

    2010-03-27

    Between 2005 and 2007, data were collected from game farms across England and Wales to examine the effects of the use of bits on the physiological condition and behaviour of pheasants. On each site, two pheasant pens kept in the same conditions were randomly allocated to either use bits or not. The behaviour and physiological conditions of pheasants in each treatment pen were assessed on the day of bitting and weekly thereafter until release. Detailed records of feed usage, medications and mortality were also kept. Bits halved the number of acts of bird-on-bird pecking, but they doubled the incidence of headshaking and scratching. Bits caused nostril inflammation and bill deformities in some birds, particularly after seven weeks of age. In all weeks after bitting, feather condition was poorer in non-bitted pheasants than in those fitted with bits. Less than 3 per cent of bitted birds had damaged skin, but in the non-bitted pens this figure increased over time to 23 per cent four weeks later. Feed use and mortality did not differ between bitted and non-bitted birds.

  6. New PDC bit design reduces vibrational problems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mensa-Wilmot, G.; Alexander, W.L.

    1995-05-22

    A new polycrystalline diamond compact (PDC) bit design combines cutter layout, load balancing, unsymmetrical blades and gauge pads, and spiraled blades to reduce problematic vibrations without limiting drilling efficiency. Stabilization improves drilling efficiency and also improves dull characteristics for PDC bits. Some PDC bit designs mitigate one vibrational mode (such as bit whirl) through drilling parameter manipulation yet cause or excite another vibrational mode (such as slip-stick). An alternative vibration-reducing concept which places no limitations on the operational environment of a PDC bit has been developed to ensure optimization of the bit`s available mechanical energy. The paper discusses bit stabilization,more » vibration reduction, vibration prevention, cutter arrangement, load balancing, blade layout, spiraled blades, and bit design.« less

  7. Efficiency of static core turn-off in a system-on-a-chip with variation

    DOEpatents

    Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong

    2013-10-29

    A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.

  8. Critique of a Hughes shuttle Ku-band data sampler/bit synchronizer

    NASA Technical Reports Server (NTRS)

    Holmes, J. K.

    1980-01-01

    An alternative bit synchronizer proposed for shuttle was analyzed in a noise-free environment by considering the basic operation of the loop via timing diagrams and by linearizing the bit synchronizer as an equivalent, continuous, phased-lock loop (PLL). The loop is composed of a high-frequency phase-frequency detector which is capable of detecting both phase and frequency errors and is used to track the clock, and a bit transition detector which attempts to track the transitions of the data bits. It was determined that the basic approach was a good design which, with proper implementation of the accumulator, up/down counter and logic should provide accurate mid-bit sampling with symmetric bits. However, when bit asymmetry occurs, the bit synchronizer can lock up with a large timing error, yet be quasi-stable (timing will not change unless the clock and bit sequence drift). This will result in incorrectly detecting some bits.

  9. Calibrating thermal behavior of electronics

    DOEpatents

    Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

    2017-07-11

    A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.

  10. Calibrating thermal behavior of electronics

    DOEpatents

    Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

    2016-05-31

    A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.

  11. Calibrating thermal behavior of electronics

    DOEpatents

    Chainer, Timothy J.; Parida, Pritish R.; Schultz, Mark D.

    2017-01-03

    A method includes determining a relationship between indirect thermal data for a processor and a measured temperature associated with the processor, during a calibration process, obtaining the indirect thermal data for the processor during actual operation of the processor, and determining an actual significant temperature associated with the processor during the actual operation using the indirect thermal data for the processor during actual operation of the processor and the relationship.

  12. Bit-1 is an essential regulator of myogenic differentiation

    PubMed Central

    Griffiths, Genevieve S.; Doe, Jinger; Jijiwa, Mayumi; Van Ry, Pam; Cruz, Vivian; de la Vega, Michelle; Ramos, Joe W.; Burkin, Dean J.; Matter, Michelle L.

    2015-01-01

    Muscle differentiation requires a complex signaling cascade that leads to the production of multinucleated myofibers. Genes regulating the intrinsic mitochondrial apoptotic pathway also function in controlling cell differentiation. How such signaling pathways are regulated during differentiation is not fully understood. Bit-1 (also known as PTRH2) mutations in humans cause infantile-onset multisystem disease with muscle weakness. We demonstrate here that Bit-1 controls skeletal myogenesis through a caspase-mediated signaling pathway. Bit-1-null mice exhibit a myopathy with hypotrophic myofibers. Bit-1-null myoblasts prematurely express muscle-specific proteins. Similarly, knockdown of Bit-1 expression in C2C12 myoblasts promotes early differentiation, whereas overexpression delays differentiation. In wild-type mice, Bit-1 levels increase during differentiation. Bit-1-null myoblasts exhibited increased levels of caspase 9 and caspase 3 without increased apoptosis. Bit-1 re-expression partially rescued differentiation. In Bit-1-null muscle, Bcl-2 levels are reduced, suggesting that Bcl-2-mediated inhibition of caspase 9 and caspase 3 is decreased. Bcl-2 re-expression rescued Bit-1-mediated early differentiation in Bit-1-null myoblasts and C2C12 cells with knockdown of Bit-1 expression. These results support an unanticipated yet essential role for Bit-1 in controlling myogenesis through regulation of Bcl-2. PMID:25770104

  13. Improved Iris Recognition through Fusion of Hamming Distance and Fragile Bit Distance.

    PubMed

    Hollingsworth, Karen P; Bowyer, Kevin W; Flynn, Patrick J

    2011-12-01

    The most common iris biometric algorithm represents the texture of an iris using a binary iris code. Not all bits in an iris code are equally consistent. A bit is deemed fragile if its value changes across iris codes created from different images of the same iris. Previous research has shown that iris recognition performance can be improved by masking these fragile bits. Rather than ignoring fragile bits completely, we consider what beneficial information can be obtained from the fragile bits. We find that the locations of fragile bits tend to be consistent across different iris codes of the same eye. We present a metric, called the fragile bit distance, which quantitatively measures the coincidence of the fragile bit patterns in two iris codes. We find that score fusion of fragile bit distance and Hamming distance works better for recognition than Hamming distance alone. To our knowledge, this is the first and only work to use the coincidence of fragile bit locations to improve the accuracy of matches.

  14. BIT BY BIT: A Game Simulating Natural Language Processing in Computers

    ERIC Educational Resources Information Center

    Kato, Taichi; Arakawa, Chuichi

    2008-01-01

    BIT BY BIT is an encryption game that is designed to improve students' understanding of natural language processing in computers. Participants encode clear words into binary code using an encryption key and exchange them in the game. BIT BY BIT enables participants who do not understand the concept of binary numbers to perform the process of…

  15. Bit selection using field drilling data and mathematical investigation

    NASA Astrophysics Data System (ADS)

    Momeni, M. S.; Ridha, S.; Hosseini, S. J.; Meyghani, B.; Emamian, S. S.

    2018-03-01

    A drilling process will not be complete without the usage of a drill bit. Therefore, bit selection is considered to be an important task in drilling optimization process. To select a bit is considered as an important issue in planning and designing a well. This is simply because the cost of drilling bit in total cost is quite high. Thus, to perform this task, aback propagation ANN Model is developed. This is done by training the model using several wells and it is done by the usage of drilling bit records from offset wells. In this project, two models are developed by the usage of the ANN. One is to find predicted IADC bit code and one is to find Predicted ROP. Stage 1 was to find the IADC bit code by using all the given filed data. The output is the Targeted IADC bit code. Stage 2 was to find the Predicted ROP values using the gained IADC bit code in Stage 1. Next is Stage 3 where the Predicted ROP value is used back again in the data set to gain Predicted IADC bit code value. The output is the Predicted IADC bit code. Thus, at the end, there are two models that give the Predicted ROP values and Predicted IADC bit code values.

  16. Investigation of PDC bit failure base on stick-slip vibration analysis of drilling string system plus drill bit

    NASA Astrophysics Data System (ADS)

    Huang, Zhiqiang; Xie, Dou; Xie, Bing; Zhang, Wenlin; Zhang, Fuxiao; He, Lei

    2018-03-01

    The undesired stick-slip vibration is the main source of PDC bit failure, such as tooth fracture and tooth loss. So, the study of PDC bit failure base on stick-slip vibration analysis is crucial to prolonging the service life of PDC bit and improving ROP (rate of penetration). For this purpose, a piecewise-smooth torsional model with 4-DOF (degree of freedom) of drilling string system plus PDC bit is proposed to simulate non-impact drilling. In this model, both the friction and cutting behaviors of PDC bit are innovatively introduced. The results reveal that PDC bit is easier to fail than other drilling tools due to the severer stick-slip vibration. Moreover, reducing WOB (weight on bit) and improving driving torque can effectively mitigate the stick-slip vibration of PDC bit. Therefore, PDC bit failure can be alleviated by optimizing drilling parameters. In addition, a new 4-DOF torsional model is established to simulate torsional impact drilling and the effect of torsional impact on PDC bit's stick-slip vibration is analyzed by use of an engineering example. It can be concluded that torsional impact can mitigate stick-slip vibration, prolonging the service life of PDC bit and improving drilling efficiency, which is consistent with the field experiment results.

  17. Methods and systems for providing reconfigurable and recoverable computing resources

    NASA Technical Reports Server (NTRS)

    Stange, Kent (Inventor); Hess, Richard (Inventor); Kelley, Gerald B (Inventor); Rogers, Randy (Inventor)

    2010-01-01

    A method for optimizing the use of digital computing resources to achieve reliability and availability of the computing resources is disclosed. The method comprises providing one or more processors with a recovery mechanism, the one or more processors executing one or more applications. A determination is made whether the one or more processors needs to be reconfigured. A rapid recovery is employed to reconfigure the one or more processors when needed. A computing system that provides reconfigurable and recoverable computing resources is also disclosed. The system comprises one or more processors with a recovery mechanism, with the one or more processors configured to execute a first application, and an additional processor configured to execute a second application different than the first application. The additional processor is reconfigurable with rapid recovery such that the additional processor can execute the first application when one of the one more processors fails.

  18. Boring apparatus capable of boring straight holes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, C.R.

    The invention relates to a rock boring assembly for producing a straight hole for use in a drill string above a pilot boring bit of predetermined diameter smaller than the desired final hole size. The boring assembly comprises a small conical boring bit and a larger conical boring, the conical boring bits mounted on lower and upper ends of an enlongated spacer, respectively, and the major effective cutting diameters of each of the conical boring bits being at least 10% greater than the minor effective cutting diameter of the respective bit. The spacer has a cross-section resistant bending and spacesmore » the conical boring bits apart a distance at least 5 times the major cutting diameter of the small conical boring bit, thereby spacing the pivot points provided by the two conical boring bits to limit bodily angular deflection of the assembly and providing a substantial moment arm to resist lateral forces applied to the assembly by the pilot bit and drill string. The spacing between the conical bits is less than about 20 times the major cutting diameter of the lower conical boring bit to enable the spacer to act as a bend-resistant beam to resist angular deflection of the axis of either of the conical boring bits relative to the other when it receives uneven lateral force due to non-uniformity of cutting conditions about the circumference of the bit. Advantageously the boring bits also are self-advancing and feature skewed rollers. 7 claims.« less

  19. Bit-Grooming: Shave Your Bits with Razor-sharp Precision

    NASA Astrophysics Data System (ADS)

    Zender, C. S.; Silver, J.

    2017-12-01

    Lossless compression can reduce climate data storage by 30-40%. Further reduction requires lossy compression that also reduces precision. Fortunately, geoscientific models and measurements generate false precision (scientifically meaningless data bits) that can be eliminated without sacrificing scientifically meaningful data. We introduce Bit Grooming, a lossy compression algorithm that removes the bloat due to false-precision, those bits and bytes beyond the meaningful precision of the data.Bit Grooming is statistically unbiased, applies to all floating point numbers, and is easy to use. Bit-Grooming reduces geoscience data storage requirements by 40-80%. We compared Bit Grooming to competitors Linear Packing, Layer Packing, and GRIB2/JPEG2000. The other compression methods have the edge in terms of compression, but Bit Grooming is the most accurate and certainly the most usable and portable.Bit Grooming provides flexible and well-balanced solutions to the trade-offs among compression, accuracy, and usability required by lossy compression. Geoscientists could reduce their long term storage costs, and show leadership in the elimination of false precision, by adopting Bit Grooming.

  20. Buffered coscheduling for parallel programming and enhanced fault tolerance

    DOEpatents

    Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors

  1. A Compression Algorithm for Field Programmable Gate Arrays in the Space Environment

    DTIC Science & Technology

    2011-12-01

    Bit 1 ,Bit 0P  . (V.3) Equation (V.3) is implemented with a string of XOR gates and Bit Basher blocks, as shown in Figure 31. As discussed in...5], the string of Bit Basher blocks are used to separate each 35-bit value into 35 one-bit values, and the string of XOR gates is used to

  2. Drag bit construction

    DOEpatents

    Hood, M.

    1986-02-11

    A mounting movable with respect to an adjacent hard face has a projecting drag bit adapted to engage the hard face. The drag bit is disposed for movement relative to the mounting by encounter of the drag bit with the hard face. That relative movement regulates a valve in a water passageway, preferably extending through the drag bit, to play a stream of water in the area of contact of the drag bit and the hard face and to prevent such water play when the drag bit is out of contact with the hard face. 4 figs.

  3. Drag bit construction

    DOEpatents

    Hood, Michael

    1986-01-01

    A mounting movable with respect to an adjacent hard face has a projecting drag bit adapted to engage the hard face. The drag bit is disposed for movement relative to the mounting by encounter of the drag bit with the hard face. That relative movement regulates a valve in a water passageway, preferably extending through the drag bit, to play a stream of water in the area of contact of the drag bit and the hard face and to prevent such water play when the drag bit is out of contact with the hard face.

  4. Flight design system level C requirements. Solid rocket booster and external tank impact prediction processors. [space transportation system

    NASA Technical Reports Server (NTRS)

    Seale, R. H.

    1979-01-01

    The prediction of the SRB and ET impact areas requires six separate processors. The SRB impact prediction processor computes the impact areas and related trajectory data for each SRB element. Output from this processor is stored on a secure file accessible by the SRB impact plot processor which generates the required plots. Similarly the ET RTLS impact prediction processor and the ET RTLS impact plot processor generates the ET impact footprints for return-to-launch-site (RTLS) profiles. The ET nominal/AOA/ATO impact prediction processor and the ET nominal/AOA/ATO impact plot processor generate the ET impact footprints for non-RTLS profiles. The SRB and ET impact processors compute the size and shape of the impact footprints by tabular lookup in a stored footprint dispersion data base. The location of each footprint is determined by simulating a reference trajectory and computing the reference impact point location. To insure consistency among all flight design system (FDS) users, much input required by these processors will be obtained from the FDS master data base.

  5. PDC-bit performance under simulated borehole conditions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, E.E.; Azar, J.J.

    1993-09-01

    Laboratory drilling tests were used to investigate the effects of pressure on polycrystalline-diamond-compact (PDC) drill-bit performance. Catoosa shale core samples were drilled with PDC and roller-cone bits at up to 1,750-psi confining pressure. All tests were conducted in a controlled environment with a full-scale laboratory drilling system. Test results indicate, that under similar operating conditions, increases in confining pressure reduce PDC-bit performance as much as or more than conventional-rock-bit performance. Specific energy calculations indicate that a combination of rock strength, chip hold-down, and bit balling may have reduced performance. Quantifying the degree to which pressure reduces PDC-bit performance will helpmore » researchers interpret test results and improve bit designs and will help drilling engineers run PDC bits more effectively in the field.« less

  6. Smart built-in test

    NASA Technical Reports Server (NTRS)

    Richards, Dale W.

    1990-01-01

    The work which built-in test (BIT) is asked to perform in today's electronic systems increases with every insertion of new technology or introduction of tighter performance criteria. Yet the basic purpose remains unchanged -- to determine with high confidence the operational capability of that equipment. Achievement of this level of BIT performance requires the management and assimilation of a large amount of data, both realtime and historical. Smart BIT has taken advantage of advanced techniques from the field of artificial intelligence (AI) in order to meet these demands. The Smart BIT approach enhances traditional functional BIT by utilizing AI techniques to incorporate environmental stress data, temporal BIT information and maintenance data, and realtime BIT reports into an integrated test methodology for increased BIT effectiveness and confidence levels. Future research in this area will incorporate onboard fault-logging of BIT output, stress data and Smart BIT decision criteria in support of a singular, integrated and complete test and maintenance capability. The state of this research is described along with a discussion of directions for future development.

  7. Smart built-in test

    NASA Astrophysics Data System (ADS)

    Richards, Dale W.

    1990-03-01

    The work which built-in test (BIT) is asked to perform in today's electronic systems increases with every insertion of new technology or introduction of tighter performance criteria. Yet the basic purpose remains unchanged -- to determine with high confidence the operational capability of that equipment. Achievement of this level of BIT performance requires the management and assimilation of a large amount of data, both realtime and historical. Smart BIT has taken advantage of advanced techniques from the field of artificial intelligence (AI) in order to meet these demands. The Smart BIT approach enhances traditional functional BIT by utilizing AI techniques to incorporate environmental stress data, temporal BIT information and maintenance data, and realtime BIT reports into an integrated test methodology for increased BIT effectiveness and confidence levels. Future research in this area will incorporate onboard fault-logging of BIT output, stress data and Smart BIT decision criteria in support of a singular, integrated and complete test and maintenance capability. The state of this research is described along with a discussion of directions for future development.

  8. Bit-Serial Adder Based on Quantum Dots

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Toomarian, Nikzad; Modarress, Katayoon; Spotnitz, Mathew

    2003-01-01

    A proposed integrated circuit based on quantum-dot cellular automata (QCA) would function as a bit-serial adder. This circuit would serve as a prototype building block for demonstrating the feasibility of quantum-dots computing and for the further development of increasingly complex and increasingly capable quantum-dots computing circuits. QCA-based bit-serial adders would be especially useful in that they would enable the development of highly parallel and systolic processors for implementing fast Fourier, cosine, Hartley, and wavelet transforms. The proposed circuit would complement the QCA-based circuits described in "Implementing Permutation Matrices by Use of Quantum Dots" (NPO-20801), NASA Tech Briefs, Vol. 25, No. 10 (October 2001), page 42 and "Compact Interconnection Networks Based on Quantum Dots" (NPO-20855), which appears elsewhere in this issue. Those articles described the limitations of very-large-scale-integrated (VLSI) circuitry and the major potential advantage afforded by QCA. To recapitulate: In a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCA-based signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes. To enable a meaningful description of the proposed bit-serial adder, it is necessary to further recapitulate the description of a quantum-dot cellular automation from the first-mentioned prior article: A quantum-dot cellular automaton contains four quantum dots positioned at the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantum-mechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns. Again, for reasons too complex to describe here, in order to ensure accuracy and timeliness of the output of a QCA array, it is necessary to resort to an adiabatic switching scheme in which the QCA array is divided into subarrays, each controlled by a different phase of a multiphase clock signal. In this scheme, each subarray is given time to perform its computation, then its state is frozen by raising its inter-dot potential barriers and its output is fed as the input to the successor subarray. The successor subarray is kept in an unpolarized state so it does not influence the calculation of preceding subarray. Such a clocking scheme is consistent with pipeline computation in the sense that each different subarray can perform a different part of an overall computation. In other words, QCA arrays are inherently suitable for pipeline and, moreover, systolic computations. This sequential or pipeline aspect of QCA would be utilized in the proposed bit-serial adders.

  9. Hey! A Flea Bit Me!

    MedlinePlus

    ... Staying Safe Videos for Educators Search English Español Hey! A Flea Bit Me! KidsHealth / For Kids / Hey! A Flea Bit Me! Print en español ¡Ay! ¡ ... 30% DEET. More on this topic for: Kids Hey! A Gnat Bit Me! Hey! A Bedbug Bit ...

  10. Hey! A Louse Bit Me!

    MedlinePlus

    ... Staying Safe Videos for Educators Search English Español Hey! A Louse Bit Me! KidsHealth / For Kids / Hey! A Louse Bit Me! Print en español ¡Ay! ¡ ... topic for: Kids Lice Aren't So Nice Hey! A Gnat Bit Me! Hey! A Flea Bit ...

  11. PDC bits: What`s needed to meet tomorrow`s challenge

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Warren, T.M.; Sinor, L.A.

    1994-12-31

    When polycrystalline diamond compact (PDC) bits were introduced in the mid-1970s they showed tantalizingly high penetration rates in laboratory drilling tests. Single cutter tests indicated that they had the potential to drill very hard rocks. Unfortunately, 20 years later we`re still striving to reach the potential that these bits seem to have. Many problems have been overcome, and PDC bits have offered capabilities not possible with roller cone bits. PDC bits provide the most economical bit choice in many areas, but their limited durability has hampered their application in many other areas.

  12. Coding, testing and documentation of processors for the flight design system

    NASA Technical Reports Server (NTRS)

    1980-01-01

    The general functional design and implementation of processors for a space flight design system are briefly described. Discussions of a basetime initialization processor; conic, analytical, and precision coasting flight processors; and an orbit lifetime processor are included. The functions of several utility routines are also discussed.

  13. The computational structural mechanics testbed generic structural-element processor manual

    NASA Technical Reports Server (NTRS)

    Stanley, Gary M.; Nour-Omid, Shahram

    1990-01-01

    The usage and development of structural finite element processors based on the CSM Testbed's Generic Element Processor (GEP) template is documented. By convention, such processors have names of the form ESi, where i is an integer. This manual is therefore intended for both Testbed users who wish to invoke ES processors during the course of a structural analysis, and Testbed developers who wish to construct new element processors (or modify existing ones).

  14. Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)

    1994-01-01

    In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.

  15. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.

  16. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, D.B.

    1994-07-19

    A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.

  17. Switch for serial or parallel communication networks

    DOEpatents

    Crosette, Dario B.

    1994-01-01

    A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.

  18. Utilizing the ISS Mission as a Testbed to Develop Cognitive Communications Systems

    NASA Technical Reports Server (NTRS)

    Jackson, Dan

    2016-01-01

    The ISS provides an excellent opportunity for pioneering artificial intelligence software to meet the challenges of real-time communications (comm) link management. This opportunity empowers the ISS Program to forge a testbed for developing cognitive communications systems for the benefit of the ISS mission, manned Low Earth Orbit (LEO) science programs and future planetary exploration programs. In November, 1998, the Flight Operations Directorate (FOD) started the ISS Antenna Manager (IAM) project to develop a single processor supporting multiple comm satellite tracking for two different antenna systems. Further, the processor was developed to be highly adaptable as it supported the ISS mission through all assembly stages. The ISS mission mandated communications specialists with complete knowledge of when the ISS was about to lose or gain comm link service. The current specialty mandated cognizance of large sun-tracking solar arrays and thermal management panels in addition to the highly-dynamic satellite service schedules and rise/set tables. This mission requirement makes the ISS the ideal communications management analogue for future LEO space station and long-duration planetary exploration missions. Future missions, with their precision-pointed, dynamic, laser-based comm links, require complete autonomy for managing high-data rate communications systems. Development of cognitive communications management systems that permit any crew member or payload science specialist, regardless of experience level, to control communications is one of the greater benefits the ISS can offer new space exploration programs. The IAM project met a new mission requirement never previously levied against US space-born communications systems management: process and display the orientation of large solar arrays and thermal control panels based on real-time joint angle telemetry. However, IAM leaves the actual communications availability assessment to human judgment, which introduces unwanted variability because each specialist has a different core of experience with comm link performance. Because the ISS utilizes two different frequency bands, dynamic structure can be occasionally translucent at one frequency while it can completely interdict service at the other frequency. The impact of articulating structure on the comm link can depend on its orientation at the time it impinges on the link. It can become easy for a human specialist to cross-associate experience at one frequency with experience at the other frequency. Additionally, the specialist's experience is incremental, occurring one nine-hour shift at a time. Only the IAM processor experiences the complete 24x7x365 communications link performance for both communications links but, it has no "learning capability." If the IAM processor could be endowed with a cognitive ability to remember past structure-induced comm link outages, based on its knowledge of the ISS position, attitude, communications gear, array joint angles and tracking accuracy, it could convey such experience to the human operator. It could also use its learned communications link behaviors to accurately convey the availability of future communications sessions. Further, the tool could remember how accurately or inaccurately it predicted availability and correct future predictions based on past performance. The IAM tool could learn frequency-specific impacts due to spacecraft structures and pass that information along as "experience." Such development would provide a single artificial intelligence processor that could provide two different experience bases. If it also "knew" the satellite service schedule, it could distinguish structure blockage from schedule or planet blockage and then quickly switch to another satellite. Alternatively, just as a human operator could judge, a cognizant comm system based on the IAM model could "know" that the blockage is not going to last very long and continue tracking a comm satellite, waiting for it to track away from structure. Ultimately, once this capability was fully developed and tested in the Mission Control Center, it could be transferred on-orbit to support development of operations concepts that include more advanced cognitive communications systems. Future applications of this capability are easily foreseen because even more dynamic satellite constellations with more nodes and greater capability are coming. Currently, the ISS fully employs a 300 million bit-per-second (Mbps) return link for harvesting payload science. In the coming eighteen months, it will step up to 600 Mbps. Already there is talk of a 1.2 billion bit-per-second (Gbps) upgrade for the ISS and laser comm links have already been tested from the ISS. Every data rate upgrade mandates more complicated and sensitive communications equipment which implies greater expertise invested in the human operator. Future on-orbit cognizant comm systems will be needed to meet greater performance demands aboard larger, far more complicated spacecraft. In the LEO environment, the old-style one-satellite-per-spacecraft operations concept will give way to a new concept of a single customer spacecraft simultaneously using multiple comm satellites. Much more highly-dynamic manned LEO missions with decades of crew members potentially increase the demand for communications link performance. A cognizant on-board communications system will meet advanced communications demands from future LEO missions and future planetary missions. The ISS has fledgling components of future exploration programs, both LEO and planetary. Further, the Flight Operations Directorate, through the IAM project, has already begun to develop a communications management system that attempts to solve advanced problems ideally represented by dynamic structure impacting scheduled satellite service. With an earnest project to integrate artificial intelligence into the IAM processor, the ISS Program could develop a cognizant communications system that could be adapted and transferred to future on-orbit avionics designs.

  19. Conditions for space invariance in optical data processors used with coherent or noncoherent light.

    PubMed

    Arsenault, H R

    1972-10-01

    The conditions for space invariance in coherent and noncoherent optical processors are considered. All linear optical processors are shown to belong to one of two types. The conditions for space invariance are more stringent for noncoherent processors than for coherent processors, so that a system that is linear in coherent light may be nonlinear in noncoherent light. However, any processor that is linear in noncoherent light is also linear in the coherent limit.

  20. High bit depth infrared image compression via low bit depth codecs

    NASA Astrophysics Data System (ADS)

    Belyaev, Evgeny; Mantel, Claire; Forchhammer, Søren

    2017-08-01

    Future infrared remote sensing systems, such as monitoring of the Earth's environment by satellites, infrastructure inspection by unmanned airborne vehicles etc., will require 16 bit depth infrared images to be compressed and stored or transmitted for further analysis. Such systems are equipped with low power embedded platforms where image or video data is compressed by a hardware block called the video processing unit (VPU). However, in many cases using two 8-bit VPUs can provide advantages compared with using higher bit depth image compression directly. We propose to compress 16 bit depth images via 8 bit depth codecs in the following way. First, an input 16 bit depth image is mapped into 8 bit depth images, e.g., the first image contains only the most significant bytes (MSB image) and the second one contains only the least significant bytes (LSB image). Then each image is compressed by an image or video codec with 8 bits per pixel input format. We analyze how the compression parameters for both MSB and LSB images should be chosen to provide the maximum objective quality for a given compression ratio. Finally, we apply the proposed infrared image compression method utilizing JPEG and H.264/AVC codecs, which are usually available in efficient implementations, and compare their rate-distortion performance with JPEG2000, JPEG-XT and H.265/HEVC codecs supporting direct compression of infrared images in 16 bit depth format. A preliminary result shows that two 8 bit H.264/AVC codecs can achieve similar result as 16 bit HEVC codec.

  1. Performance test of different 3.5 mm drill bits and consequences for orthopaedic surgery.

    PubMed

    Clement, Hans; Zopf, Christoph; Brandner, Markus; Tesch, Norbert P; Vallant, Rudolf; Puchwein, Paul

    2015-12-01

    Drilling of bones in orthopaedic and trauma surgery is a common procedure. There are yet no recommendations about which drill bits/coating should be preferred and when to change a used drill bit. In preliminary studies typical "drilling patterns" of surgeons concerning used spindle speed and feeding force were recorded. Different feeding forces were tested and abrasion was analysed using magnification and a scanning electron microscope (SEM). Acquired data were used for programming a friction stir welding machine (FSWM). Four drill bits (a default AISI 440A, a HSS, an AISI 440B and a Zirconium-oxide drill bit) were analysed for abrasive wear after 20/40/60 machine-guided and hand-driven drilled holes. Additionally different drill coatings [diamond-like carbon/grafitic (DLC), titanium nitride/carbide (Ti-N)] were tested. The mean applied feeding force by surgeons was 45 ± 15.6 Newton (N). HSS bits were still usable after 51 drill holes. Both coated AISI 440A bits showed considerable breakouts of the main cutting edge after 20 hand-driven drilled holes. The coated HSS bit showed very low abrasive wear. The non-coated AISI 440B bit had a similar durability to the HSS bits. The ZrO2 dental drill bit excelled its competitors (no considerable abrasive wear at >100 holes). If the default AISI 440A drill bit cannot be checked by 20-30× magnification after surgery, it should be replaced after 20 hand-driven drilled holes. Low price coated HSS bits could be a powerful alternative.

  2. Broadcasting collective operation contributions throughout a parallel computer

    DOEpatents

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.

  3. Bit-1 Mediates Integrin-dependent Cell Survival through Activation of the NFκB Pathway*

    PubMed Central

    Griffiths, Genevieve S.; Grundl, Melanie; Leychenko, Anna; Reiter, Silke; Young-Robbins, Shirley S.; Sulzmaier, Florian J.; Caliva, Maisel J.; Ramos, Joe W.; Matter, Michelle L.

    2011-01-01

    Loss of properly regulated cell death and cell survival pathways can contribute to the development of cancer and cancer metastasis. Cell survival signals are modulated by many different receptors, including integrins. Bit-1 is an effector of anoikis (cell death due to loss of attachment) in suspended cells. The anoikis function of Bit-1 can be counteracted by integrin-mediated cell attachment. Here, we explored integrin regulation of Bit-1 in adherent cells. We show that knockdown of endogenous Bit-1 in adherent cells decreased cell survival and re-expression of Bit-1 abrogated this effect. Furthermore, reduction of Bit-1 promoted both staurosporine and serum-deprivation induced apoptosis. Indeed knockdown of Bit-1 in these cells led to increased apoptosis as determined by caspase-3 activation and positive TUNEL staining. Bit-1 expression protected cells from apoptosis by increasing phospho-IκB levels and subsequently bcl-2 gene transcription. Protection from apoptosis under serum-free conditions correlated with bcl-2 transcription and Bcl-2 protein expression. Finally, Bit-1-mediated regulation of bcl-2 was dependent on focal adhesion kinase, PI3K, and AKT. Thus, we have elucidated an integrin-controlled pathway in which Bit-1 is, in part, responsible for the survival effects of cell-ECM interactions. PMID:21383007

  4. Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)

    NASA Astrophysics Data System (ADS)

    Zender, Charles S.

    2016-09-01

    Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and be scientifically pointless, especially for measurements. By contrast, lossy compression can be both economical (save space) and heuristic (clarify data limitations) without compromising the scientific integrity of data. Data quantization can thus be appropriate regardless of whether space limitations are a concern. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits of consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving that quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by initially uncompressed and compressed climate data by 25-80 and 5-65 %, respectively, for single-precision values (the most common case for climate data) quantized to retain 1-5 decimal digits of precision. The potential reduction is greater for double-precision datasets. When used aggressively (i.e., preserving only 1-2 digits), Bit Grooming produces storage reductions comparable to other quantization techniques such as Linear Packing. Unlike Linear Packing, whose guaranteed precision rapidly degrades within the relatively narrow dynamic range of values that it can compress, Bit Grooming guarantees the specified precision throughout the full floating-point range. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that no extra processing is required by data users/readers. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.

  5. LANDSAT-D flight segment operations manual. Appendix B: OBC software operations

    NASA Technical Reports Server (NTRS)

    Talipsky, R.

    1981-01-01

    The LANDSAT 4 satellite contains two NASA standard spacecraft computers and 65,536 words of memory. Onboard computer software is divided into flight executive and applications processors. Both applications processors and the flight executive use one or more of 67 system tables to obtain variables, constants, and software flags. Output from the software for monitoring operation is via 49 OBC telemetry reports subcommutated in the spacecraft telemetry. Information is provided about the flight software as it is used to control the various spacecraft operations and interpret operational OBC telemetry. Processor function descriptions, processor operation, software constraints, processor system tables, processor telemetry, and processor flow charts are presented.

  6. Causes of wear of PDC bits and ways of improving their wear resistance

    NASA Astrophysics Data System (ADS)

    Timonin, VV; Smolentsev, AS; Shakhtorin, I. O.; Polushin, NI; Laptev, AI; Kushkhabiev, AS

    2017-02-01

    The scope of the paper encompasses basic factors that influence PDC bit efficiency. Feasible ways of eliminating the negatives are illustrated. The wash fluid flow in a standard bit is modeled, the resultant pattern of the bit washing is analyzed, and the recommendations are made on modification of the PDC bit design.

  7. Managing Power Heterogeneity

    NASA Astrophysics Data System (ADS)

    Pruhs, Kirk

    A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.

  8. A DSP-based neural network non-uniformity correction algorithm for IRFPA

    NASA Astrophysics Data System (ADS)

    Liu, Chong-liang; Jin, Wei-qi; Cao, Yang; Liu, Xiu

    2009-07-01

    An effective neural network non-uniformity correction (NUC) algorithm based on DSP is proposed in this paper. The non-uniform response in infrared focal plane array (IRFPA) detectors produces corrupted images with a fixed-pattern noise(FPN).We introduced and analyzed the artificial neural network scene-based non-uniformity correction (SBNUC) algorithm. A design of DSP-based NUC development platform for IRFPA is described. The DSP hardware platform designed is of low power consumption, with 32-bit fixed point DSP TMS320DM643 as the kernel processor. The dependability and expansibility of the software have been improved by DSP/BIOS real-time operating system and Reference Framework 5. In order to realize real-time performance, the calibration parameters update is set at a lower task priority then video input and output in DSP/BIOS. In this way, calibration parameters updating will not affect video streams. The work flow of the system and the strategy of real-time realization are introduced. Experiments on real infrared imaging sequences demonstrate that this algorithm requires only a few frames to obtain high quality corrections. It is computationally efficient and suitable for all kinds of non-uniformity.

  9. Faithful conversion of propagating quantum information to mechanical motion

    NASA Astrophysics Data System (ADS)

    Reed, A. P.; Mayer, K. H.; Teufel, J. D.; Burkhart, L. D.; Pfaff, W.; Reagor, M.; Sletten, L.; Ma, X.; Schoelkopf, R. J.; Knill, E.; Lehnert, K. W.

    2017-12-01

    The motion of micrometre-sized mechanical resonators can now be controlled and measured at the fundamental limits imposed by quantum mechanics. These resonators have been prepared in their motional ground state or in squeezed states, measured with quantum-limited precision, and even entangled with microwave fields. Such advances make it possible to process quantum information using the motion of a macroscopic object. In particular, recent experiments have combined mechanical resonators with superconducting quantum circuits to frequency-convert, store and amplify propagating microwave fields. But these systems have not been used to manipulate states that encode quantum bits (qubits), which are required for quantum communication and modular quantum computation. Here we demonstrate the conversion of propagating qubits encoded as superpositions of zero and one photons to the motion of a micromechanical resonator with a fidelity in excess of the classical bound. This ability is necessary for mechanical resonators to convert quantum information between the microwave and optical domains or to act as storage elements in a modular quantum information processor. Additionally, these results are an important step towards testing speculative notions that quantum theory may not be valid for sufficiently massive systems.

  10. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  11. Control and Measurement of an Xmon with the Quantum Socket

    NASA Astrophysics Data System (ADS)

    McConkey, T. G.; Bejanin, J. H.; Earnest, C. T.; McRae, C. R. H.; Rinehart, J. R.; Weides, M.; Mariantoni, M.

    The implementation of superconducting quantum processors is rapidly reaching scalability limitations. Extensible electronics and wiring solutions for superconducting quantum bits (qubits) are among the most imminent issues to be tackled. The necessity to substitute planar electrical interconnects (e.g., wire bonds) with three-dimensional wires is emerging as a fundamental pillar towards scalability. In a previous work, we have shown that three-dimensional wires housed in a suitable package, named the quantum socket, can be utilized to measure high-quality superconducting resonators. In this work, we set out to test the quantum socket with actual superconducting qubits to verify its suitability as a wiring solution in the development of an extensible quantum computing architecture. To this end, we have designed and fabricated a series of Xmon qubits. The qubits range in frequency from about 6 to 7 GHz with anharmonicity of 200 MHz and can be tuned by means of Z pulses. Controlling tunable Xmons will allow us to verify whether the three-dimensional wires contact resistance is low enough for qubit operation. Qubit T1 and T2 times and single qubit gate fidelities are compared against current standards in the field.

  12. Data acquisition for the new muon g-2 experiment at Fermilab

    DOE PAGES

    Gohn, Wesley

    2015-12-23

    A new measurement of the anomalous magnetic moment of the muon, a μ ≡ (g - 2)/2, will be performed at the Fermi National Accelerator Laboratory. The most recent measurement, performed at Brookhaven National Laboratory and completed in 2001, shows a 3.3-3.6 standard deviation discrepancy with the Standard Model predictions for a μ. The new measurement will accumulate 21 times those statistics, measuring a μ to 140 ppb and reducing the uncertainty by a factor of 4. The data acquisition system for this experiment must have the ability to record deadtime-free records from 700 μs muon spills at a rawmore » data rate of 18 GB per second. Data will be collected using 1296 channels of μTCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording and processing of detector signals during the spill. The system will be controlled using the MIDAS data acquisition software package. Lastly, the described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.« less

  13. Quantum memory operations in a flux qubit - spin ensemble hybrid system

    NASA Astrophysics Data System (ADS)

    Saito, S.; Zhu, X.; Amsuss, R.; Matsuzaki, Y.; Kakuyanagi, K.; Shimo-Oka, T.; Mizuochi, N.; Nemoto, K.; Munro, W. J.; Semba, K.

    2014-03-01

    Superconducting quantum bits (qubits) are one of the most promising candidates for a future large-scale quantum processor. However for larger scale realizations the currently reported coherence times of these macroscopic objects (superconducting qubits) has not yet reached those of microscopic systems (electron spins, nuclear spins, etc). In this context, a superconductor-spin ensemble hybrid system has attracted considerable attention. The spin ensemble could operate as a quantum memory for superconducting qubits. We have experimentally demonstrated quantum memory operations in a superconductor-diamond hybrid system. An excited state and a superposition state prepared in the flux qubit can be transferred to, stored in and retrieved from the NV spin ensemble in diamond. From these experiments, we have found the coherence time of the spin ensemble is limited by the inhomogeneous broadening of the electron spin (4.4 MHz) and by the hyperfine coupling to nitrogen nuclear spins (2.3 MHz). In the future, spin echo techniques could eliminate these effects and elongate the coherence time. Our results are a significant first step in utilizing the spin ensemble as long-lived quantum memory for superconducting flux qubits. This work was supported by the FIRST program and NICT.

  14. Digital Plasma Control System for Alcator C-Mod

    NASA Astrophysics Data System (ADS)

    Ferrara, M.; Wolfe, S.; Stillerman, J.; Fredian, T.; Hutchinson, I.

    2004-11-01

    A digital plasma control system (DPCS) has been designed to replace the present C-Mod system, which is based on hybrid analog-digital computer. The initial implementation of DPCS comprises two 64 channel, 16 bit, low-latency cPCI digitizers, each with 16 analog outputs, controlled by a rack-mounted single-processor Linux server, which also serves as the compute engine. A prototype system employing three older 32 channel digitizers was tested during the 2003-04 campaign. The hybrid's linear PID feedback system was emulated by IDL code executing a synchronous loop, using the same target waveforms and control parameters. Reliable real-time operation was accomplished under a standard Linux OS (RH9) by locking memory and disabling interrupts during the plasma pulse. The DPCS-computed outputs agreed to within a few percent with those produced by the hybrid system, except for discrepancies due to offsets and non-ideal behavior of the hybrid circuitry. The system operated reliably, with no sample loss, at more than twice the 10kHz design specification, providing extra time for implementing more advanced control algorithms. The code is fault-tolerant and produces consistent output waveforms even with 10% sample loss.

  15. Data Acquisition with GPUs: The DAQ for the Muon $g$-$2$ Experiment at Fermilab

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gohn, W.

    Graphical Processing Units (GPUs) have recently become a valuable computing tool for the acquisition of data at high rates and for a relatively low cost. The devices work by parallelizing the code into thousands of threads, each executing a simple process, such as identifying pulses from a waveform digitizer. The CUDA programming library can be used to effectively write code to parallelize such tasks on Nvidia GPUs, providing a significant upgrade in performance over CPU based acquisition systems. The muonmore » $g$-$2$ experiment at Fermilab is heavily relying on GPUs to process its data. The data acquisition system for this experiment must have the ability to create deadtime-free records from 700 $$\\mu$$s muon spills at a raw data rate 18 GB per second. Data will be collected using 1296 channels of $$\\mu$$TCA-based 800 MSPS, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording of the muon decays during the spill. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.« less

  16. Method for Veterbi decoding of large constraint length convolutional codes

    NASA Technical Reports Server (NTRS)

    Hsu, In-Shek (Inventor); Truong, Trieu-Kie (Inventor); Reed, Irving S. (Inventor); Jing, Sun (Inventor)

    1988-01-01

    A new method of Viterbi decoding of convolutional codes lends itself to a pipline VLSI architecture using a single sequential processor to compute the path metrics in the Viterbi trellis. An array method is used to store the path information for NK intervals where N is a number, and K is constraint length. The selected path at the end of each NK interval is then selected from the last entry in the array. A trace-back method is used for returning to the beginning of the selected path back, i.e., to the first time unit of the interval NK to read out the stored branch metrics of the selected path which correspond to the message bits. The decoding decision made in this way is no longer maximum likelihood, but can be almost as good, provided that constraint length K in not too small. The advantage is that for a long message, it is not necessary to provide a large memory to store the trellis derived information until the end of the message to select the path that is to be decoded; the selection is made at the end of every NK time unit, thus decoding a long message in successive blocks.

  17. Integrating Fingerprint Verification into the Smart Card-Based Healthcare Information System

    NASA Astrophysics Data System (ADS)

    Moon, Daesung; Chung, Yongwha; Pan, Sung Bum; Park, Jin-Won

    2009-12-01

    As VLSI technology has been improved, a smart card employing 32-bit processors has been released, and more personal information such as medical, financial data can be stored in the card. Thus, it becomes important to protect personal information stored in the card. Verification of the card holder's identity using a fingerprint has advantages over the present practices of Personal Identification Numbers (PINs) and passwords. However, the computational workload of fingerprint verification is much heavier than that of the typical PIN-based solution. In this paper, we consider three strategies to implement fingerprint verification in a smart card environment and how to distribute the modules of fingerprint verification between the smart card and the card reader. We first evaluate the number of instructions of each step of a typical fingerprint verification algorithm, and estimate the execution time of several cryptographic algorithms to guarantee the security/privacy of the fingerprint data transmitted in the smart card with the client-server environment. Based on the evaluation results, we analyze each scenario with respect to the security level and the real-time execution requirements in order to implement fingerprint verification in the smart card with the client-server environment.

  18. Forward-biased nanophotonic detector for ultralow-energy dissipation receiver

    NASA Astrophysics Data System (ADS)

    Nozaki, Kengo; Matsuo, Shinji; Fujii, Takuro; Takeda, Koji; Shinya, Akihiko; Kuramochi, Eiichi; Notomi, Masaya

    2018-04-01

    Generally, reverse-biased photodetectors (PDs) are used for high-speed optical receivers. The forward voltage region is only utilized in solar-cells, and this photovoltaic operation would not be concurrently obtained with high efficiency and high speed operation. Here we report that photonic-crystal waveguide PDs enable forward-biased high-speed operation at 40 Gbit/s with keeping high responsivity (0.88 A/W). Within our knowledge, this is the first demonstration of the forward-biased PDs with high responsivity. This achievement is attributed to the ultracompactness of our PD and the strong light confinement within the absorber and depleted regions, thereby enabling efficient photo-carrier generation and fast extraction. This result indicates that it is possible to construct a high-speed and ultracompact photo-receiver without an electrical amplifier nor an external bias circuit. Since there is no electrical energy required, our estimation shows that the consumption energy is just the optical energy of the injected signal pulse which is about 1 fJ/bit. Hence, it will lead to an ultimately efficient and highly integrable optical-to-electrical converter in a chip, which will be a key ingredient for dense nanophotonic communication and processors.

  19. Comparison of Monte Carlo simulated and measured performance parameters of miniPET scanner

    NASA Astrophysics Data System (ADS)

    Kis, S. A.; Emri, M.; Opposits, G.; Bükki, T.; Valastyán, I.; Hegyesi, Gy.; Imrek, J.; Kalinka, G.; Molnár, J.; Novák, D.; Végh, J.; Kerek, A.; Trón, L.; Balkay, L.

    2007-02-01

    In vivo imaging of small laboratory animals is a valuable tool in the development of new drugs. For this purpose, miniPET, an easy to scale modular small animal PET camera has been developed at our institutes. The system has four modules, which makes it possible to rotate the whole detector system around the axis of the field of view. Data collection and image reconstruction are performed using a data acquisition (DAQ) module with Ethernet communication facility and a computer cluster of commercial PCs. Performance tests were carried out to determine system parameters, such as energy resolution, sensitivity and noise equivalent count rate. A modified GEANT4-based GATE Monte Carlo software package was used to simulate PET data analogous to those of the performance measurements. GATE was run on a Linux cluster of 10 processors (64 bit, Xeon with 3.0 GHz) and controlled by a SUN grid engine. The application of this special computer cluster reduced the time necessary for the simulations by an order of magnitude. The simulated energy spectra, maximum rate of true coincidences and sensitivity of the camera were in good agreement with the measured parameters.

  20. Data Acquisition for the New Muon g-2 Experiment at Fermilab

    NASA Astrophysics Data System (ADS)

    Gohn, Wesley

    2015-12-01

    A new measurement of the anomalous magnetic moment of the muon,aμ≡ (g - 2)/2, will be performed at the Fermi National Accelerator Laboratory. The most recent measurement, performed at Brookhaven National Laboratory and completed in 2001, shows a 3.3-3.6 standard deviation discrepancy with the Standard Model predictions for aμ. The new measurement will accumulate 21 times those statistics, measuring aμ to 140 ppb and reducing the uncertainty by a factor of 4. The data acquisition system for this experiment must have the ability to record deadtime-free records from 700 μs muon spills at a raw data rate of 18 GB per second. Data will be collected using 1296 channels of μTCA-based 800 MHz, 12 bit waveform digitizers and processed in a layered array of networked commodity processors with 24 GPUs working in parallel to perform a fast recording and processing of detector signals during the spill. The system will be controlled using the MIDAS data acquisition software package. The described data acquisition system is currently being constructed, and will be fully operational before the start of the experiment in 2017.

  1. Repetitive readout of a single electronic spin via quantum logic with nuclear spin ancillae.

    PubMed

    Jiang, L; Hodges, J S; Maze, J R; Maurer, P; Taylor, J M; Cory, D G; Hemmer, P R; Walsworth, R L; Yacoby, A; Zibrov, A S; Lukin, M D

    2009-10-09

    Robust measurement of single quantum bits plays a key role in the realization of quantum computation and communication as well as in quantum metrology and sensing. We have implemented a method for the improved readout of single electronic spin qubits in solid-state systems. The method makes use of quantum logic operations on a system consisting of a single electronic spin and several proximal nuclear spin ancillae in order to repetitively readout the state of the electronic spin. Using coherent manipulation of a single nitrogen vacancy center in room-temperature diamond, full quantum control of an electronic-nuclear system consisting of up to three spins was achieved. We took advantage of a single nuclear-spin memory in order to obtain a 10-fold enhancement in the signal amplitude of the electronic spin readout. We also present a two-level, concatenated procedure to improve the readout by use of a pair of nuclear spin ancillae, an important step toward the realization of robust quantum information processors using electronic- and nuclear-spin qubits. Our technique can be used to improve the sensitivity and speed of spin-based nanoscale diamond magnetometers.

  2. Model of biological quantum logic in DNA.

    PubMed

    Mihelic, F Matthew

    2013-08-02

    The DNA molecule has properties that allow it to act as a quantum logic processor. It has been demonstrated that there is coherent conduction of electrons longitudinally along the DNA molecule through pi stacking interactions of the aromatic nucleotide bases, and it has also been demonstrated that electrons moving longitudinally along the DNA molecule are subject to a very efficient electron spin filtering effect as the helicity of the DNA molecule interacts with the spin of the electron. This means that, in DNA, electrons are coherently conducted along a very efficient spin filter. Coherent electron spin is held in a logically and thermodynamically reversible chiral symmetry between the C2-endo and C3-endo enantiomers of the deoxyribose moiety in each nucleotide, which enables each nucleotide to function as a quantum gate. The symmetry break that provides for quantum decision in the system is determined by the spin direction of an electron that has an orbital angular momentum that is sufficient to overcome the energy barrier of the double well potential separating the C2-endo and C3-endo enantiomers, and that enantiomeric energy barrier is appropriate to the Landauer limit of the energy necessary to randomize one bit of information.

  3. Multi-Core Processor Memory Contention Benchmark Analysis Case Study

    NASA Technical Reports Server (NTRS)

    Simon, Tyler; McGalliard, James

    2009-01-01

    Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.

  4. Spatial domain entertainment audio decompression/compression

    NASA Astrophysics Data System (ADS)

    Chan, Y. K.; Tam, Ka Him K.

    2014-02-01

    The ARM7 NEON processor with 128bit SIMD hardware accelerator requires a peak performance of 13.99 Mega Cycles per Second for MP3 stereo entertainment quality decoding. For similar compression bit rate, OGG and AAC is preferred over MP3. The Patent Cooperation Treaty Application dated 28/August/2012 describes an audio decompression scheme producing a sequence of interleaving "min to Max" and "Max to min" rising and falling segments. The number of interior audio samples bound by "min to Max" or "Max to min" can be {0|1|…|N} audio samples. The magnitudes of samples, including the bounding min and Max, are distributed as normalized constants within the 0 and 1 of the bounding magnitudes. The decompressed audio is then a "sequence of static segments" on a frame by frame basis. Some of these frames needed to be post processed to elevate high frequency. The post processing is compression efficiency neutral and the additional decoding complexity is only a small fraction of the overall decoding complexity without the need of extra hardware. Compression efficiency can be speculated as very high as source audio had been decimated and converted to a set of data with only "segment length and corresponding segment magnitude" attributes. The PCT describes how these two attributes are efficiently coded by the PCT innovative coding scheme. The PCT decoding efficiency is obviously very high and decoding latency is basically zero. Both hardware requirement and run time is at least an order of magnitude better than MP3 variants. The side benefit is ultra low power consumption on mobile device. The acid test on how such a simplistic waveform representation can indeed reproduce authentic decompressed quality is benchmarked versus OGG(aoTuv Beta 6.03) by three pair of stereo audio frames and one broadcast like voice audio frame with each frame consisting 2,028 samples at 44,100KHz sampling frequency.

  5. Multithreaded transactions in scientific computing: New versions of a computer program for kinematical calculations of RHEED intensity oscillations

    NASA Astrophysics Data System (ADS)

    Brzuszek, Marcin; Daniluk, Andrzej

    2006-11-01

    Writing a concurrent program can be more difficult than writing a sequential program. Programmer needs to think about synchronisation, race conditions and shared variables. Transactions help reduce the inconvenience of using threads. A transaction is an abstraction, which allows programmers to group a sequence of actions on the program into a logical, higher-level computation unit. This paper presents multithreaded versions of the GROWTH program, which allow to calculate the layer coverages during the growth of thin epitaxial films and the corresponding RHEED intensities according to the kinematical approximation. The presented programs also contain graphical user interfaces, which enable displaying program data at run-time. New version program summaryTitles of programs:GROWTHGr, GROWTH06 Catalogue identifier:ADVL_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADVL_v2_0 Program obtainable from:CPC Program Library, Queen's University of Belfast, N. Ireland Catalogue identifier of previous version:ADVL Does the new version supersede the original program:No Computer for which the new version is designed and others on which it has been tested: Pentium-based PC Operating systems or monitors under which the new version has been tested: Windows 9x, XP, NT Programming language used:Object Pascal Memory required to execute with typical data:More than 1 MB Number of bits in a word:64 bits Number of processors used:1 No. of lines in distributed program, including test data, etc.:20 931 Number of bytes in distributed program, including test data, etc.: 1 311 268 Distribution format:tar.gz Nature of physical problem: The programs compute the RHEED intensities during the growth of thin epitaxial structures prepared using the molecular beam epitaxy (MBE). The computations are based on the use of kinematical diffraction theory [P.I. Cohen, G.S. Petrich, P.R. Pukite, G.J. Whaley, A.S. Arrott, Surf. Sci. 216 (1989) 222. [1

  6. A customizable commercial miniaturized 320×256 indium gallium arsenide shortwave infrared camera

    NASA Astrophysics Data System (ADS)

    Huang, Shih-Che; O'Grady, Matthew; Groppe, Joseph V.; Ettenberg, Martin H.; Brubaker, Robert M.

    2004-10-01

    The design and performance of a commercial short-wave-infrared (SWIR) InGaAs microcamera engine is presented. The 0.9-to-1.7 micron SWIR imaging system consists of a room-temperature-TEC-stabilized, 320x256 (25 μm pitch) InGaAs focal plane array (FPA) and a high-performance, highly customizable image-processing set of electronics. The detectivity, D*, of the system is greater than 1013 cm-√Hz/W at 1.55 μm, and this sensitivity may be adjusted in real-time over 100 dB. It features snapshot-mode integration with a minimum exposure time of 130 μs. The digital video processor provides real time pixel-to-pixel, 2-point dark-current subtraction and non-uniformity compensation along with defective-pixel substitution. Other features include automatic gain control (AGC), gamma correction, 7 preset configurations, adjustable exposure time, external triggering, and windowing. The windowing feature is highly flexible; the region of interest (ROI) may be placed anywhere on the imager and can be varied at will. Windowing allows for high-speed readout enabling such applications as target acquisition and tracking; for example, a 32x32 ROI window may be read out at over 3500 frames per second (fps). Output video is provided as EIA170-compatible analog, or as 12-bit CameraLink-compatible digital. All the above features are accomplished in a small volume < 28 cm3, weight < 70 g, and with low power consumption < 1.3 W at room temperature using this new microcamera engine. Video processing is based on a field-programmable gate array (FPGA) platform with a soft-embedded processor that allows for ease of integration/addition of customer-specific algorithms, processes, or design requirements. The camera was developed with the high-performance, space-restricted, power-conscious application in mind, such as robotic or UAV deployment.

  7. Simulink/PARS Integration Support

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vacaliuc, B.; Nakhaee, N.

    2013-12-18

    The state of the art for signal processor hardware has far out-paced the development tools for placing applications on that hardware. In addition, signal processors are available in a variety of architectures, each uniquely capable of handling specific types of signal processing efficiently. With these processors becoming smaller and demanding less power, it has become possible to group multiple processors, a heterogeneous set of processors, into single systems. Different portions of the desired problem set can be assigned to different processor types as appropriate. As software development tools do not keep pace with these processors, especially when multiple processors ofmore » different types are used, a method is needed to enable software code portability among multiple processors and multiple types of processors along with their respective software environments. Sundance DSP, Inc. has developed a software toolkit called “PARS”, whose objective is to provide a framework that uses suites of tools provided by different vendors, along with modeling tools and a real time operating system, to build an application that spans different processor types. The software language used to express the behavior of the system is a very high level modeling language, “Simulink”, a MathWorks product. ORNL has used this toolkit to effectively implement several deliverables. This CRADA describes this collaboration between ORNL and Sundance DSP, Inc.« less

  8. Image processing on the image with pixel noise bits removed

    NASA Astrophysics Data System (ADS)

    Chuang, Keh-Shih; Wu, Christine

    1992-06-01

    Our previous studies used statistical methods to assess the noise level in digital images of various radiological modalities. We separated the pixel data into signal bits and noise bits and demonstrated visually that the removal of the noise bits does not affect the image quality. In this paper we apply image enhancement techniques on noise-bits-removed images and demonstrate that the removal of noise bits has no effect on the image property. The image processing techniques used are gray-level look up table transformation, Sobel edge detector, and 3-D surface display. Preliminary results show no noticeable difference between original image and noise bits removed image using look up table operation and Sobel edge enhancement. There is a slight enhancement of the slicing artifact in the 3-D surface display of the noise bits removed image.

  9. Drilling and Caching Architecture for the Mars2020 Mission

    NASA Astrophysics Data System (ADS)

    Zacny, K.

    2013-12-01

    We present a Sample Acquisition and Caching (SAC) architecture for the Mars2020 mission and detail how the architecture meets the sampling requirements described in the Mars2020 Science Definition Team (SDT) report. The architecture uses 'One Bit per Core' approach. Having dedicated bit for each rock core allows a reduction in the number of core transfer steps and actuators and this reduces overall mission risk. It also alleviates the bit life problem, eliminates cross contamination, and aids in hermetic sealing. An added advantage is faster drilling time, lower power, lower energy, and lower Weight on Bit (which reduces Arm preload requirements). To enable replacing of core samples, the drill bits are based on the BigTooth bit design. The BigTooth bit cuts a core diameter slightly smaller than the imaginary hole inscribed by the inner surfaces of the bits. Hence the rock core could be much easier ejected along the gravity vector. The architecture also has three additional types of bits that allow analysis of rocks. Rock Abrasion and Brushing Bit (RABBit) allows brushing and grinding of rocks in the same was as Rock Abrasion Tool does on MER. PreView bit allows viewing and analysis of rock core surfaces. Powder and Regolith Acquisition Bit (PRABit) captures regolith and rock powder either for in situ analysis or sample return. PRABit also allows sieving capabilities. The architecture can be viewed here: http://www.youtube.com/watch?v=_-hOO4-zDtE

  10. SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

    NASA Astrophysics Data System (ADS)

    Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

    1995-10-01

    Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete—analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.

  11. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  12. Shared performance monitor in a multiprocessor system

    DOEpatents

    Chiu, George; Gara, Alan G.; Salapura, Valentina

    2012-07-24

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  13. Brownian motion properties of optoelectronic random bit generators based on laser chaos.

    PubMed

    Li, Pu; Yi, Xiaogang; Liu, Xianglian; Wang, Yuncai; Wang, Yongge

    2016-07-11

    The nondeterministic property of the optoelectronic random bit generator (RBG) based on laser chaos are experimentally analyzed from two aspects of the central limit theorem and law of iterated logarithm. The random bits are extracted from an optical feedback chaotic laser diode using a multi-bit extraction technique in the electrical domain. Our experimental results demonstrate that the generated random bits have no statistical distance from the Brownian motion, besides that they can pass the state-of-the-art industry-benchmark statistical test suite (NIST SP800-22). All of them give a mathematically provable evidence that the ultrafast random bit generator based on laser chaos can be used as a nondeterministic random bit source.

  14. Drill bit assembly for releasably retaining a drill bit cutter

    DOEpatents

    Glowka, David A.; Raymond, David W.

    2002-01-01

    A drill bit assembly is provided for releasably retaining a polycrystalline diamond compact drill bit cutter. Two adjacent cavities formed in a drill bit body house, respectively, the disc-shaped drill bit cutter and a wedge-shaped cutter lock element with a removable fastener. The cutter lock element engages one flat surface of the cutter to retain the cutter in its cavity. The drill bit assembly thus enables the cutter to be locked against axial and/or rotational movement while still providing for easy removal of a worn or damaged cutter. The ability to adjust and replace cutters in the field reduces the effect of wear, helps maintains performance and improves drilling efficiency.

  15. Using Optimal Dependency-Trees for Combinatorial Optimization: Learning the Structure of the Search Space.

    DTIC Science & Technology

    1997-01-01

    create a dependency tree containing an optimum set of n-1 first-order dependencies. To do this, first, we select an arbitrary bit Xroot to place at the...the root to an arbitrary bit Xroot -For all other bits Xi, set bestMatchingBitInTree[Xi] to Xroot . -While not all bits have been

  16. Antiwhirl PDC bits increased penetration rates in Alberta drilling. [Polycrystalline Diamond Compact

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bobrosky, D.; Osmak, G.

    1993-07-05

    The antiwhirl PDC bits and an inhibitive mud system contributed to the quicker drilling of the time-sensitive shales. The hole washouts in the intermediate section were dramatically reduced, resulting in better intermediate casing cement jobs. Also, the use of antirotation PDC-drillable cementing plugs eliminated the need to drill out plugs and float equipment with a steel tooth bit and then trip for the PDC bit. By using an antiwhirl PDC bit, at least one trip was eliminated in the intermediate section. Offset data indicated that two to six conventional bits would have been required to drill the intermediate hole interval.more » The PDC bit was rebuildable and therefore rerunnable even after being used on five wells. In each instance, the cost of replacing chipped cutters was less than the cost of a new insert roller cone bit. The paper describes the antiwhirl bits; the development of the bits; and their application in a clastic sequence, a carbonate sequence, and the Shekilie oil field; the improvement in the rate of penetration; the selection of bottom hole assemblies; washout problems; and drill-out characteristics.« less

  17. Evaluations of bit sleeve and twisted-body bit designs for controlling roof bolter dust

    PubMed Central

    Beck, T.W.

    2015-01-01

    Drilling into coal mine roof strata to install roof bolts has the potential to release substantial quantities of respirable dust. Due to the proximity of drill holes to the breathing zone of roof bolting personnel, dust escaping the holes and avoiding capture by the dust collection system pose a potential respiratory health risk. Controls are available to complement the typical dry vacuum collection system and minimize harmful exposures during the initial phase of drilling. This paper examines the use of a bit sleeve in combination with a dust-hog-type bit to improve dust extraction during the critical initial phase of drilling. A twisted-body drill bit is also evaluated to determine the quantity of dust liberated in comparison with the dust-hog-type bit. Based on the results of our laboratory tests, the bit sleeve may reduce dust emissions by one-half during the initial phase of drilling before the drill bit is fully enclosed by the drill hole. Because collaring is responsible for the largest dust liberations, overall dust emission can also be substantially reduced. The use of a twisted-body bit has minimal improvement on dust capture compared with the commonly used dust-hog-type bit. PMID:26257435

  18. Implementation of kernels on the Maestro processor

    NASA Astrophysics Data System (ADS)

    Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.

    Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.

  19. Ordering of guarded and unguarded stores for no-sync I/O

    DOEpatents

    Gara, Alan; Ohmacht, Martin

    2013-06-25

    A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. The third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushing only the first queue.

  20. Sample Acquisition and Caching architecture for the Mars Sample Return mission

    NASA Astrophysics Data System (ADS)

    Zacny, K.; Chu, P.; Cohen, J.; Paulsen, G.; Craft, J.; Szwarc, T.

    This paper presents a Mars Sample Return (MSR) Sample Acquisition and Caching (SAC) study developed for the three rover platforms: MER, MER+, and MSL. The study took into account 26 SAC requirements provided by the NASA Mars Exploration Program Office. For this SAC architecture, the reduction of mission risk was chosen by us as having greater priority than mass or volume. For this reason, we selected a “ One Bit per Core” approach. The enabling technology for this architecture is Honeybee Robotics' “ eccentric tubes” core breakoff approach. The breakoff approach allows the drill bits to be relatively small in diameter and in turn lightweight. Hence, the bits could be returned to Earth with the cores inside them with only a modest increase to the total returned mass, but a significant decrease in complexity. Having dedicated bits allows a reduction in the number of core transfer steps and actuators. It also alleviates the bit life problem, eliminates cross contamination, and aids in hermetic sealing. An added advantage is faster drilling time, lower power, lower energy, and lower Weight on Bit (which reduces Arm preload requirements). Drill bits are based on the BigTooth bit concept, which allows re-use of the same bit multiple times, if necessary. The proposed SAC consists of a 1) Rotary-Percussive Core Drill, 2) Bit Storage Carousel, 3) Cache, 4) Robotic Arm, and 5) Rock Abrasion and Brushing Bit (RABBit), which is deployed using the Drill. The system also includes PreView bits (for viewing of cores prior to caching) and Powder bits for acquisition of regolith or cuttings. The SAC total system mass is less than 22 kg for MER and MER+ size rovers and less than 32 kg for the MSL-size rover.

  1. Heat Generation During Bone Drilling: A Comparison Between Industrial and Orthopaedic Drill Bits.

    PubMed

    Hein, Christopher; Inceoglu, Serkan; Juma, David; Zuckerman, Lee

    2017-02-01

    Cortical bone drilling for preparation of screw placement is common in multiple surgical fields. The heat generated while drilling may reach thresholds high enough to cause osteonecrosis. This can compromise implant stability. Orthopaedic drill bits are several orders more expensive than their similarly sized, publicly available industrial counterparts. We hypothesize that an industrial bit will generate less heat during drilling, and the bits will not generate more heat after multiple cortical passes. We compared 4 4.0 mm orthopaedic and 1 3.97 mm industrial drill bits. Three types of each bit were drilled into porcine femoral cortices 20 times. The temperature of the bone was measured with thermocouple transducers. The heat generated during the first 5 drill cycles for each bit was compared to the last 5 cycles. These data were analyzed with analysis of covariance. The industrial drill bit generated the smallest mean increase in temperature (2.8 ± 0.29°C) P < 0.0001. No significant difference was identified comparing the first 5 cortices drilled to the last 5 cortices drilled for each bit. The P-values are as follows: Bosch (P = 0.73), Emerge (P = 0.09), Smith & Nephew (P = 0.08), Stryker (P = 0.086), and Synthes (P = 0.16). The industrial bit generated less heat during drilling than its orthopaedic counterparts. The bits maintained their performance after 20 drill cycles. Consideration should be given by manufacturers to design differences that may contribute to a more efficient cutting bit. Further investigation into the reuse of these drill bits may be warranted, as our data suggest their efficiency is maintained after multiple uses.

  2. A source-channel coding approach to digital image protection and self-recovery.

    PubMed

    Sarreshtedari, Saeed; Akhaee, Mohammad Ali

    2015-07-01

    Watermarking algorithms have been widely applied to the field of image forensics recently. One of these very forensic applications is the protection of images against tampering. For this purpose, we need to design a watermarking algorithm fulfilling two purposes in case of image tampering: 1) detecting the tampered area of the received image and 2) recovering the lost information in the tampered zones. State-of-the-art techniques accomplish these tasks using watermarks consisting of check bits and reference bits. Check bits are used for tampering detection, whereas reference bits carry information about the whole image. The problem of recovering the lost reference bits still stands. This paper is aimed at showing that having the tampering location known, image tampering can be modeled and dealt with as an erasure error. Therefore, an appropriate design of channel code can protect the reference bits against tampering. In the present proposed method, the total watermark bit-budget is dedicated to three groups: 1) source encoder output bits; 2) channel code parity bits; and 3) check bits. In watermark embedding phase, the original image is source coded and the output bit stream is protected using appropriate channel encoder. For image recovery, erasure locations detected by check bits help channel erasure decoder to retrieve the original source encoded image. Experimental results show that our proposed scheme significantly outperforms recent techniques in terms of image quality for both watermarked and recovered image. The watermarked image quality gain is achieved through spending less bit-budget on watermark, while image recovery quality is considerably improved as a consequence of consistent performance of designed source and channel codes.

  3. Theoretical and subjective bit assignments in transform picture

    NASA Technical Reports Server (NTRS)

    Jones, H. W., Jr.

    1977-01-01

    It is shown that all combinations of symmetrical input distributions with difference distortion measures give a bit assignment rule identical to the well-known rule for a Gaussian input distribution with mean-square error. Published work is examined to show that the bit assignment rule is useful for transforms of full pictures, but subjective bit assignments for transform picture coding using small block sizes are significantly different from the theoretical bit assignment rule. An intuitive explanation is based on subjective design experience, and a subjectively obtained bit assignment rule is given.

  4. Electrochemical sensing using voltage-current time differential

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay

    2017-02-28

    A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less

  5. Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

    NASA Technical Reports Server (NTRS)

    Downie, John D.; Goodman, Joseph W.

    1989-01-01

    The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.

  6. Effect of bit wear on hammer drill handle vibration and productivity.

    PubMed

    Antonucci, Andrea; Barr, Alan; Martin, Bernard; Rempel, David

    2017-08-01

    The use of large electric hammer drills exposes construction workers to high levels of hand vibration that may lead to hand-arm vibration syndrome and other musculoskeletal disorders. The aim of this laboratory study was to investigate the effect of bit wear on drill handle vibration and drilling productivity (e.g., drilling time per hole). A laboratory test bench system was used with an 8.3 kg electric hammer drill and 1.9 cm concrete bit (a typical drill and bit used in commercial construction). The system automatically advanced the active drill into aged concrete block under feed force control to a depth of 7.6 cm while handle vibration was measured according to ISO standards (ISO 5349 and 28927). Bits were worn to 4 levels by consecutive hole drilling to 4 cumulative drilling depths: 0, 1,900, 5,700, and 7,600 cm. Z-axis handle vibration increased significantly (p<0.05) from 4.8 to 5.1 m/s 2 (ISO weighted) and from 42.7-47.6 m/s 2 (unweighted) when comparing a new bit to a bit worn to 1,900 cm of cumulative drilling depth. Handle vibration did not increase further with bits worn more than 1900 cm of cumulative drilling depth. Neither x- nor y-axis handle vibration was effected by bit wear. The time to drill a hole increased by 58% for the bit with 5,700 cm of cumulative drilling depth compared to a new bit. Bit wear led to a small but significant increase in both ISO weighted and unweighted z-axis handle vibration. Perhaps more important, bit wear had a large effect on productivity. The effect on productivity will influence a worker's allowable daily drilling time if exposure to drill handle vibration is near the ACGIH Threshold Limit Value. [1] Construction contractors should implement a bit replacement program based on these findings.

  7. BitTorious volunteer: server-side extensions for centrally-managed volunteer storage in BitTorrent swarms.

    PubMed

    Lee, Preston V; Dinu, Valentin

    2015-11-04

    Our publication of the BitTorious portal [1] demonstrated the ability to create a privatized distributed data warehouse of sufficient magnitude for real-world bioinformatics studies using minimal changes to the standard BitTorrent tracker protocol. In this second phase, we release a new server-side specification to accept anonymous philantropic storage donations by the general public, wherein a small portion of each user's local disk may be used for archival of scientific data. We have implementated the server-side announcement and control portions of this BitTorrent extension into v3.0.0 of the BitTorious portal, upon which compatible clients may be built. Automated test cases for the BitTorious Volunteer extensions have been added to the portal's v3.0.0 release, supporting validation of the "peer affinity" concept and announcement protocol introduced by this specification. Additionally, a separate reference implementation of affinity calculation has been provided in C++ for informaticians wishing to integrate into libtorrent-based projects. The BitTorrent "affinity" extensions as provided in the BitTorious portal reference implementation allow data publishers to crowdsource the extreme storage prerequisites for research in "big data" fields. With sufficient awareness and adoption of BitTorious Volunteer-based clients by the general public, the BitTorious portal may be able to provide peta-scale storage resources to the scientific community at relatively insignificant financial cost.

  8. PDC bit hydraulics design, profile are key to reducing balling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hariharan, P.R.; Azar, J.J.

    1996-12-09

    Polycrystalline diamond compact (PDC) bits with a parabolic profile and bladed hydraulic design have a lesser tendency to ball during drilling of reactive shales. PDC bits with ribbed or open-face hydraulic designs and those with flat or rounded profiles tended to ball more often in the bit balling experiments conducted. Experimental work also indicates that PDC hydraulic design seems to have a greater influence on bit balling tendency compared to bit profile design. There are five main factors that affect bit balling: formation type, drilling fluid, drilling hydraulics, bit design, and confining pressures. An equation for specific energy showed thatmore » it could be used to describe the efficiency of the drilling process by examining the amount of energy spent in drilling a unit volume of rock. This concept of specific energy has been used herein to correlate with the parameter Rd, a parameter to quantify the degree of balling.« less

  9. Modeling heterogeneous processor scheduling for real time systems

    NASA Technical Reports Server (NTRS)

    Leathrum, J. F.; Mielke, R. R.; Stoughton, J. W.

    1994-01-01

    A new model is presented to describe dataflow algorithms implemented in a multiprocessing system. Called the resource/data flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs which reflect the order that processors execute graph nodes. The model also allows the guarantee of meeting hard real-time deadlines. When unfolded, the model identifies statically the processor schedule. The model therefore is useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm.

  10. Remote drill bit loader

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dokos, J.A.

    1996-12-31

    A drill bit loader is described for loading a tapered shank of a drill bit into a similarly tapered recess in the end of a drill spindle. The spindle has a transverse slot at the inner end of the recess. The end of the tapered shank of the drill bit has a transverse tang adapted to engage in the slot so that the drill bit will be rotated by the spindle. The loader is in the form of a cylinder adapted to receive the drill bit with the shank projecting out of the outer end of the cylinder. Retainer pinsmore » prevent rotation of the drill bit in the cylinder. The spindle is lowered to extend the shank of the drill bit into the recess in the spindle and the spindle is rotated to align the slot in the spindle with the tang on the shank. A spring unit in the cylinder is compressed by the drill bit during its entry into the recess of the spindle and resiliently drives the tang into the slot in the spindle when the tang and slot are aligned. In typical remote drilling operations, whether in hot cells or water pits, drill bits have been held using a collet or end mill type holder with set screws. In either case, to load or change a drill bit required the use master-slave manipulators to position the bits and tighten the collet or set screws. This requirement eliminated many otherwise useful work areas because they were not equipped with slaves, particularly in water pits.« less

  11. Bit Grooming: Statistically accurate precision-preserving quantization with compression, evaluated in the netCDF operators (NCO, v4.4.8+)

    DOE PAGES

    Zender, Charles S.

    2016-09-19

    Geoscientific models and measurements generate false precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and be scientifically pointless, especially for measurements. By contrast, lossy compression can be both economical (save space) and heuristic (clarify data limitations) without compromising the scientific integrity of data. Data quantization can thus be appropriate regardless of whether space limitations are a concern. We introduce, implement, and characterize a new lossy compression scheme suitable for IEEE floating-point data. Our new Bit Grooming algorithm alternately shaves (to zero) and sets (to one) the least significant bits ofmore » consecutive values to preserve a desired precision. This is a symmetric, two-sided variant of an algorithm sometimes called Bit Shaving that quantizes values solely by zeroing bits. Our variation eliminates the artificial low bias produced by always zeroing bits, and makes Bit Grooming more suitable for arrays and multi-dimensional fields whose mean statistics are important. Bit Grooming relies on standard lossless compression to achieve the actual reduction in storage space, so we tested Bit Grooming by applying the DEFLATE compression algorithm to bit-groomed and full-precision climate data stored in netCDF3, netCDF4, HDF4, and HDF5 formats. Bit Grooming reduces the storage space required by initially uncompressed and compressed climate data by 25–80 and 5–65 %, respectively, for single-precision values (the most common case for climate data) quantized to retain 1–5 decimal digits of precision. The potential reduction is greater for double-precision datasets. When used aggressively (i.e., preserving only 1–2 digits), Bit Grooming produces storage reductions comparable to other quantization techniques such as Linear Packing. Unlike Linear Packing, whose guaranteed precision rapidly degrades within the relatively narrow dynamic range of values that it can compress, Bit Grooming guarantees the specified precision throughout the full floating-point range. Data quantization by Bit Grooming is irreversible (i.e., lossy) yet transparent, meaning that no extra processing is required by data users/readers. Hence Bit Grooming can easily reduce data storage volume without sacrificing scientific precision or imposing extra burdens on users.« less

  12. 0.25-μm lithography using a 50-kV shaped electron-beam vector scan system

    NASA Astrophysics Data System (ADS)

    Gesley, Mark A.; Mulera, Terry; Nurmi, C.; Radley, J.; Sagle, Allan L.; Standiford, Keith P.; Tan, Zoilo C. H.; Thomas, John R.; Veneklasen, Lee

    1995-05-01

    Performance data from a prototype 50 kV shaped electron-beam (e-beam) pattern generator is presented. This technology development is targeted towards 180-130 nm device design rules. It will be able to handle 1X NIST X-ray membranes, glass reduction reticles, and 4- to 8-inch wafers. The prototype system uses a planar stage adapted from the IBM EL-4 design. The electron optics is an 50 kV extension of the AEBLE%+TM) design. Lines and spaces of 0.12 micrometers with < 40 nm corner radius are resolved in 0.4 micrometers thick resist at 50 kV. This evolutionary platform will evolve further to include a new 100 kV column with telecentric deflection and a 21-bit (0.5 mm) major field for improved placement accuracy. A unique immersion shaper, faster data path electronics, and 15-bit (32 micrometers ) minor field deflection electronics will substantially increase the flash rate. To match its much finer address structure, the pattern generator figure word size will increase from 80 to 96 bits. The data path electronics uses field programmable gate array (FPGA) logic allowing writing strategy optimization via software reconfiguration. An advanced stage position control (ASPC) includes three-axis, (lambda) /1024 interferometry and a high bandwidth dynamic corrections processor (DCP). Along with its normal role of coordinate transformation and dynamic correction of deflection distortion, astigmatism, and defocus; the DCP improves accuracy by modifying deflection conditions and focus according to measured substrate height variations. It also enables yaw calibration and correction for Write-on-the FlyTM motion. The electronics incorporates JTAG components for built-in self- test (BIST), as well as syndrome checking to ensure data integrity. The design includes diagnostic capabilities from offsite as well as from the operator console. A combination of third-party software and an internal job preparation software system is used to fracture patterns. It handles tone reversal, overlap removal, sizing, and proximity correction. Processing of large files in a commercial mask shop environment is made more efficient by retaining hierarchy and using parallel processing and data compression techniques. Large GDSIITM and MEBES data files can be processed. Data includes timing benchmarks for a 1 Gbit DRAM on both proximity and reduction reticles. The paper presents 50 kV results on silicon and quartz substrates along with examples of overlay to an external grid, field butting, and critical dimension (CD) control data. Selective experiments testing system stability, calibration accuracy, and local correction software implementation on a VAX control computer are also given.

  13. Memory-Efficient Onboard Rock Segmentation

    NASA Technical Reports Server (NTRS)

    Burl, Michael C.; Thompson, David R.; Bornstein, Benjamin J.; deGranville, Charles K.

    2013-01-01

    Rockster-MER is an autonomous perception capability that was uploaded to the Mars Exploration Rover Opportunity in December 2009. This software provides the vision front end for a larger software system known as AEGIS (Autonomous Exploration for Gathering Increased Science), which was recently named 2011 NASA Software of the Year. As the first step in AEGIS, Rockster-MER analyzes an image captured by the rover, and detects and automatically identifies the boundary contours of rocks and regions of outcrop present in the scene. This initial segmentation step reduces the data volume from millions of pixels into hundreds (or fewer) of rock contours. Subsequent stages of AEGIS then prioritize the best rocks according to scientist- defined preferences and take high-resolution, follow-up observations. Rockster-MER has performed robustly from the outset on the Mars surface under challenging conditions. Rockster-MER is a specially adapted, embedded version of the original Rockster algorithm ("Rock Segmentation Through Edge Regrouping," (NPO- 44417) Software Tech Briefs, September 2008, p. 25). Although the new version performs the same basic task as the original code, the software has been (1) significantly upgraded to overcome the severe onboard re source limitations (CPU, memory, power, time) and (2) "bulletproofed" through code reviews and extensive testing and profiling to avoid the occurrence of faults. Because of the limited computational power of the RAD6000 flight processor on Opportunity (roughly two orders of magnitude slower than a modern workstation), the algorithm was heavily tuned to improve its speed. Several functional elements of the original algorithm were removed as a result of an extensive cost/benefit analysis conducted on a large set of archived rover images. The algorithm was also required to operate below a stringent 4MB high-water memory ceiling; hence, numerous tricks and strategies were introduced to reduce the memory footprint. Local filtering operations were re-coded to operate on horizontal data stripes across the image. Data types were reduced to smaller sizes where possible. Binary- valued intermediate results were squeezed into a more compact, one-bit-per-pixel representation through bit packing and bit manipulation macros. An estimated 16-fold reduction in memory footprint relative to the original Rockster algorithm was achieved. The resulting memory footprint is less than four times the base image size. Also, memory allocation calls were modified to draw from a static pool and consolidated to reduce memory management overhead and fragmentation. Rockster-MER has now been run onboard Opportunity numerous times as part of AEGIS with exceptional performance. Sample results are available on the AEGIS website at http://aegis.jpl.nasa.gov.

  14. Remote drill bit loader

    DOEpatents

    Dokos, J.A.

    1997-12-30

    A drill bit loader is described for loading a tapered shank of a drill bit into a similarly tapered recess in the end of a drill spindle. The spindle has a transverse slot at the inner end of the recess. The end of the tapered shank of the drill bit has a transverse tang adapted to engage in the slot so that the drill bit will be rotated by the spindle. The loader is in the form of a cylinder adapted to receive the drill bit with the shank projecting out of the outer end of the cylinder. Retainer pins prevent rotation of the drill bit in the cylinder. The spindle is lowered to extend the shank of the drill bit into the recess in the spindle and the spindle is rotated to align the slot in the spindle with the tang on the shank. A spring unit in the cylinder is compressed by the drill bit during its entry into the recess of the spindle and resiliently drives the tang into the slot in the spindle when the tang and slot are aligned. 5 figs.

  15. Region-of-interest determination and bit-rate conversion for H.264 video transcoding

    NASA Astrophysics Data System (ADS)

    Huang, Shu-Fen; Chen, Mei-Juan; Tai, Kuang-Han; Li, Mian-Shiuan

    2013-12-01

    This paper presents a video bit-rate transcoder for baseline profile in H.264/AVC standard to fit the available channel bandwidth for the client when transmitting video bit-streams via communication channels. To maintain visual quality for low bit-rate video efficiently, this study analyzes the decoded information in the transcoder and proposes a Bayesian theorem-based region-of-interest (ROI) determination algorithm. In addition, a curve fitting scheme is employed to find the models of video bit-rate conversion. The transcoded video will conform to the target bit-rate by re-quantization according to our proposed models. After integrating the ROI detection method and the bit-rate transcoding models, the ROI-based transcoder allocates more coding bits to ROI regions and reduces the complexity of the re-encoding procedure for non-ROI regions. Hence, it not only keeps the coding quality but improves the efficiency of the video transcoding for low target bit-rates and makes the real-time transcoding more practical. Experimental results show that the proposed framework gets significantly better visual quality.

  16. Remote drill bit loader

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dokos, James A.

    A drill bit loader for loading a tapered shank of a drill bit into a similarly tapered recess in the end of a drill spindle. The spindle has a transverse slot at the inner end of the recess. The end of the tapered shank of the drill bit has a transverse tang adapted to engage in the slot so that the drill bit will be rotated by the spindle. The loader is in the form of a cylinder adapted to receive the drill bit with the shank projecting out of the outer end of the cylinder. Retainer pins prevent rotationmore » of the drill bit in the cylinder. The spindle is lowered to extend the shank of the drill bit into the recess in the spindle and the spindle is rotated to align the slot in the spindle with the tang on the shank. A spring unit in the cylinder is compressed by the drill bit during its entry into the recess of the spindle and resiliently drives the tang into the slot in the spindle when the tang and slot are aligned.« less

  17. Remote drill bit loader

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dokos, J.A.

    A drill bit loader is described for loading a tapered shank of a drill bit into a similarly tapered recess in the end of a drill spindle. The spindle has a transverse slot at the inner end of the recess. The end of the tapered shank of the drill bit has a transverse tang adapted to engage in the slot so that the drill bit will be rotated by the spindle. The loader is in the form of a cylinder adapted to receive the drill bit with the shank projecting out of the outer end of the cylinder. Retainer pinsmore » prevent rotation of the drill bit in the cylinder. The spindle is lowered to extend the shank of the drill bit into the recess in the spindle and the spindle is rotated to align the slot in the spindle with the tang on the shank. A spring unit in the cylinder is compressed by the drill bit during its entry into the recess of the spindle and resiliently drives the tang into the slot in the spindle when the tang and slot are aligned. 5 figs.« less

  18. Cooperative and competitive concurrency in scientific computing. A full open-source upgrade of the program for dynamical calculations of RHEED intensity oscillations

    NASA Astrophysics Data System (ADS)

    Daniluk, Andrzej

    2011-06-01

    A computational model is a computer program, which attempts to simulate an abstract model of a particular system. Computational models use enormous calculations and often require supercomputer speed. As personal computers are becoming more and more powerful, more laboratory experiments can be converted into computer models that can be interactively examined by scientists and students without the risk and cost of the actual experiments. The future of programming is concurrent programming. The threaded programming model provides application programmers with a useful abstraction of concurrent execution of multiple tasks. The objective of this release is to address the design of architecture for scientific application, which may execute as multiple threads execution, as well as implementations of the related shared data structures. New version program summaryProgram title: GrowthCP Catalogue identifier: ADVL_v4_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADVL_v4_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 32 269 No. of bytes in distributed program, including test data, etc.: 8 234 229 Distribution format: tar.gz Programming language: Free Object Pascal Computer: multi-core x64-based PC Operating system: Windows XP, Vista, 7 Has the code been vectorised or parallelized?: No RAM: More than 1 GB. The program requires a 32-bit or 64-bit processor to run the generated code. Memory is addressed using 32-bit (on 32-bit processors) or 64-bit (on 64-bit processors with 64-bit addressing) pointers. The amount of addressed memory is limited only by the available amount of virtual memory. Supplementary material: The figures mentioned in the "Summary of revisions" section can be obtained here. Classification: 4.3, 7.2, 6.2, 8, 14 External routines: Lazarus [1] Catalogue identifier of previous version: ADVL_v3_0 Journal reference of previous version: Comput. Phys. Comm. 181 (2010) 709 Does the new version supersede the previous version?: Yes Nature of problem: Reflection high-energy electron diffraction (RHEED) is an important in-situ analysis technique, which is capable of giving quantitative information about the growth process of thin layers and its control. It can be used to calibrate growth rate, analyze surface morphology, calibrate surface temperature, monitor the arrangement of the surface atoms, and provide information about growth kinetics. Such control allows the development of structures where the electrons can be confined in space, giving quantum wells or even quantum dots. In order to determine the atomic positions of atoms in the first few layers, the RHEED intensity must be measured as a function of the scattering angles and then compared with dynamic calculations. The objective of this release is to address the design of architecture for application that simulates the rocking curves RHEED intensities during hetero-epitaxial growth process of thin films. Solution method: The GrowthCP is a complex numerical model that uses multiple threads for simulation of epitaxial growth of thin layers. This model consists of two transactional parts. The first part is a mathematical model being based on the Runge-Kutta method with adaptive step-size control. The second part represents first-principles of the one-dimensional RHEED computational model. This model is based on solving a one-dimensional Schrödinger equation. Several problems can arise when applications contain a mixture of data access code, numerical code, and presentation code. Such applications are difficult to maintain, because interdependencies between all the components cause strong ripple effects whenever a change is made anywhere. Adding new data views often requires reimplementing a numerical code, which then requires maintenance in multiple places. In order to solve problems of this type, the computational and threading layers of the project have been implemented in the form of one design pattern as a part of Model-View-Controller architecture. Reasons for new version: Responding to the users' feedback the Growth09 project has been upgraded to a standard that allows the carrying out of sample computations of the RHEED intensities for a disordered surface for a wide range of single- and epitaxial hetero-structures. The design pattern on which the project is based has also been improved. It is shown that this model can be effectively used for multithreaded growth simulations of thin epitaxial layers and corresponding RHEED intensities for a wide range of single- and hetero-structures. Responding to the users' feedback the present release has been implemented using a well-documented free compiler [1] not requiring the special configuration and installation additional libraries. Summary of revisions: The logical structure of the Growth09 program has been modified according to the scheme showed in Fig. 1. The class diagram in Fig. 1 is a static view of the main platform-specific elements of the GrowthCP architecture. Fig. 2 provides a dynamic view by showing the creation and destruction simplistic sequence diagram for the process. The program requires the user to provide the appropriate parameters in the form of a knowledge base for the crystal structures under investigation. These parameters are loaded from the parameters. ini files at run-time. Instructions to prepare the .ini files can be found in the new distribution. The program enables carrying out different growth models and one-dimensional dynamical RHEED calculations for the fcc lattice with basis of three-atoms, fcc lattice with basis of two-atoms, fcc lattice with single atom basis, Zinc-Blende, Sodium Chloride, and Wurtzite crystalline structures and hetero-structures, but yet the Fourier component of the scattering potential in the TRHEEDCalculations.crystPotUgXXX() procedure can be modified and implemented according to users' specific application requirements. The Fourier component of the scattering potential of the whole crystalline hetero-structures can be determined as a sum of contributions coming from all thin slices of individual atomic layers. To carry out one-dimensional calculations of the scattering potentials, the program uses properly constructed self-consistent procedures. Each component of the system shown in Figs. 1 and 2 is fully extendable and can easily be adapted to new changeable requirements. Two essential logical elements of the system, i.e. TGrowthTransaction and TRHEEDCalculations classes, were designed and implemented in this way for them to pass the information to themselves without the need to use the data-exchange files given. In consequence each of them can be independently modified and/or extended. Implementing other types of differential equations and the different algorithm for solving them in the TGrowthTransaction class does not require another implementation of the TRHEEDCalculations class. Similarly, implementing other forms of scattering potential and different algorithm for RHEED calculation stays without the influence on the TGrowthTransaction class construction. Unusual features: The program is distributed in the form of main project GrowthCP.lpr, with associated files, and should be compiled using Lazarus IDE. The program should be compiled with English/USA regional and language options. Running time: The typical running time is machine and user-parameters dependent. References: http://sourceforge.net/projects/lazarus/files/.

  19. Electrochemical sensing using comparison of voltage-current time differential values during waveform generation and detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woo, Leta Yar-Li; Glass, Robert Scott; Fitzpatrick, Joseph Jay

    2018-01-02

    A device for signal processing. The device includes a signal generator, a signal detector, and a processor. The signal generator generates an original waveform. The signal detector detects an affected waveform. The processor is coupled to the signal detector. The processor receives the affected waveform from the signal detector. The processor also compares at least one portion of the affected waveform with the original waveform. The processor also determines a difference between the affected waveform and the original waveform. The processor also determines a value corresponding to a unique portion of the determined difference between the original and affected waveforms.more » The processor also outputs the determined value.« less

  20. Graded bit patterned magnetic arrays fabricated via angled low-energy He ion irradiation.

    PubMed

    Chang, L V; Nasruallah, A; Ruchhoeft, P; Khizroev, S; Litvinov, D

    2012-07-11

    A bit patterned magnetic array based on Co/Pd magnetic multilayers with a binary perpendicular magnetic anisotropy distribution was fabricated. The binary anisotropy distribution was attained through angled helium ion irradiation of a bit edge using hydrogen silsesquioxane (HSQ) resist as an ion stopping layer to protect the rest of the bit. The viability of this technique was explored numerically and evaluated through magnetic measurements of the prepared bit patterned magnetic array. The resulting graded bit patterned magnetic array showed a 35% reduction in coercivity and a 9% narrowing of the standard deviation of the switching field.

  1. Performance comparison between 8 and 14 bit-depth imaging in polarization-sensitive swept-source optical coherence tomography

    NASA Astrophysics Data System (ADS)

    Lu, Zenghai; Kasaragoda, Deepa K.; Matcher, Stephen J.

    2011-03-01

    We compare true 8 and 14 bit-depth imaging of SS-OCT and polarization-sensitive SS-OCT (PS-SS-OCT) at 1.3μm wavelength by using two hardware-synchronized high-speed data acquisition (DAQ) boards. The two DAQ boards read exactly the same imaging data for comparison. The measured system sensitivity at 8-bit depth is comparable to that for 14-bit acquisition when using the more sensitive of the available full analog input voltage ranges of the ADC. Ex-vivo structural and birefringence images of an equine tendon sample indicate no significant differences between images acquired by the two DAQ boards suggesting that 8-bit DAQ boards can be employed to increase imaging speeds and reduce storage in clinical SS-OCT/PS-SS-OCT systems. We also compare the resulting image quality when the image data sampled with the 14-bit DAQ from human finger skin is artificially bit-reduced during post-processing. However, in agreement with the results reported previously, we also observe that in our system that real-world 8-bit image shows more artifacts than the image acquired by numerically truncating to 8-bits from the raw 14-bit image data, especially in low intensity image area. This is due to the higher noise floor and reduced dynamic range of the 8-bit DAQ. One possible disadvantage is a reduced imaging dynamic range which can manifest itself as an increase in image artefacts due to strong Fresnel reflection.

  2. Hybrid Electro-Optic Processor

    DTIC Science & Technology

    1991-07-01

    This report describes the design of a hybrid electro - optic processor to perform adaptive interference cancellation in radar systems. The processor is...modulator is reported. Included is this report is a discussion of the design, partial fabrication in the laboratory, and partial testing of the hybrid electro ... optic processor. A follow on effort is planned to complete the construction and testing of the processor. The work described in this report is the

  3. JPRS Report, Science & Technology, Europe.

    DTIC Science & Technology

    1991-04-30

    processor in collaboration with Intel . The processor , christened Touchstone, will be used as the core of a parallel computer with 2,000 processors . One of...ELECTRONIQUE HEBDO in French 24 Jan 91 pp 14-15 [Article by Claire Remy: "Everything Set for Neural Signal Processors " first paragraph is ELECTRONIQUE...paving the way for neural signal processors in so doing. The principal advantage of this specific circuit over a neuromimetic software program is

  4. Processor register error correction management

    DOEpatents

    Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.

    2016-12-27

    Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.

  5. The CSM testbed matrix processors internal logic and dataflow descriptions

    NASA Technical Reports Server (NTRS)

    Regelbrugge, Marc E.; Wright, Mary A.

    1988-01-01

    This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.

  6. PDC bits break ground with advanced vibration mitigation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1995-10-01

    Advancements in PDC bit technology have resulted in the identification and characterization of different types of vibrational modes that historically have limited PDC bit performance. As a result, concepts have been developed that prevent the initiation of vibration and also mitigate its damaging effects once it occurs. This vibration-reducing concept ensures more efficient use of the energy available to a PDC bit performance. As a result, concepts have been developed that prevent the imitation of vibration and also mitigate its damaging effects once it occurs. This vibration-reducing concept ensures more efficient use of the energy available to a PDC bit,more » thereby improving its performance. This improved understanding of the complex forces affecting bit performance is driving bit customization for specific drilling programs.« less

  7. Shuttle bit rate synchronizer. [signal to noise ratios and error analysis

    NASA Technical Reports Server (NTRS)

    Huey, D. C.; Fultz, G. L.

    1974-01-01

    A shuttle bit rate synchronizer brassboard unit was designed, fabricated, and tested, which meets or exceeds the contractual specifications. The bit rate synchronizer operates at signal-to-noise ratios (in a bit rate bandwidth) down to -5 dB while exhibiting less than 0.6 dB bit error rate degradation. The mean acquisition time was measured to be less than 2 seconds. The synchronizer is designed around a digital data transition tracking loop whose phase and data detectors are integrate-and-dump filters matched to the Manchester encoded bits specified. It meets the reliability (no adjustments or tweaking) and versatility (multiple bit rates) of the shuttle S-band communication system through an implementation which is all digital after the initial stage of analog AGC and A/D conversion.

  8. Method for compression of binary data

    DOEpatents

    Berlin, Gary J.

    1996-01-01

    The disclosed method for compression of a series of data bytes, based on LZSS-based compression methods, provides faster decompression of the stored data. The method involves the creation of a flag bit buffer in a random access memory device for temporary storage of flag bits generated during normal LZSS-based compression. The flag bit buffer stores the flag bits separately from their corresponding pointers and uncompressed data bytes until all input data has been read. Then, the flag bits are appended to the compressed output stream of data. Decompression can be performed much faster because bit manipulation is only required when reading the flag bits and not when reading uncompressed data bytes and pointers. Uncompressed data is read using byte length instructions and pointers are read using word instructions, thus reducing the time required for decompression.

  9. A new thermal model for bone drilling with applications to orthopaedic surgery.

    PubMed

    Lee, JuEun; Rabin, Yoed; Ozdoganlar, O Burak

    2011-12-01

    This paper presents a new thermal model for bone drilling with applications to orthopaedic surgery. The new model combines a unique heat-balance equation for the system of the drill bit and the chip stream, an ordinary heat diffusion equation for the bone, and heat generation at the drill tip, arising from the cutting process and friction. Modeling of the drill bit-chip stream system assumes an axial temperature distribution and a lumped heat capacity effect in the transverse cross-section. The new model is solved numerically using a tailor-made finite-difference scheme for the drill bit-chip stream system, coupled with a classic finite-difference method for the bone. The theoretical investigation addresses the significance of heat transfer between the drill bit and the bone, heat convection from the drill bit to the surroundings, and the effect of the initial temperature of the drill bit on the developing thermal field. Using the new model, a parametric study on the effects of machining conditions and drill-bit geometries on the resulting temperature field in the bone and the drill bit is presented. Results of this study indicate that: (1) the maximum temperature in the bone decreases with increased chip flow; (2) the transient temperature distribution is strongly influenced by the initial temperature; (3) the continued cooling (irrigation) of the drill bit reduces the maximum temperature even when the tip is distant from the cooled portion of the drill bit; and (4) the maximum temperature increases with increasing spindle speed, increasing feed rate, decreasing drill-bit diameter, increasing point angle, and decreasing helix angle. The model is expected to be useful in determination of optimum drilling conditions and drill-bit geometries. Copyright © 2011. Published by Elsevier Ltd.

  10. Method to manufacture bit patterned magnetic recording media

    DOEpatents

    Raeymaekers, Bart; Sinha, Dipen N

    2014-05-13

    A method to increase the storage density on magnetic recording media by physically separating the individual bits from each other with a non-magnetic medium (so-called bit patterned media). This allows the bits to be closely packed together without creating magnetic "cross-talk" between adjacent bits. In one embodiment, ferromagnetic particles are submerged in a resin solution, contained in a reservoir. The bottom of the reservoir is made of piezoelectric material.

  11. 7 CFR 1435.310 - Sharing processors' allocations with producers.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...

  12. 7 CFR 1435.310 - Sharing processors' allocations with producers.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...

  13. 7 CFR 1435.310 - Sharing processors' allocations with producers.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...

  14. 7 CFR 1435.310 - Sharing processors' allocations with producers.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...

  15. 7 CFR 1435.310 - Sharing processors' allocations with producers.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... CREDIT CORPORATION, DEPARTMENT OF AGRICULTURE LOANS, PURCHASES, AND OTHER OPERATIONS SUGAR PROGRAM Flexible Marketing Allotments For Sugar § 1435.310 Sharing processors' allocations with producers. (a) Every sugar beet and sugarcane processor must provide CCC a certification that: (1) The processor...

  16. 40 CFR 791.45 - Processors.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ...) When a test rule or subsequent Federal Register notice pertaining to a test rule expressly obligates processors as well as manufacturers to assume direct testing and data reimbursement responsibilities. (2... processors voluntarily agree to reimburse manufacturers for a portion of test costs. Only those processors...

  17. The Behavioral Intervention Technology Model: An Integrated Conceptual and Technological Framework for eHealth and mHealth Interventions

    PubMed Central

    Schueller, Stephen M; Montague, Enid; Burns, Michelle Nicole; Rashidi, Parisa

    2014-01-01

    A growing number of investigators have commented on the lack of models to inform the design of behavioral intervention technologies (BITs). BITs, which include a subset of mHealth and eHealth interventions, employ a broad range of technologies, such as mobile phones, the Web, and sensors, to support users in changing behaviors and cognitions related to health, mental health, and wellness. We propose a model that conceptually defines BITs, from the clinical aim to the technological delivery framework. The BIT model defines both the conceptual and technological architecture of a BIT. Conceptually, a BIT model should answer the questions why, what, how (conceptual and technical), and when. While BITs generally have a larger treatment goal, such goals generally consist of smaller intervention aims (the "why") such as promotion or reduction of specific behaviors, and behavior change strategies (the conceptual "how"), such as education, goal setting, and monitoring. Behavior change strategies are instantiated with specific intervention components or “elements” (the "what"). The characteristics of intervention elements may be further defined or modified (the technical "how") to meet the needs, capabilities, and preferences of a user. Finally, many BITs require specification of a workflow that defines when an intervention component will be delivered. The BIT model includes a technological framework (BIT-Tech) that can integrate and implement the intervention elements, characteristics, and workflow to deliver the entire BIT to users over time. This implementation may be either predefined or include adaptive systems that can tailor the intervention based on data from the user and the user’s environment. The BIT model provides a step towards formalizing the translation of developer aims into intervention components, larger treatments, and methods of delivery in a manner that supports research and communication between investigators on how to design, develop, and deploy BITs. PMID:24905070

  18. The behavioral intervention technology model: an integrated conceptual and technological framework for eHealth and mHealth interventions.

    PubMed

    Mohr, David C; Schueller, Stephen M; Montague, Enid; Burns, Michelle Nicole; Rashidi, Parisa

    2014-06-05

    A growing number of investigators have commented on the lack of models to inform the design of behavioral intervention technologies (BITs). BITs, which include a subset of mHealth and eHealth interventions, employ a broad range of technologies, such as mobile phones, the Web, and sensors, to support users in changing behaviors and cognitions related to health, mental health, and wellness. We propose a model that conceptually defines BITs, from the clinical aim to the technological delivery framework. The BIT model defines both the conceptual and technological architecture of a BIT. Conceptually, a BIT model should answer the questions why, what, how (conceptual and technical), and when. While BITs generally have a larger treatment goal, such goals generally consist of smaller intervention aims (the "why") such as promotion or reduction of specific behaviors, and behavior change strategies (the conceptual "how"), such as education, goal setting, and monitoring. Behavior change strategies are instantiated with specific intervention components or "elements" (the "what"). The characteristics of intervention elements may be further defined or modified (the technical "how") to meet the needs, capabilities, and preferences of a user. Finally, many BITs require specification of a workflow that defines when an intervention component will be delivered. The BIT model includes a technological framework (BIT-Tech) that can integrate and implement the intervention elements, characteristics, and workflow to deliver the entire BIT to users over time. This implementation may be either predefined or include adaptive systems that can tailor the intervention based on data from the user and the user's environment. The BIT model provides a step towards formalizing the translation of developer aims into intervention components, larger treatments, and methods of delivery in a manner that supports research and communication between investigators on how to design, develop, and deploy BITs.

  19. Ultra-high throughput real-time instruments for capturing fast signals and rare events

    NASA Astrophysics Data System (ADS)

    Buckley, Brandon Walter

    Wide-band signals play important roles in the most exciting areas of science, engineering, and medicine. To keep up with the demands of exploding internet traffic, modern data centers and communication networks are employing increasingly faster data rates. Wide-band techniques such as pulsed radar jamming and spread spectrum frequency hopping are used on the battlefield to wrestle control of the electromagnetic spectrum. Neurons communicate with each other using transient action potentials that last for only milliseconds at a time. And in the search for rare cells, biologists flow large populations of cells single file down microfluidic channels, interrogating them one-by-one, tens of thousands of times per second. Studying and enabling such high-speed phenomena pose enormous technical challenges. For one, parasitic capacitance inherent in analog electrical components limits their response time. Additionally, converting these fast analog signals to the digital domain requires enormous sampling speeds, which can lead to significant jitter and distortion. State-of-the-art imaging technologies, essential for studying biological dynamics and cells in flow, are limited in speed and sensitivity by finite charge transfer and read rates, and by the small numbers of photo-electrons accumulated in short integration times. And finally, ultra-high throughput real-time digital processing is required at the backend to analyze the streaming data. In this thesis, I discuss my work in developing real-time instruments, employing ultrafast optical techniques, which overcome some of these obstacles. In particular, I use broadband dispersive optics to slow down fast signals to speeds accessible to high-bit depth digitizers and signal processors. I also apply telecommunication multiplexing techniques to boost the speeds of confocal fluorescence microscopy. The photonic time stretcher (TiSER) uses dispersive Fourier transformation to slow down analog signals before digitization and processing. The act of time-stretching effectively boosts the performance of the back-end electronics and digital signal processors. The slowed down signals reach the back-end electronics with reduced bandwidth, and are therefore less affected by high-frequency roll-off and distortion. Time-stretching also increases the effective sampling rate of analog-to-digital converters and reduces aperture jitter, thereby improving resolution. Finally, the instantaneous throughputs of digital signal processors are enhanced by the stretch factor to otherwise unattainable speeds. Leveraging these unique capabilities, TiSER becomes the ideal tool for capturing high-speed signals and characterizing rare phenomena. For this thesis, I have developed techniques to improve the spectral efficiency, bandwidth, and resolution of TiSER using polarization multiplexing, all-optical modulation, and coherent dispersive Fourier transformation. To reduce the latency and improve the data handling capacity, I have also designed and implemented a real-time digital signal processing electronic backend, achieving 1.5 tera-bit per second instantaneous processing throughput. Finally, I will present results from experiments highlighting TiSER's impact in real-world applications. Confocal fluorescence microscopy is the most widely used method for unveiling the molecular composition of biological specimens. However, the weak optical emission of fluorescent probes and the tradeoff between imaging speed and sensitivity is problematic for acquiring blur-free images of fast phenomena and cells flowing at high speed. Here I introduce a new fluorescence imaging modality, which leverages techniques from wireless communication to reach record pixel and frame rates. Termed Fluorescence Imaging using Radio-frequency tagged Emission (FIRE), this new imaging modality is capable of resolving never before seen dynamics in living cells - such as action potentials in neurons and metabolic waves in astrocytes - as well as performing high-content image assays of cells and particles in high-speed flow.

  20. An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

    DOE PAGES

    Vincenti, H.; Lobet, M.; Lehe, R.; ...

    2016-09-19

    In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries:  OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less

  1. An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vincenti, H.; Lobet, M.; Lehe, R.

    In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈20pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD registermore » length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scat;ter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of ×2 to ×2.5 speed-up in double precision for particle shape factor of orders 1–3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles). Program summary Program Title: vec_deposition Program Files doi:http://dx.doi.org/10.17632/nh77fv9k8c.1 Licensing provisions: BSD 3-Clause Programming language: Fortran 90 External routines/libraries:  OpenMP > 4.0 Nature of problem: Exascale architectures will have many-core processors per node with long vector data registers capable of performing one single instruction on multiple data during one clock cycle. Data register lengths are expected to double every four years and this pushes for new portable solutions for efficiently vectorizing Particle-In-Cell codes on these future many-core architectures. One of the main hotspot routines of the PIC algorithm is the current/charge deposition for which there is no efficient and portable vector algorithm. Solution method: Here we provide an efficient and portable vector algorithm of current/charge deposition routines that uses a new data structure, which significantly reduces gather/scatter operations. Vectorization is controlled using OpenMP 4.0 compiler directives for vectorization which ensures portability across different architectures. Restrictions: Here we do not provide the full PIC algorithm with an executable but only vector routines for current/charge deposition. These scalar/vector routines can be used as library routines in your 3D Particle-In-Cell code. However, to get the best performances out of vector routines you have to satisfy the two following requirements: (1) Your code should implement particle tiling (as explained in the manuscript) to allow for maximized cache reuse and reduce memory accesses that can hinder vector performances. The routines can be used directly on each particle tile. (2) You should compile your code with a Fortran 90 compiler (e.g Intel, gnu or cray) and provide proper alignment flags and compiler alignment directives (more details in README file).« less

  2. Digital Ratiometer

    NASA Technical Reports Server (NTRS)

    Beer, R.

    1985-01-01

    Small, low-cost comparator with 24-bit-precision yields ratio signal from pair of analog or digital input signals. Arithmetic logic chips (bit-slice) sample two 24-bit analog-to-digital converters approximately once every millisecond and accumulate them in two 24-bit registers. Approach readily modified to arbitrary precision.

  3. Interprocessor bus switching system for simultaneous communication in plural bus parallel processing system

    DOEpatents

    Atac, R.; Fischler, M.S.; Husby, D.E.

    1991-01-15

    A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured. 11 figures.

  4. Method and structure for skewed block-cyclic distribution of lower-dimensional data arrays in higher-dimensional processor grids

    DOEpatents

    Chatterjee, Siddhartha [Yorktown Heights, NY; Gunnels, John A [Brewster, NY

    2011-11-08

    A method and structure of distributing elements of an array of data in a computer memory to a specific processor of a multi-dimensional mesh of parallel processors includes designating a distribution of elements of at least a portion of the array to be executed by specific processors in the multi-dimensional mesh of parallel processors. The pattern of the designating includes a cyclical repetitive pattern of the parallel processor mesh, as modified to have a skew in at least one dimension so that both a row of data in the array and a column of data in the array map to respective contiguous groupings of the processors such that a dimension of the contiguous groupings is greater than one.

  5. Interprocessor bus switching system for simultaneous communication in plural bus parallel processing system

    DOEpatents

    Atac, Robert; Fischler, Mark S.; Husby, Donald E.

    1991-01-01

    A bus switching apparatus and method for multiple processor computer systems comprises a plurality of bus switches interconnected by branch buses. Each processor or other module of the system is connected to a spigot of a bus switch. Each bus switch also serves as part of a backplane of a modular crate hardware package. A processor initiates communication with another processor by identifying that other processor. The bus switch to which the initiating processor is connected identifies and secures, if possible, a path to that other processor, either directly or via one or more other bus switches which operate similarly. If a particular desired path through a given bus switch is not available to be used, an alternate path is considered, identified and secured.

  6. Technology Development and Field Trials of EGS Drilling Systems at Chocolate Mountain

    DOE Data Explorer

    Steven Knudsen

    2012-01-01

    Polycrystalline diamond compact (PDC) bits are routinely used in the oil and gas industry for drilling medium to hard rock but have not been adopted for geothermal drilling, largely due to past reliability issues and higher purchase costs. The Sandia Geothermal Research Department has recently completed a field demonstration of the applicability of advanced synthetic diamond drill bits for production geothermal drilling. Two commercially-available PDC bits were tested in a geothermal drilling program in the Chocolate Mountains in Southern California. These bits drilled the granitic formations with significantly better Rate of Penetration (ROP) and bit life than the roller cone bit they are compared with. Drilling records and bit performance data along with associated drilling cost savings are presented herein. The drilling trials have demonstrated PDC bit drilling technology has matured for applicability and improvements to geothermal drilling. This will be especially beneficial for development of Enhanced Geothermal Systems whereby resources can be accessed anywhere within the continental US by drilling to deep, hot resources in hard, basement rock formations.

  7. The Anoikis Effector Bit1 Inhibits EMT through Attenuation of TLE1-Mediated Repression of E-Cadherin in Lung Cancer Cells

    PubMed Central

    Yao, Xin; Pham, Tri; Temple, Brandi; Gray, Selena; Cannon, Cornita; Chen, Renwei; Abdel-Mageed, Asim B.; Biliran, Hector

    2016-01-01

    The mitochondrial Bcl-2 inhibitor of transcription 1 (Bit1) protein is part of an anoikis-regulating pathway that is selectively dependent on integrins. We previously demonstrated that the caspase-independent apoptotic effector Bit1 exerts tumor suppressive function in lung cancer in part by inhibiting anoikis resistance and anchorage-independent growth in vitro and tumorigenicity in vivo. Herein we show a novel function of Bit1 as an inhibitor cell migration and epithelial–mesenchymal transition (EMT) in the human lung adenocarcinoma A549 cell line. Suppression of endogenous Bit1 expression via siRNA and shRNA strategies promoted mesenchymal phenotypes, including enhanced fibroblastoid morphology and cell migratory potential with concomitant downregulation of the epithelial marker E-cadherin expression. Conversely, ectopic Bit1 expression in A549 cells promoted epithelial transition characterized by cuboidal-like epithelial cell phenotype, reduced cell motility, and upregulated E-cadherin expression. Specific downregulation of E-cadherin in Bit1-transfected cells was sufficient to block Bit1-mediated inhibition of cell motility while forced expression of E-cadherin alone attenuated the enhanced migration of Bit1 knockdown cells, indicating that E-cadherin is a downstream target of Bit1 in regulating cell motility. Furthermore, quantitative real-time PCR and reporter analyses revealed that Bit1 upregulates E-cadherin expression at the transcriptional level through the transcriptional regulator Amino-terminal Enhancer of Split (AES) protein. Importantly, the Bit1/AES pathway induction of E-cadherin expression involves inhibition of the TLE1-mediated repression of E-cadherin, by decreasing TLE1 corepressor occupancy at the E-cadherin promoter as revealed by chromatin immunoprecipitation assays. Consistent with its EMT inhibitory function, exogenous Bit1 expression significantly suppressed the formation of lung metastases of A549 cells in an in vivo experimental metastasis model. Taken together, our studies indicate Bit1 is an inhibitor of EMT and metastasis in lung cancer and hence can serve as a molecular target in curbing lung cancer aggressiveness. PMID:27655370

  8. Effects of size on three-cone bit performance in laboratory drilled shale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Black, A.D.; DiBona, B.G.; Sandstrom, J.L.

    1982-09-01

    The effects of size on the performance of 3-cone bits were measured during laboratory drilling tests in shale at simulated downhole conditions. Four Reed HP-SM 3-cone bits with diameters of 6 1/2, 7 7/8, 9 1/2 and 11 inches were used to drill Mancos shale with water-based mud. The tests were conducted at constant borehole pressure, two conditions of hydraulic horsepower per square inch of bit area, three conditions of rotary speed and four conditions of weight-on-bit per inch of bit diameter. The resulting penetration rates and torques were measured. Statistical techniques were used to analyze the data.

  9. LSB-based Steganography Using Reflected Gray Code for Color Quantum Images

    NASA Astrophysics Data System (ADS)

    Li, Panchi; Lu, Aiping

    2018-02-01

    At present, the classical least-significant-bit (LSB) based image steganography has been extended to quantum image processing. For the existing LSB-based quantum image steganography schemes, the embedding capacity is no more than 3 bits per pixel. Therefore, it is meaningful to study how to improve the embedding capacity of quantum image steganography. This work presents a novel LSB-based steganography using reflected Gray code for colored quantum images, and the embedding capacity of this scheme is up to 4 bits per pixel. In proposed scheme, the secret qubit sequence is considered as a sequence of 4-bit segments. For the four bits in each segment, the first bit is embedded in the second LSB of B channel of the cover image, and and the remaining three bits are embedded in LSB of RGB channels of each color pixel simultaneously using reflected-Gray code to determine the embedded bit from secret information. Following the transforming rule, the LSB of stego-image are not always same as the secret bits and the differences are up to almost 50%. Experimental results confirm that the proposed scheme shows good performance and outperforms the previous ones currently found in the literature in terms of embedding capacity.

  10. Quantization of Gaussian samples at very low SNR regime in continuous variable QKD applications

    NASA Astrophysics Data System (ADS)

    Daneshgaran, Fred; Mondin, Marina

    2016-09-01

    The main problem for information reconciliation in continuous variable Quantum Key Distribution (QKD) at low Signal to Noise Ratio (SNR) is quantization and assignment of labels to the samples of the Gaussian Random Variables (RVs) observed at Alice and Bob. Trouble is that most of the samples, assuming that the Gaussian variable is zero mean which is de-facto the case, tend to have small magnitudes and are easily disturbed by noise. Transmission over longer and longer distances increases the losses corresponding to a lower effective SNR exasperating the problem. This paper looks at the quantization problem of the Gaussian samples at very low SNR regime from an information theoretic point of view. We look at the problem of two bit per sample quantization of the Gaussian RVs at Alice and Bob and derive expressions for the mutual information between the bit strings as a result of this quantization. The quantization threshold for the Most Significant Bit (MSB) should be chosen based on the maximization of the mutual information between the quantized bit strings. Furthermore, while the LSB string at Alice and Bob are balanced in a sense that their entropy is close to maximum, this is not the case for the second most significant bit even under optimal threshold. We show that with two bit quantization at SNR of -3 dB we achieve 75.8% of maximal achievable mutual information between Alice and Bob, hence, as the number of quantization bits increases beyond 2-bits, the number of additional useful bits that can be extracted for secret key generation decreases rapidly. Furthermore, the error rates between the bit strings at Alice and Bob at the same significant bit level are rather high demanding very powerful error correcting codes. While our calculations and simulation shows that the mutual information between the LSB at Alice and Bob is 0.1044 bits, that at the MSB level is only 0.035 bits. Hence, it is only by looking at the bits jointly that we are able to achieve a mutual information of 0.2217 bits which is 75.8% of maximum achievable. The implication is that only by coding both MSB and LSB jointly can we hope to get close to this 75.8% limit. Hence, non-binary codes are essential to achieve acceptable performance.

  11. Variable word length encoder reduces TV bandwith requirements

    NASA Technical Reports Server (NTRS)

    Sivertson, W. E., Jr.

    1965-01-01

    Adaptive variable resolution encoding technique provides an adaptive compression pseudo-random noise signal processor for reducing television bandwidth requirements. Complementary processors are required in both the transmitting and receiving systems. The pretransmission processor is analog-to-digital, while the postreception processor is digital-to-analog.

  12. Hey! A Tick Bit Me!

    MedlinePlus

    ... Staying Safe Videos for Educators Search English Español Hey! A Tick Bit Me! KidsHealth / For Kids / Hey! A Tick Bit Me! Print en español ¡Ay! ¡ ... tick collar. More on this topic for: Kids Hey! A Brown Recluse Spider Bit Me! Hey! A ...

  13. Core drill's bit is replaceable without withdrawal of drill stem - A concept

    NASA Technical Reports Server (NTRS)

    Rushing, F. C.; Simon, A. B.

    1970-01-01

    Drill bit is divided into several sectors. When collapsed, the outside diameter is forced down the drill stem, when it reaches bottom the sectors are forced outward and form a cutting bit. A dulled bit is retracted by reversal of this procedure.

  14. Lathe tool bit and holder for machining fiberglass materials

    NASA Technical Reports Server (NTRS)

    Winn, L. E. (Inventor)

    1972-01-01

    A lathe tool and holder combination for machining resin impregnated fiberglass cloth laminates is described. The tool holder and tool bit combination is designed to accommodate a conventional carbide-tipped, round shank router bit as the cutting medium, and provides an infinite number of cutting angles in order to produce a true and smooth surface in the fiberglass material workpiece with every pass of the tool bit. The technique utilizes damaged router bits which ordinarily would be discarded.

  15. High speed, real-time, camera bandwidth converter

    DOEpatents

    Bower, Dan E; Bloom, David A; Curry, James R

    2014-10-21

    Image data from a CMOS sensor with 10 bit resolution is reformatted in real time to allow the data to stream through communications equipment that is designed to transport data with 8 bit resolution. The incoming image data has 10 bit resolution. The communication equipment can transport image data with 8 bit resolution. Image data with 10 bit resolution is transmitted in real-time, without a frame delay, through the communication equipment by reformatting the image data.

  16. Method for compression of binary data

    DOEpatents

    Berlin, G.J.

    1996-03-26

    The disclosed method for compression of a series of data bytes, based on LZSS-based compression methods, provides faster decompression of the stored data. The method involves the creation of a flag bit buffer in a random access memory device for temporary storage of flag bits generated during normal LZSS-based compression. The flag bit buffer stores the flag bits separately from their corresponding pointers and uncompressed data bytes until all input data has been read. Then, the flag bits are appended to the compressed output stream of data. Decompression can be performed much faster because bit manipulation is only required when reading the flag bits and not when reading uncompressed data bytes and pointers. Uncompressed data is read using byte length instructions and pointers are read using word instructions, thus reducing the time required for decompression. 5 figs.

  17. Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

    PubMed

    Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

    2009-01-30

    Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.

  18. Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

    DOEpatents

    Leung, Vitus J [Albuquerque, NM; Phillips, Cynthia A [Albuquerque, NM; Bender, Michael A [East Northport, NY; Bunde, David P [Urbana, IL

    2009-07-21

    In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.

  19. Communications systems and methods for subsea processors

    DOEpatents

    Gutierrez, Jose; Pereira, Luis

    2016-04-26

    A subsea processor may be located near the seabed of a drilling site and used to coordinate operations of underwater drilling components. The subsea processor may be enclosed in a single interchangeable unit that fits a receptor on an underwater drilling component, such as a blow-out preventer (BOP). The subsea processor may issue commands to control the BOP and receive measurements from sensors located throughout the BOP. A shared communications bus may interconnect the subsea processor and underwater components and the subsea processor and a surface or onshore network. The shared communications bus may be operated according to a time division multiple access (TDMA) scheme.

  20. An Efficient Functional Test Generation Method For Processors Using Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Hudec, Ján; Gramatová, Elena

    2015-07-01

    The paper presents a new functional test generation method for processors testing based on genetic algorithms and evolutionary strategies. The tests are generated over an instruction set architecture and a processor description. Such functional tests belong to the software-oriented testing. Quality of the tests is evaluated by code coverage of the processor description using simulation. The presented test generation method uses VHDL models of processors and the professional simulator ModelSim. The rules, parameters and fitness functions were defined for various genetic algorithms used in automatic test generation. Functionality and effectiveness were evaluated using the RISC type processor DP32.

Top