MIL-STD-1553B Marconi LSI chip set in a remote terminal application
NASA Astrophysics Data System (ADS)
Dimarino, A.
1982-11-01
Marconi Avionics is utilizing the MIL-STD-1553B LSI Chip Set in the SCADC Air Data Computer application to perform all of the required remote terminal MIL-STD-1553B protocol functions. Basic components of the RTU are the dual redundant chip set, CT3231 Transceivers, 256 x 16 RAM and a Z8002 microprocessor. Basic transfers are to/from the RAM command of the bus controller or Z8002 processor. During transfers from the processor to the RAM, the chip set busy bit is set for a period not exceeding 250 microseconds. When the transfer is complete, the busy bit is released and transfers to the data bus occur on command. The LSI Chip Set word count lines are used to locate each data word in the local memory and 4 mode codes are used in the application: reset remote terminal, transmit status word, transmitter shut-down, and override transmitter shutdown.
A floating-point/multiple-precision processor for airborne applications
NASA Technical Reports Server (NTRS)
Yee, R.
1982-01-01
A compact input output (I/O) numerical processor capable of performing floating-point, multiple precision and other arithmetic functions at execution times which are at least 100 times faster than comparable software emulation is described. The I/O device is a microcomputer system containing a 16 bit microprocessor, a numerical coprocessor with eight 80 bit registers running at a 5 MHz clock rate, 18K random access memory (RAM) and 16K electrically programmable read only memory (EPROM). The processor acts as an intelligent slave to the host computer and can be programmed in high order languages such as FORTRAN and PL/M-86.
A Wearable Healthcare System With a 13.7 μA Noise Tolerant ECG Processor.
Izumi, Shintaro; Yamashita, Ken; Nakano, Masanao; Kawaguchi, Hiroshi; Kimura, Hiromitsu; Marumoto, Kyoji; Fuchikami, Takaaki; Fujimori, Yoshikazu; Nakajima, Hiroshi; Shiga, Toshikazu; Yoshimoto, Masahiko
2015-10-01
To prevent lifestyle diseases, wearable bio-signal monitoring systems for daily life monitoring have attracted attention. Wearable systems have strict size and weight constraints, which impose significant limitations of the battery capacity and the signal-to-noise ratio of bio-signals. This report describes an electrocardiograph (ECG) processor for use with a wearable healthcare system. It comprises an analog front end, a 12-bit ADC, a robust Instantaneous Heart Rate (IHR) monitor, a 32-bit Cortex-M0 core, and 64 Kbyte Ferroelectric Random Access Memory (FeRAM). The IHR monitor uses a short-term autocorrelation (STAC) algorithm to improve the heart-rate detection accuracy despite its use in noisy conditions. The ECG processor chip consumes 13.7 μA for heart rate logging application.
NASA Astrophysics Data System (ADS)
Weigand, R.
Two new processor devices have been developed for the use on board of spacecrafts. An 8-bit 8032-microcontroller targets typical controlling applications in instruments and sub-systems, or could be used as a main processor on small satellites, whereas the LEON 32-bit SPARC processor can be used for high performance controlling and data processing tasks. The ADV80S32 is fully compliant to the Intel 80x1 architecture and instruction set, extended by additional peripherals, 512 bytes on-chip RAM and a bootstrap PROM, which allows downloading the application software using the CCSDS PacketWire pro- tocol. The memory controller provides a de-multiplexed address/data bus, and allows to access up to 16 MB data and 8 MB program RAM. The peripherals have been de- signed for the specific needs of a spacecraft, such as serial interfaces compatible to RS232, PacketWire and TTC-B-01, counters/timers for extended duration and a CRC calculation unit accelerating the CCSDS TM/TC protocol. The 0.5 um Atmel manu- facturing technology (MG2RT) provides latch-up and total dose immunity; SEU fault immunity is implemented by using SEU hardened Flip-Flops and EDAC protection of internal and external memories. The maximum clock frequency of 20 MHz allows a processing power of 3 MIPS. Engineering samples are available. For SW develop- ment, various SW packages for the 8051 architecture are on the market. The LEON processor implements a 32-bit SPARC V8 architecture, including all the multiply and divide instructions, complemented by a floating-point unit (FPU). It includes several standard peripherals, such as timers/watchdog, interrupt controller, UARTs, parallel I/Os and a memory controller, allowing to use 8, 16 and 32 bit PROM, SRAM or memory mapped I/O. With on-chip separate instruction and data caches, almost one instruction per clock cycle can be reached in some applications. A 33-MHz 32-bit PCI master/target interface and a PCI arbiter allow operating the device in a plug-in card (for SW development on PC etc.), or to consider using it as a PCI master controller in an on-board system. Advanced SEU fault tolerance is in- troduced by design, using triple modular redundancy (TMR) flip-flops for all registers and EDAC protection for all memories. The device will be manufactured in a radia- tion hard Atmel 0.25 um technology, targeting 100 MHz processor clock frequency. The non fault-tolerant LEON processor VHDL model is available as free source code, and the SPARC architecture is a well-known industry standard. Therefore, know-how, software tools and operating systems are widely available.
NASA Astrophysics Data System (ADS)
Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos
1990-03-01
The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
ACS Science Data Buffer Check/Self-Tests for CS Buffer RAM and MIE RAM
NASA Astrophysics Data System (ADS)
Balzano, V.
2001-07-01
The ACS Science Buffer RAM is checked for bit flips during SAA passages. This is followed by a Control Section {CS} self-test consisting of writing/reading a specified bit pattern from each memory location in Buffer RAM and a similar test for MIE RAM. The MIE must be placed in BOOT mode for its self-test. The CS Buffer RAM self-test as well as the bit flip tests are all done with the CS in Operate.
COS Side 2 Science Data Buffer Check/Self-Tests for CS Buffer RAM and DIB RAM
NASA Astrophysics Data System (ADS)
Bacinski, John
2013-10-01
The COS Science Buffer RAM is checked for bit flips during SAA passages. This is followed by a Control Section {CS} self-test consisting of writing/reading a specified bit pattern from each memory location in Buffer RAM and a similar test for DIB RAM. The DIB must be placed in BOOT mode for its self-test. The CS Buffer RAM self-test as well as the bit flip tests are all done with the CS in Operate.
ACS Science Data Buffer Check/Self-Tests for CS Buffer RAM and MIE RAM
NASA Astrophysics Data System (ADS)
Welty, Alan
2005-07-01
The ACS Science Buffer RAM is checked for bit flips during SAA passages. Thisis followed by a Control Section {CS} self-test consisting of writing/reading a specified bit pattern from each memory location in Buffer RAM and a similar test for MIE RAM. The MIE must be placed in BOOT mode for its self-test. The CS Buffer RAM self-test as well as the bit flip tests are all done with the CS in Operate.
SEM Analysis Techniques for LSI Microcircuits. Volume 2
1980-08-01
4~, 1 v’ ’ RADC-TR80-250, Vol 11 (of two), Final Technical -Report, Augut1980 SEM, ANALYSIS TECHNIQUES, FOR LSI MICROCIRCUITS: ’Martin...Bit Static ’RAM.. Volume II - 1024 Bit Stat’i RAM, 4096 Bit Dynamic RAM (SiGATE WOS,)., 4096 Bit -Dynamic RAM ( 1 2 L Bipolar)., ,Summary. RADC-TR-80-250...States, ithout.irst obtani an export nse, is a violation t Internatio 1 Tr ffic in A . eguiations. Such violation is subject o penalty of to 2 years impr
COS Science Data Buffer Check/Self-Tests for CS Buffer RAM and DIB RAM
NASA Astrophysics Data System (ADS)
Welty, Alan
2009-07-01
The COS Science Buffer RAM is checked for bit flips during SAA passages. This is followed by a Control Section {CS} self-test consisting of writing/reading a specified bit pattern from each memory location in Buffer RAM and a similar test for DIB RAM. The DIB must be placed in BOOT mode for its self-test. The CS Buffer RAM self-test as well as the bit flip tests are all done with the CS in Operate.Supports Activity COS-03
1990-06-01
RAM and ROM output enable signals. Figure C.7 shows the logic for the interrupt priority level (IPLO* through IPL2 *) and the interrupt acknowledge...IACK681* signal is sent to the DUART when a level one interrupt acknowledge is output by the CPU. The logic for the IACK681* and the IPLO* through IPL2 ...signals are actually implemented with an EPLD. Listing D.4 in Appendix D presents the Abel description of the IACK681* and IPLO* through IPL2
Research in the design of high-performance reconfigurable systems
NASA Technical Reports Server (NTRS)
Slotnick, D. L.; Mcewan, S. D.; Spry, A. J.
1984-01-01
An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
NASA Astrophysics Data System (ADS)
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
Flexible Peripheral Component Interconnect Input/Output Card
NASA Technical Reports Server (NTRS)
Bigelow, Kirk K.; Jerry, Albert L.; Baricio, Alisha G.; Cummings, Jon K.
2010-01-01
The Flexible Peripheral Component Interconnect (PCI) Input/Output (I/O) Card is an innovative circuit board that provides functionality to interface between a variety of devices. It supports user-defined interrupts for interface synchronization, tracks system faults and failures, and includes checksum and parity evaluation of interface data. The card supports up to 16 channels of high-speed, half-duplex, low-voltage digital signaling (LVDS) serial data, and can interface combinations of serial and parallel devices. Placement of a processor within the field programmable gate array (FPGA) controls an embedded application with links to host memory over its PCI bus. The FPGA also provides protocol stacking and quick digital signal processor (DSP) functions to improve host performance. Hardware timers, counters, state machines, and other glue logic support interface communications. The Flexible PCI I/O Card provides an interface for a variety of dissimilar computer systems, featuring direct memory access functionality. The card has the following attributes: 8/16/32-bit, 33-MHz PCI r2.2 compliance, Configurable for universal 3.3V/5V interface slots, PCI interface based on PLX Technology's PCI9056 ASIC, General-use 512K 16 SDRAM memory, General-use 1M 16 Flash memory, FPGA with 3K to 56K logical cells with embedded 27K to 198K bits RAM, I/O interface: 32-channel LVDS differential transceivers configured in eight, 4-bit banks; signaling rates to 200 MHz per channel, Common SCSI-3, 68-pin interface connector.
Method and apparatus for high speed data acquisition and processing
Ferron, J.R.
1997-02-11
A method and apparatus are disclosed for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register. 15 figs.
Method and apparatus for high speed data acquisition and processing
Ferron, John R.
1997-01-01
A method and apparatus for high speed digital data acquisition. The apparatus includes one or more multiplexers for receiving multiple channels of digital data at a low data rate and asserting a multiplexed data stream at a high data rate, and one or more FIFO memories for receiving data from the multiplexers and asserting the data to a real time processor. Preferably, the invention includes two multiplexers, two FIFO memories, and a 64-bit bus connecting the FIFO memories with the processor. Each multiplexer receives four channels of 14-bit digital data at a rate of up to 5 MHz per channel, and outputs a data stream to one of the FIFO memories at a rate of 20 MHz. The FIFO memories assert output data in parallel to the 64-bit bus, thus transferring 14-bit data values to the processor at a combined rate of 40 MHz. The real time processor is preferably a floating-point processor which processes 32-bit floating-point words. A set of mask bits is prestored in each 32-bit storage location of the processor memory into which a 14-bit data value is to be written. After data transfer from the FIFO memories, mask bits are concatenated with each stored 14-bit data value to define a valid 32-bit floating-point word. Preferably, a user can select any of several modes for starting and stopping direct memory transfers of data from the FIFO memories to memory within the real time processor, by setting the content of a control and status register.
A 16K-bit static IIL RAM with 25-ns access time
NASA Astrophysics Data System (ADS)
Inabe, Y.; Hayashi, T.; Kawarada, K.; Miwa, H.; Ogiue, K.
1982-04-01
A 16,384 x 1-bit RAM with 25-ns access time, 600-mW power dissipation, and 33 sq mm chip size has been developed. Excellent speed-power performance with high packing density has been achieved by an oxide isolation technology in conjunction with novel ECL circuit techniques and IIL flip-flop memory cells, 980 sq microns (35 x 28 microns) in cell size. Development results have shown that IIL flip-flop memory cell is a trump card for assuring achievement of a high-performance large-capacity bipolar RAM, in the above 16K-bit/chip area.
FPGA-Based, Self-Checking, Fault-Tolerant Computers
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2004-01-01
A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
Learning the Art of Electronics
NASA Astrophysics Data System (ADS)
Hayes, Thomas C.; Horowitz, Paul
2016-03-01
1. DC circuits; 2. RC circuits; 3. Diode circuits; 4. Transistors I; 5. Transistors II; 6. Operational amplifiers I; 7. Operational amplifiers II: nice positive feedback; 8. Operational amplifiers III; 9. Operational amplifiers IV: nasty positive feedback; 10. Operational amplifiers V: PID motor control loop; 11. Voltage regulators; 12. MOSFET switches; 13. Group audio project; 14. Logic gates; 15. Logic compilers, sequential circuits, flip-flops; 16. Counters; 17. Memory: state machines; 18. Analog to digital: phase-locked loop; 19. Microcontrollers and microprocessors I: processor/controller; 20. I/O, first assembly language; 21. Bit operations; 22. Interrupt: ADC and DAC; 23. Moving pointers, serial buses; 24. Dallas Standalone Micro, SiLabs SPI RAM; 25. Toys in the attic; Appendices; Index.
Multi-wavelength access gate for WDM-formatted words in optical RAM row architectures
NASA Astrophysics Data System (ADS)
Fitsios, D.; Alexoudi, T.; Vagionas, C.; Miliou, A.; Kanellos, G. T.; Pleros, N.
2013-03-01
Optical RAM has emerged as a promising solution for overcoming the "Memory Wall" of electronics, indicating the use of light in RAM architectures as the approach towards enabling ps-regime memory access times. Taking a step further towards exploiting the unique wavelength properties of optical signals, we reveal new architectural perspectives in optical RAM structures by introducing WDM principles in the storage area. To this end, we demonstrate a novel SOAbased multi-wavelength Access Gate for utilization in a 4x4 WDM optical RAM bank architecture. The proposed multiwavelength Access Gate can simultaneously control random access to a 4-bit optical word, exploiting Cross-Gain-Modulation (XGM) to process 8 Bit and Bit channels encoded in 8 different wavelengths. It also suggests simpler optical RAM row architectures, allowing for the effective sharing of one multi-wavelength Access Gate for each row, substituting the eight AGs in the case of conventional optical RAM architectures. The scheme is shown to support 10Gbit/s operation for the incoming 4-bit data streams, with a power consumption of 15mW/Gbit/s. All 8 wavelength channels demonstrate error-free operation with a power penalty lower than 3 dB for all channels, compared to Back-to-Back measurements. The proposed optical RAM architecture reveals that exploiting the WDM capabilities of optical components can lead to RAM bank implementations with smarter column/row encoders/decoders, increased circuit simplicity, reduced number of active elements and associated power consumption. Moreover, exploitation of the wavelength entity can release significant potential towards reconfigurable optical cache mapping schemes when using the wavelength dimension for memory addressing.
Pierce, Paul E.
1986-01-01
A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Pierce, P.E.
A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Accelerating scientific computations with mixed precision algorithms
NASA Astrophysics Data System (ADS)
Baboulin, Marc; Buttari, Alfredo; Dongarra, Jack; Kurzak, Jakub; Langou, Julie; Langou, Julien; Luszczek, Piotr; Tomov, Stanimire
2009-12-01
On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. The approach presented here can apply not only to conventional processors but also to other technologies such as Field Programmable Gate Arrays (FPGA), Graphical Processing Units (GPU), and the STI Cell BE processor. Results on modern processor architectures and the STI Cell BE are presented. Program summaryProgram title: ITER-REF Catalogue identifier: AECO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AECO_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 7211 No. of bytes in distributed program, including test data, etc.: 41 862 Distribution format: tar.gz Programming language: FORTRAN 77 Computer: desktop, server Operating system: Unix/Linux RAM: 512 Mbytes Classification: 4.8 External routines: BLAS (optional) Nature of problem: On modern architectures, the performance of 32-bit operations is often at least twice as fast as the performance of 64-bit operations. By using a combination of 32-bit and 64-bit floating point arithmetic, the performance of many dense and sparse linear algebra algorithms can be significantly enhanced while maintaining the 64-bit accuracy of the resulting solution. Solution method: Mixed precision algorithms stem from the observation that, in many cases, a single precision solution of a problem can be refined to the point where double precision accuracy is achieved. A common approach to the solution of linear systems, either dense or sparse, is to perform the LU factorization of the coefficient matrix using Gaussian elimination. First, the coefficient matrix A is factored into the product of a lower triangular matrix L and an upper triangular matrix U. Partial row pivoting is in general used to improve numerical stability resulting in a factorization PA=LU, where P is a permutation matrix. The solution for the system is achieved by first solving Ly=Pb (forward substitution) and then solving Ux=y (backward substitution). Due to round-off errors, the computed solution, x, carries a numerical error magnified by the condition number of the coefficient matrix A. In order to improve the computed solution, an iterative process can be applied, which produces a correction to the computed solution at each iteration, which then yields the method that is commonly known as the iterative refinement algorithm. Provided that the system is not too ill-conditioned, the algorithm produces a solution correct to the working precision. Running time: seconds/minutes
Method and apparatus for pulse width modulation control of an AC induction motor
Geppert, Steven; Slicker, James M.
1984-01-01
An inverter is connected between a source of DC power and a three-phase AC induction motor, and a micro-processor-based circuit controls the inverter using pulse width modulation techniques. In the disclosed method of pulse width modulation, both edges of each pulse of a carrier pulse train are equally modulated by a time proportional to sin .THETA., where .THETA. is the angular displacement of the pulse center at the motor stator frequency from a fixed reference point on the carrier waveform. The carrier waveform frequency is a multiple of the motor stator frequency. The modulated pulse train is then applied to each of the motor phase inputs with respective phase shifts of 120.degree. at the stator frequency. Switching control commands of electronic switches in the inverter are stored in a random access memory (RAM) and the locations of the RAM are successively read out in a cyclic manner, each bit of a given RAM location controlling a respective phase input of the motor. The DC power source preferably comprises rechargeable batteries and all but one of the electronic switches in the inverter can be disabled, the remaining electronic switch being part of a "flyback" DC-DC converter circuit for recharging the battery.
Method and apparatus for pulse width modulation control of an AC induction motor
NASA Technical Reports Server (NTRS)
Geppert, Steven (Inventor); Slicker, James M. (Inventor)
1984-01-01
An inverter is connected between a source of DC power and a three-phase AC induction motor, and a micro-processor-based circuit controls the inverter using pulse width modulation techniques. In the disclosed method of pulse width modulation, both edges of each pulse of a carrier pulse train are equally modulated by a time proportional to sin .THETA., where .THETA. is the angular displacement of the pulse center at the motor stator frequency from a fixed reference point on the carrier waveform. The carrier waveform frequency is a multiple of the motor stator frequency. The modulated pulse train is then applied to each of the motor phase inputs with respective phase shifts of 120.degree. at the stator frequency. Switching control commands of electronic switches in the inverter are stored in a random access memory (RAM) and the locations of the RAM are successively read out in a cyclic manner, each bit of a given RAM location controlling a respective phase input of the motor. The DC power source preferably comprises rechargeable batteries and all but one of the electronic switches in the inverter can be disabled, the remaining electronic switch being part of a flyback DC-DC converter circuit for recharging the battery.
Reliability evaluation of CMOS RAMs
NASA Astrophysics Data System (ADS)
Salvo, C. J.; Sasaki, A. T.
The results of an evaluation of the reliability of a 1K x 1 bit CMOS RAM and a 4K x 1 bit CMOS RAM for the USAF are reported. The tests consisted of temperature cycling, thermal shock, electrical overstress-static discharge and accelerated life test cells. The study indicates that the devices have high reliability potential for military applications. Use-temperature failure rates at 100 C were 0.54 x 10 to the -5th failures/hour for the 1K RAM and 0.21 x 10 to the -5th failures/hour for the 4K RAM. Only minimal electrostatic discharge damage was noted in the devices when they were subjected to multiple pulses at 1000 Vdc, and redesign of the 7 Vdc quiescent parameter of the 4K RAM is expected to raise its field threshold voltage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less
Digital high speed programmable convolver
NASA Astrophysics Data System (ADS)
Rearick, T. C.
1984-12-01
A circuit module for rapidly calculating a discrete numerical convolution is described. A convolution such as finding the sum of the products of a 16 bit constant and a 16 bit variable is performed by a module which is programmable so that the constant may be changed for a new problem. In addition, the module may be programmed to find the sum of the products of 4 and 8 bit constants and variables. RAM (Random Access Memories) are loaded with partial products of the selected constant and all possible variables. Then, when the actual variable is loaded, it acts as an address to find the correct partial product in the particular RAM. The partial products from all of the RAMs are shifted to the appropriate numerical power position (if necessary) and then added in adder elements.
Single-Event Effect Response of a Commercial ReRAM
NASA Technical Reports Server (NTRS)
Chen, Dakai; Label, Kenneth A.; Kim, Hak; Phan, Anthony; Wilcox, Edward; Buchner, Stephen; Khachatrian, Ani; Roche, Nicolas
2014-01-01
We show heavy ion test results of a commercial production-level ReRAM. The memory array is robust to bit upsets. However the ReRAM system is vulnerable to SEFIs due to upsets in peripheral circuits, including the sense amplifier.
Digital Hardware Architecture Implementation
1993-02-15
of micro - MOTOROLA 63.7 50MHZ 64 BIT 2092 N/A processors during quarterly re- INTEL 42 50MHz 64 BIT 1092 N/A views and monthly reports. The 186o XP...27 3.2.1 Signal Processor (SP) Analysis...31 3.2.1.11 MasPar Software Statements ........................................................ 32 3.2.2 Data Processor
1991-07-31
memory banks Up to 1.25MByte SRAM 5 planes of 2048 x 1024 pixels Programmable video parameters max 720 x 512 pixels Sixteen colors TTL RGBI standard...bit I/O extension bus (VLXbus) Up to 2048 KByte 0-wait state static RAM BTT (Built-In-Test) PAL selectable dual ported VMEbus address Two RS-232/422...16, 25, or 33 MHz) A16/24:D08/16 VMEbus interface 8/16-bit I/O Extension bus (VLXbus) Up to 2048 KByte 32-bit wide static RAM -- 0-wait state at 16
On-orbit observations of single event upset in Harris HM-6508 1K RAMs, reissue A
NASA Astrophysics Data System (ADS)
Blake, J. B.; Mandel, R.
1987-02-01
The Harris HM-6508 1K x 1 RAMs are part of a subsystem of a satellite in a low, polar orbit. The memory module, used in the subsystem containing the RAMs, consists of three printed circuit cards, with each card containing eight 2K byte memory hybrids, for a total of 48K bytes. Each memory hybrid contains 16 HM-6508 RAM chips. On a regular basis all but 256 bytes of the 48K bytes are examined for bit errors. Two different techniques were used for detecting bit errors. The first technique, a memory check sum, was capable of automatically detecting all single bit and some double bit errors which occurred within a page of memory. A memory page consists of 256 bytes. Memory check sum tests are performed approximately every 90 minutes. To detect a multiple error or to determine the exact location of the bit error within the page the entire contents of the memory is dumped and compared to the load file. Memory dumps are normally performed once a month, or immediately after the check sum routine detects an error. Once the exact location of the error is found, the correct value is reloaded into memory. After the memory is reloaded, the contents of the memory location in question is verified in order to determine if the error was a soft error generated by an SEU or a hard error generated by a part failure or cosmic-ray induced latchup.
Error analysis and prevention of cosmic ion-induced soft errors in static CMOS RAMs
NASA Astrophysics Data System (ADS)
Diehl, S. E.; Ochoa, A., Jr.; Dressendorfer, P. V.; Koga, P.; Kolasinski, W. A.
1982-12-01
Cosmic ray interactions with memory cells are known to cause temporary, random, bit errors in some designs. The sensitivity of polysilicon gate CMOS static RAM designs to logic upset by impinging ions has been studied using computer simulations and experimental heavy ion bombardment. Results of the simulations are confirmed by experimental upset cross-section data. Analytical models have been extended to determine and evaluate design modifications which reduce memory cell sensitivity to cosmic ions. A simple design modification, the addition of decoupling resistance in the feedback path, is shown to produce static RAMs immune to cosmic ray-induced bit errors.
32-Bit-Wide Memory Tolerates Failures
NASA Technical Reports Server (NTRS)
Buskirk, Glenn A.
1990-01-01
Electronic memory system of 32-bit words corrects bit errors caused by some common type of failures - even failure of entire 4-bit-wide random-access-memory (RAM) chip. Detects failure of two such chips, so user warned that ouput of memory may contain errors. Includes eight 4-bit-wide DRAM's configured so each bit of each DRAM assigned to different one of four parallel 8-bit words. Each DRAM contributes only 1 bit to each 8-bit word.
NASA Astrophysics Data System (ADS)
Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.
2014-03-01
The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
A high-accuracy optical linear algebra processor for finite element applications
NASA Technical Reports Server (NTRS)
Casasent, D.; Taylor, B. K.
1984-01-01
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
A high-speed digital signal processor for atmospheric radar, part 7.3A
NASA Technical Reports Server (NTRS)
Brosnahan, J. W.; Woodard, D. M.
1984-01-01
The Model SP-320 device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (First In First Out) memories, both with depths of 4096 W, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MW/s.
Design, processing, and testing of lsi arrays for space station
NASA Technical Reports Server (NTRS)
Lile, W. R.; Hollingsworth, R. J.
1972-01-01
The design of a MOS 256-bit Random Access Memory (RAM) is discussed. Technological achievements comprise computer simulations that accurately predict performance; aluminum-gate COS/MOS devices including a 256-bit RAM with current sensing; and a silicon-gate process that is being used in the construction of a 256-bit RAM with voltage sensing. The Si-gate process increases speed by reducing the overlap capacitance between gate and source-drain, thus reducing the crossover capacitance and allowing shorter interconnections. The design of a Si-gate RAM, which is pin-for-pin compatible with an RCA bulk silicon COS/MOS memory (type TA 5974), is discussed in full. The Integrated Circuit Tester (ICT) is limited to dc evaluation, but the diagnostics and data collecting are under computer control. The Silicon-on-Sapphire Memory Evaluator (SOS-ME, previously called SOS Memory Exerciser) measures power supply drain and performs a minimum number of tests to establish operation of the memory devices. The Macrodata MD-100 is a microprogrammable tester which has capabilities of extensive testing at speeds up to 5 MHz. Beam-lead technology was successfully integrated with SOS technology to make a simple device with beam leads. This device and the scribing are discussed.
DFT algorithms for bit-serial GaAs array processor architectures
NASA Technical Reports Server (NTRS)
Mcmillan, Gary B.
1988-01-01
Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Stroboscope Controller for Imaging Helicopter Rotors
NASA Technical Reports Server (NTRS)
Jensen, Scott; Marmie, John; Mai, Nghia
2004-01-01
A versatile electronic timing-and-control unit, denoted a rotorcraft strobe controller, has been developed for use in controlling stroboscopes, lasers, video cameras, and other instruments for capturing still images of rotating machine parts especially helicopter rotors. This unit is designed to be compatible with a variety of sources of input shaftangle or timing signals and to be capable of generating a variety of output signals suitable for triggering instruments characterized by different input-signal specifications. It is also designed to be flexible and reconfigurable in that it can be modified and updated through changes in its control software, without need to change its hardware. Figure 1 is a block diagram of the rotorcraft strobe controller. The control processor is a high-density complementary metal oxide semiconductor, singlechip 8-bit microcontroller. It is connected to a 32K x 8 nonvolatile static random-access memory (RAM) module. Also connected to the control processor is a 32K 8 electrically programmable read-only-memory (EPROM) module, which is used to store the control software. Digital logic support circuitry is implemented in a field-programmable gate array (FPGA). A 240 x 128-dot, 40- character 16-line liquid-crystal display (LCD) module serves as a graphical user interface; the user provides input through a 16-key keypad mounted next to the LCD. A 12-bit digital-to-analog converter (DAC) generates a 0-to-10-V ramp output signal used as part of a rotor-blade monitoring system, while the control processor generates all the appropriate strobing signals. Optocouplers are used to isolate all input and output digital signals, and optoisolators are used to isolate all analog signals. The unit is designed to fit inside a 19-in. (.48-cm) rack-mount enclosure. Electronic components are mounted on a custom printed-circuit board (see Figure 2). Two power-conversion modules on the printedcircuit board convert AC power to +5 VDC and 15 VDC, respectively.
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Testing and operating a multiprocessor chip with processor redundancy
Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J
2014-10-21
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
Fault-tolerant corrector/detector chip for high-speed data processing
Andaleon, David D.; Napolitano, Jr., Leonard M.; Redinbo, G. Robert; Shreeve, William O.
1994-01-01
An internally fault-tolerant data error detection and correction integrated circuit device (10) and a method of operating same. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum is provided with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented.
Fault-tolerant corrector/detector chip for high-speed data processing
Andaleon, D.D.; Napolitano, L.M. Jr.; Redinbo, G.R.; Shreeve, W.O.
1994-03-01
An internally fault-tolerant data error detection and correction integrated circuit device and a method of operating same is described. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented. 8 figures.
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
Architectural Techniques For Managing Non-volatile Caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM poses serious obstacles in its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making themmore » a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.« less
Reconfigurable data path processor
NASA Technical Reports Server (NTRS)
Donohoe, Gregory (Inventor)
2005-01-01
A reconfigurable data path processor comprises a plurality of independent processing elements. Each of the processing elements advantageously comprising an identical architecture. Each processing element comprises a plurality of data processing means for generating a potential output. Each processor is also capable of through-putting an input as a potential output with little or no processing. Each processing element comprises a conditional multiplexer having a first conditional multiplexer input, a second conditional multiplexer input and a conditional multiplexer output. A first potential output value is transmitted to the first conditional multiplexer input, and a second potential output value is transmitted to the second conditional multiplexer output. The conditional multiplexer couples either the first conditional multiplexer input or the second conditional multiplexer input to the conditional multiplexer output, according to an output control command. The output control command is generated by processing a set of arithmetic status-bits through a logical mask. The conditional multiplexer output is coupled to a first processing element output. A first set of arithmetic bits are generated according to the processing of the first processable value. A second set of arithmetic bits may be generated from a second processing operation. The selection of the arithmetic status-bits is performed by an arithmetic-status bit multiplexer selects the desired set of arithmetic status bits from among the first and second set of arithmetic status bits. The conditional multiplexer evaluates the select arithmetic status bits according to logical mask defining an algorithm for evaluating the arithmetic status bits.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
A fully integrated mixed-signal neural processor for implantable multichannel cortical recording.
Sodagar, Amir M; Wise, Kensall D; Najafi, Khalil
2007-06-01
A 64-channel neural processor has been developed for use in an implantable neural recording microsystem. In the Scan Mode, the processor is capable of detecting neural spikes by programmable positive, negative, or window thresholding. Spikes are tagged with their associated channel addresses and formed into 18-bit data words that are sent serially to the external host. In the Monitor Mode, two channels can be selected and viewed at high resolution for studies where the entire signal is of interest. The processor runs from a 3-V supply and a 2-MHz clock, with a channel scan rate of 64 kS/s and an output bit rate of 2 Mbps.
NASA Astrophysics Data System (ADS)
Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al
2018-03-01
One of the most significant constraints of Von Neumann architecture is the limited bandwidth between memory and processor. The cost to move data back and forth between memory and processor is considerably higher than the computation in the processor itself. This architecture significantly impacts the Big Data and data-intensive application such as DNA analysis comparison which spend most of the processing time to move data. Recently, the in-memory processing concept was proposed, which is based on the capability to perform the logic operation on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map digital equality comparator circuit on memristive memory crossbar array. The 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit of equality comparator circuit are mapped on memristive memory crossbar array by using material implication logic in a sequential and parallel method. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping both by using the overlapping technique.
High Resolution, High Precision I-Line Stepper Processing
NASA Astrophysics Data System (ADS)
Yanazawa, H.; Hasegawa, N.; Kurosaki, T.; Hashimoto, N.; Nonogaki, S.
1985-06-01
Currently, the integrated MOS dynamic RAM has as many as 256 thousand memory cells per chip based on 2 pm photolithography. Figure 1 shows the history and the prospects for progress in microfabrication technology. Feature size versus year, as reported by Bossung in 1978, is shown, as developed from independent analysis by Moore, Noyce and Gnostic concept. Circles numbered 1 and 2 show that 64K- and 256K-bit RAMs were developed in 1981 and 1984, and that their feature sizes were 3μm and 2μm, respectively. It is significant that the predictions and the real developments are so close. Furthermore, since the basic process for 3 M-bit RAMs based on 1.3μm microlithography has already been reported in conference, it is highly likely that they will become commercially available around 1987, as predicted by the circle numbered 3 based on 1.3μm microlithography.
Digital signal processor and processing method for GPS receivers
NASA Technical Reports Server (NTRS)
Thomas, Jr., Jess B. (Inventor)
1989-01-01
A digital signal processor and processing method therefor for use in receivers of the NAVSTAR/GLOBAL POSITIONING SYSTEM (GPS) employs a digital carrier down-converter, digital code correlator and digital tracking processor. The digital carrier down-converter and code correlator consists of an all-digital, minimum bit implementation that utilizes digital chip and phase advancers, providing exceptional control and accuracy in feedback phase and in feedback delay. Roundoff and commensurability errors can be reduced to extremely small values (e.g., less than 100 nanochips and 100 nanocycles roundoff errors and 0.1 millichip and 1 millicycle commensurability errors). The digital tracking processor bases the fast feedback for phase and for group delay in the C/A, P.sub.1, and P.sub.2 channels on the L.sub.1 C/A carrier phase thereby maintaining lock at lower signal-to-noise ratios, reducing errors in feedback delays, reducing the frequency of cycle slips and in some cases obviating the need for quadrature processing in the P channels. Simple and reliable methods are employed for data bit synchronization, data bit removal and cycle counting. Improved precision in averaged output delay values is provided by carrier-aided data-compression techniques. The signal processor employs purely digital operations in the sense that exactly the same carrier phase and group delay measurements are obtained, to the last decimal place, every time the same sampled data (i.e., exactly the same bits) are processed.
Plant, Richard R; Turner, Garry
2009-08-01
Since the publication of Plant, Hammond, and Turner (2004), which highlighted a pressing need for researchers to pay more attention to sources of error in computer-based experiments, the landscape has undoubtedly changed, but not necessarily for the better. Readily available hardware has improved in terms of raw speed; multi core processors abound; graphics cards now have hundreds of megabytes of RAM; main memory is measured in gigabytes; drive space is measured in terabytes; ever larger thin film transistor displays capable of single-digit response times, together with newer Digital Light Processing multimedia projectors, enable much greater graphic complexity; and new 64-bit operating systems, such as Microsoft Vista, are now commonplace. However, have millisecond-accurate presentation and response timing improved, and will they ever be available in commodity computers and peripherals? In the present article, we used a Black Box ToolKit to measure the variability in timing characteristics of hardware used commonly in psychological research.
A Concurrent Smalltalk Compiler for the Message-Driven Processor
1988-05-01
apj with bits from low-bit (inclusive) to high-bit (exclusive) set. ;;;Low-bit defaults to zero. (defmacro brange (high-bit &optional low-bit) (list...n2) (null (cddr num))) (aetg bits (b+ bits (if (>- nl n2) ( brange (1+ nl) n2) ( brange (1+ n2) ni)))) (error "Bad bmap range: -S" flu.)))) (t (error...vlocs) flat ((vlive (b- finst-vllv* mast) *I.( brange firat-context-slot-nun))) (next (inst-next last))) (if (bempty vlive) (delete-module module inat
a Real-Time Computer Music Synthesis System
NASA Astrophysics Data System (ADS)
Lent, Keith Henry
A real time sound synthesis system has been developed at the Computer Music Center of The University of Texas at Austin. This system consists of several stand alone processors that were constructed jointly with White Instruments in Austin. These processors can be programmed as general purpose computers, but are provided with a number of specialized interfaces including: MIDI, 8 bit parallel, high speed serial, 2 channels analog input (18 bit A/Ds, 48kHz sample rate), and 4 channels analog output (18 bit D/As). In addition, a basic music synthesis language (Music56000) has been written in assembly code. On top of this, a symbolic compiler (PatchWork) has been developed to enable algorithms which run in these processors to be created graphically. And finally, a number of efficient time domain numerical models have been developed to enable the construction, simulation, control, and synthesis of many musical acoustics systems in real time on these processors. Specifically, assembly language models for cylindrical and conical horn sections, dissipative losses, tone holes, bells, and a number of linear and nonlinear boundary conditions have been developed.
Technology transfer of military space microprocessor developments
NASA Astrophysics Data System (ADS)
Gorden, C.; King, D.; Byington, L.; Lanza, D.
1999-01-01
Over the past 13 years the Air Force Research Laboratory (AFRL) has led the development of microprocessors and computers for USAF space and strategic missile applications. As a result of these Air Force development programs, advanced computer technology is available for use by civil and commercial space customers as well. The Generic VHSIC Spaceborne Computer (GVSC) program began in 1985 at AFRL to fulfill a deficiency in the availability of space-qualified data and control processors. GVSC developed a radiation hardened multi-chip version of the 16-bit, Mil-Std 1750A microprocessor. The follow-on to GVSC, the Advanced Spaceborne Computer Module (ASCM) program, was initiated by AFRL to establish two industrial sources for complete, radiation-hardened 16-bit and 32-bit computers and microelectronic components. Development of the Control Processor Module (CPM), the first of two ASCM contract phases, concluded in 1994 with the availability of two sources for space-qualified, 16-bit Mil-Std-1750A computers, cards, multi-chip modules, and integrated circuits. The second phase of the program, the Advanced Technology Insertion Module (ATIM), was completed in December 1997. ATIM developed two single board computers based on 32-bit reduced instruction set computer (RISC) processors. GVSC, CPM, and ATIM technologies are flying or baselined into the majority of today's DoD, NASA, and commercial satellite systems.
INM. Integrated Noise Model Version 4.11. User’s Guide - Supplement
1993-12-01
KB of Random Access Memory (RAM) or 3 MB of RAM, if operating the INM from a RAM disk, as discussed in Section 1.2.1 below; 0 Math co-processor, Series... accessible from the Data Base using the ACDB11.EXE computer program, supplied with the Version 4.11 release. With the exception of INM airplane numbers 1, 6...9214 10760 -- -.-- 27 7053 6215 9470 10703 --- --- - 28 SS7 5940 SS94 729S . ... ... 29 4223 4884 7897 9214 10760 ..... 30 sots 6474 7939 8774
Novel Robotic Tools for Piping Inspection and Repair
2015-01-14
was selected due to its small size, and peripheral capability. The SoM measures 50mm x 44mm. The SoM processor is an ARM Cortex -A8 running at720MHz...designing an embedded computing system from scratch. The SoM is a single integrated module which contains the processor , RAM, power management, and
Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J
2004-09-01
We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Testability Design Rating System: Testability Handbook. Volume 1
1992-02-01
4-10 4.7.5 Summary of False BIT Alarms (FBA) ............................. 4-10 4.7.6 Smart BIT Technique...Circuit Board PGA Pin Grid Array PLA Programmable Logic Array PLD Programmable Logic Device PN Pseudo-Random Number PREDICT Probabilistic Estimation of...11 4.7.6 Smart BIT ( reference: RADC-TR-85-198). " Smart " BIT is a term given to BIT circuitry in a system LRU which includes dedicated processor/memory
Data flow machine for data driven computing
Davidson, G.S.; Grafe, V.G.
1988-07-22
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information from an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ''fire'' signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Direct match data flow memory for data driven computing
Davidson, George S.; Grafe, Victor Gerald
1997-01-01
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow memory for data driven computing
Davidson, G.S.; Grafe, V.G.
1997-10-07
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status bit to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Free-Electron Laser Driven by the NBS (National Bureau of Standards) CW Microtron
1988-03-31
planned over several years. This will begin with the purchase of a 32-bit dual processor system for the yet to be constructed primary station wire scanner ...display subsystem. This 32-bit dual processor system will not only form the wire scanner display system, but has sufficient processing power to...7th hit. Coiif. on FELs, eds., E.T. Scharlemann and D. Prosnitz (North- Holland, Amsterdam, 1986) p. 278. 121 X.K Maruyania and S. Penner, C.M. Tang
Digital MOS integrated circuits
NASA Astrophysics Data System (ADS)
Elmasry, M. I.
MOS in digital circuit design is considered along with aspects of digital VLSI, taking into account a comparison of MOSFET logic circuits, 1-micrometer MOSFET VLSI technology, a generalized guide for MOSFET miniaturization, processing technologies, novel circuit structures for VLSI, and questions of circuit and system design for VLSI. MOS memory cells and circuits are discussed, giving attention to a survey of high-density dynamic RAM cell concepts, one-device cells for dynamic random-access memories, variable resistance polysilicon for high density CMOS Ram, high performance MOS EPROMs using a stacked-gate cell, and the optimization of the latching pulse for dynamic flip-flop sensors. Programmable logic arrays are considered along with digital signal processors, microprocessors, static RAMs, and dynamic RAMs.
Diversification of Processors Based on Redundancy in Instruction Set
NASA Astrophysics Data System (ADS)
Ichikawa, Shuichi; Sawada, Takashi; Hata, Hisashi
By diversifying processor architecture, computer software is expected to be more resistant to plagiarism, analysis, and attacks. This study presents a new method to diversify instruction set architecture (ISA) by utilizing the redundancy in the instruction set. Our method is particularly suited for embedded systems implemented with FPGA technology, and realizes a genuine instruction set randomization, which has not been provided by the preceding studies. The evaluation results on four typical ISAs indicate that our scheme can provide a far larger degree of freedom than the preceding studies. Diversified processors based on MIPS architecture were actually implemented and evaluated with Xilinx Spartan-3 FPGA. The increase of logic scale was modest: 5.1% in Specialized design and 3.6% in RAM-mapped design. The performance overhead was also modest: 3.4% in Specialized design and 11.6% in RAM-mapped design. From these results, our scheme is regarded as a practical and promising way to secure FPGA-based embedded systems.
System and method for programmable bank selection for banked memory subsystems
Blumrich, Matthias A.; Chen, Dong; Gara, Alan G.; Giampapa, Mark E.; Hoenicke, Dirk; Ohmacht, Martin; Salapura, Valentina; Sugavanam, Krishnan
2010-09-07
A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each of the respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment memory storage access distributed across the one or more memory storage structures.
A Simple and Affordable TTL Processor for the Classroom
ERIC Educational Resources Information Center
Feinberg, Dave
2007-01-01
This paper presents a simple 4 bit computer processor design that may be built using TTL chips for less than $65. In addition to describing the processor itself in detail, we discuss our experience using the laboratory kit and its associated machine instruction set to teach computer architecture to high school students. (Contains 3 figures and 5…
Software Method for Computed Tomography Cylinder Data Unwrapping, Re-slicing, and Analysis
NASA Technical Reports Server (NTRS)
Roth, Don J.
2013-01-01
A software method has been developed that is applicable for analyzing cylindrical and partially cylindrical objects inspected using computed tomography (CT). This method involves unwrapping and re-slicing data so that the CT data from the cylindrical object can be viewed as a series of 2D sheets (or flattened onion skins ) in addition to a series of top view slices and 3D volume rendering. The advantages of viewing the data in this fashion are as follows: (1) the use of standard and specialized image processing and analysis methods is facilitated having 2D array data versus a volume rendering; (2) accurate lateral dimensional analysis of flaws is possible in the unwrapped sheets versus volume rendering; (3) flaws in the part jump out at the inspector with the proper contrast expansion settings in the unwrapped sheets; and (4) it is much easier for the inspector to locate flaws in the unwrapped sheets versus top view slices for very thin cylinders. The method is fully automated and requires no input from the user except proper voxel dimension from the CT experiment and wall thickness of the part. The software is available in 32-bit and 64-bit versions, and can be used with binary data (8- and 16-bit) and BMP type CT image sets. The software has memory (RAM) and hard-drive based modes. The advantage of the (64-bit) RAM-based mode is speed (and is very practical for users of 64-bit Windows operating systems and computers having 16 GB or more RAM). The advantage of the hard-drive based analysis is one can work with essentially unlimited-sized data sets. Separate windows are spawned for the unwrapped/re-sliced data view and any image processing interactive capability. Individual unwrapped images and un -wrapped image series can be saved in common image formats. More information is available at http://www.grc.nasa.gov/WWW/OptInstr/ NDE_CT_CylinderUnwrapper.html.
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
NASA Technical Reports Server (NTRS)
Grumet, A.
1981-01-01
An automatic correlation plane processor that can rapidly acquire, identify, and locate the autocorrelation outputs of a bank of multiple optical matched filters is described. The read-only memory (ROM) stored digital silhouette of each image associated with each matched filter allows TV video to be used to collect image energy to provide accurate normalization of autocorrelations. The resulting normalized autocorrelations are independent of the illumination of the matched input. Deviation from unity of a normalized correlation can be used as a confidence measure of correct image identification. Analog preprocessing circuits permit digital conversion and random access memory (RAM) storage of those video signals with the correct amplitude, pulse width, rising slope, and falling slope. TV synchronized addressing of 3 RAMs permits on-line storage of: (1) the maximum unnormalized amplitude, (2) the image x location, and (3) the image y location of the output of each of up to 99 matched filters. A fourth RAM stores all normalized correlations. A normalization approach, normalization for cross correlations, a system's description with block diagrams, and system's applications are discussed.
Burst-mode optical label processor with ultralow power consumption.
Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo
2016-04-04
A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting and interfacing the ultrafast packet-label to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion for the label bits on a bit-by-bit basis by using an optoelectronic converter that is operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem; 1) based on a novel operation mechanism for providing amplification with bit-level selectivity, an optical trigger pulse generator, that consumes power for a very short duration upon packet arrival, is proposed and experimentally demonstrated, 2) the energy of optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal while employing an enhanced conversion scheme entitled the discharge-or-hold scheme, 3) the necessary optical trigger energy is further cut down by half by coupling the triggers through the chip's backside, whereas a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.
Large-N in Volcano Settings: Volcanosri
NASA Astrophysics Data System (ADS)
Lees, J. M.; Song, W.; Xing, G.; Vick, S.; Phillips, D.
2014-12-01
We seek a paradigm shift in the approach we take on volcano monitoring where the compromise from high fidelity to large numbers of sensors is used to increase coverage and resolution. Accessibility, danger and the risk of equipment loss requires that we develop systems that are independent and inexpensive. Furthermore, rather than simply record data on hard disk for later analysis we desire a system that will work autonomously, capitalizing on wireless technology and in field network analysis. To this end we are currently producing a low cost seismic array which will incorporate, at the very basic level, seismological tools for first cut analysis of a volcano in crises mode. At the advanced end we expect to perform tomographic inversions in the network in near real time. Geophone (4 Hz) sensors connected to a low cost recording system will be installed on an active volcano where triggering earthquake location and velocity analysis will take place independent of human interaction. Stations are designed to be inexpensive and possibly disposable. In one of the first implementations the seismic nodes consist of an Arduino Due processor board with an attached Seismic Shield. The Arduino Due processor board contains an Atmel SAM3X8E ARM Cortex-M3 CPU. This 32 bit 84 MHz processor can filter and perform coarse seismic event detection on a 1600 sample signal in fewer than 200 milliseconds. The Seismic Shield contains a GPS module, 900 MHz high power mesh network radio, SD card, seismic amplifier, and 24 bit ADC. External sensors can be attached to either this 24-bit ADC or to the internal multichannel 12 bit ADC contained on the Arduino Due processor board. This allows the node to support attachment of multiple sensors. By utilizing a high-speed 32 bit processor complex signal processing tasks can be performed simultaneously on multiple sensors. Using a 10 W solar panel, second system being developed can run autonomously and collect data on 3 channels at 100Hz for 6 months with the installed 16Gb SD card. Initial designs and test results will be presented and discussed.
A scalable SIMD digital signal processor for high-quality multifunctional printer systems
NASA Astrophysics Data System (ADS)
Kang, Hyeong-Ju; Choi, Yongwoo; Kim, Kimo; Park, In-Cheol; Kim, Jung-Wook; Lee, Eul-Hwan; Gahang, Goo-Soo
2005-01-01
This paper describes a high-performance scalable SIMD digital signal processor (DSP) developed for multifunctional printer systems. The DSP supports a variable number of datapaths to cover a wide range of performance and maintain a RISC-like pipeline structure. Many special instructions suitable for image processing algorithms are included in the DSP. Quad/dual instructions are introduced for 8-bit or 16-bit data, and bit-field extraction/insertion instructions are supported to process various data types. Conditional instructions are supported to deal with complex relative conditions efficiently. In addition, an intelligent DMA block is integrated to align data in the course of data reading. Experimental results show that the proposed DSP outperforms a high-end printer-system DSP by at least two times.
Assessing Server Fault Tolerance and Disaster Recovery Implementation in Thin Client Architectures
2007-09-01
server • Windows 2003 server Processor AMD Geode GX Memory 512MB Flash/256MB DDR RAM I/O/Peripheral Support • VGA-type video output (DB-15...2000 Advanced Server Processor AMD Geode NX 1500 Memory • 256MB or 512MB or 1GB DDR SDRAM • 1GB or 512MB Flash I/O/Peripheral Support • SiS741 GX
Tactical Operations Analysis Support Facility.
1983-07-01
are stored in nonvolatile RAM (NVR). Communication with a host processor via a UART (75-19.2K bps) in full duplex mode. An advanced video option...hardware/firmware "machines." Smart terminals, I/O con- * trollers, and unique peripheral processors are examples of this process. Briton Lee, Inc...the relational data base for symbol attributes and data retrievals. * Generates a grid system for precise cursor positioning for lines, charts, and
Distributed Micro-Processor Applications to Guidance and Control Systems.
1982-07-01
nanoseconds compared with 22 milliseconds for the older type of NMOS non-volatile RAM. This non-volatile RAM is estimated to hold its memory for 100 years...illustrated in figure 1.4.3.3 and compared with the traditional permalog chevron bubble structure. The contiguous element bubble structure is being developed ...M for its 8086 based Digital Advanced Avionics System (DAAS) developed for NASA Ames, but rejected it as being unsuitable. Ada is the new DoD
Microprocessor design for GaAs technology
NASA Astrophysics Data System (ADS)
Milutinovic, Veljko M.
Recent advances in the design of GaAs microprocessor chips are examined in chapters contributed by leading experts; the work is intended as reading material for a graduate engineering course or as a practical R&D reference. Topics addressed include the methodology used for the architecture, organization, and design of GaAs processors; GaAs device physics and circuit design; design concepts for microprocessor-based GaAs systems; a 32-bit GaAs microprocessor; a 32-bit processor implemented in GaAs JFET; and a direct coupled-FET-logic E/D-MESFET experimental RISC machine. Drawings, micrographs, and extensive circuit diagrams are provided.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Video image processor on the Spacelab 2 Solar Optical Universal Polarimeter /SL2 SOUP/
NASA Technical Reports Server (NTRS)
Lindgren, R. W.; Tarbell, T. D.
1981-01-01
The SOUP instrument is designed to obtain diffraction-limited digital images of the sun with high photometric accuracy. The Video Processor originated from the requirement to provide onboard real-time image processing, both to reduce the telemetry rate and to provide meaningful video displays of scientific data to the payload crew. This original concept has evolved into a versatile digital processing system with a multitude of other uses in the SOUP program. The central element in the Video Processor design is a 16-bit central processing unit based on 2900 family bipolar bit-slice devices. All arithmetic, logical and I/O operations are under control of microprograms, stored in programmable read-only memory and initiated by commands from the LSI-11. Several functions of the Video Processor are described, including interface to the High Rate Multiplexer downlink, cosmetic and scientific data processing, scan conversion for crew displays, focus and exposure testing, and use as ground support equipment.
Rapid Damage Assessment. Volume II. Development and Testing of Rapid Damage Assessment System.
1981-02-01
pixels/s Camera Line Rate 732.4 lines/s Pixels per Line 1728 video 314 blank 4 line number (binary) 2 run number (BCD) 2048 total Pixel Resolution 8 bits...sists of an LSI-ll microprocessor, a VDI -200 video display processor, an FD-2 dual floppy diskette subsystem, an FT-I function key-trackball module...COMPONENT LIST FOR IMAGE PROCESSOR SYSTEM IMAGE PROCESSOR SYSTEM VIEWS I VDI -200 Display Processor Racks, Table FD-2 Dual Floppy Diskette Subsystem FT-l
Experience in highly parallel processing using DAP
NASA Technical Reports Server (NTRS)
Parkinson, D.
1987-01-01
Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Batista, Antonio J. N.; Santos, Bruno; Fernandes, Ana
The data acquisition and control instrumentation cubicles room of the ITER tokamak will be irradiated with neutrons during the fusion reactor operation. A Virtex-6 FPGA from Xilinx (XC6VLX365T-1FFG1156C) is used on the ATCA-IO-PROCESSOR board, included in the ITER Catalog of I and C products - Fast Controllers. The Virtex-6 is a re-programmable logic device where the configuration is stored in Static RAM (SRAM), functional data stored in dedicated Block RAM (BRAM) and functional state logic in Flip-Flops. Single Event Upsets (SEU) due to the ionizing radiation of neutrons causes soft errors, unintended changes (bit-flips) to the values stored in statemore » elements of the FPGA. The SEU monitoring and soft errors repairing, when possible, were explored in this work. An FPGA built-in Soft Error Mitigation (SEM) controller detects and corrects soft errors in the FPGA configuration memory. Novel SEU sensors with Error Correction Code (ECC) detect and repair the BRAM memories. Proper management of SEU can increase reliability and availability of control instrumentation hardware for nuclear applications. The results of the tests performed using the SEM controller and the BRAM SEU sensors are presented for a Virtex-6 FPGA (XC6VLX240T-1FFG1156C) when irradiated with neutrons from the Portuguese Research Reactor (RPI), a 1 MW nuclear fission reactor operated by IST in the neighborhood of Lisbon. Results show that the proposed SEU mitigation technique is able to repair the majority of the detected SEU errors in the configuration and BRAM memories. (authors)« less
Analog Nonvolatile Computer Memory Circuits
NASA Technical Reports Server (NTRS)
MacLeod, Todd
2007-01-01
In nonvolatile random-access memory (RAM) circuits of a proposed type, digital data would be stored in analog form in ferroelectric field-effect transistors (FFETs). This type of memory circuit would offer advantages over prior volatile and nonvolatile types: In a conventional complementary metal oxide/semiconductor static RAM, six transistors must be used to store one bit, and storage is volatile in that data are lost when power is turned off. In a conventional dynamic RAM, three transistors must be used to store one bit, and the stored bit must be refreshed every few milliseconds. In contrast, in a RAM according to the proposal, data would be retained when power was turned off, each memory cell would contain only two FFETs, and the cell could store multiple bits (the exact number of bits depending on the specific design). Conventional flash memory circuits afford nonvolatile storage, but they operate at reading and writing times of the order of thousands of conventional computer memory reading and writing times and, hence, are suitable for use only as off-line storage devices. In addition, flash memories cease to function after limited numbers of writing cycles. The proposed memory circuits would not be subject to either of these limitations. Prior developmental nonvolatile ferroelectric memories are limited to one bit per cell, whereas, as stated above, the proposed memories would not be so limited. The design of a memory circuit according to the proposal must reflect the fact that FFET storage is only partly nonvolatile, in that the signal stored in an FFET decays gradually over time. (Retention times of some advanced FFETs exceed ten years.) Instead of storing a single bit of data as either a positively or negatively saturated state in a ferroelectric device, each memory cell according to the proposal would store two values. The two FFETs in each cell would be denoted the storage FFET and the control FFET. The storage FFET would store an analog signal value, between the positive and negative FFET saturation values. This signal value would represent a numerical value of interest corresponding to multiple bits: for example, if the memory circuit were designed to distinguish among 16 different analog values, then each cell could store 4 bits. Simultaneously with writing the signal value in the storage FFET, a negative saturation signal value would be stored in the control FFET. The decay of this control-FFET signal from the saturation value would serve as a model of the decay, for use in regenerating the numerical value of interest from its decaying analog signal value. The memory circuit would include addressing, reading, and writing circuitry that would have features in common with the corresponding parts of other memory circuits, but would also have several distinctive features. The writing circuitry would include a digital-to-analog converter (DAC); the reading circuitry would include an analog-to-digital converter (ADC). For writing a numerical value of interest in a given cell, that cell would be addressed, the saturation value would be written in the control FFET in that cell, and the non-saturation analog value representing the numerical value of interest would be generated by use of the DAC and stored in the storage FFET in that cell. For reading the numerical value of interest stored in a given cell, the cell would be addressed, the ADC would convert the decaying control and storage analog signal values to digital values, and an associated fast digital processing circuit would regenerate the numerical value from digital values.
Is random access memory random?
NASA Technical Reports Server (NTRS)
Denning, P. J.
1986-01-01
Most software is contructed on the assumption that the programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match processor addressing rate with memory service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satified from the cache without reference to the main memory. The cache requests information from main memory in blocks that can be transferred at the full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.
Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor
2015-03-10
for Public Release; Distribution Unlimited Final Report: Design and Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor The views...P.O. Box 12211 Research Triangle Park, NC 27709-2211 Superconductor technology, RSFQ, RQL, processor design, arithmetic units, high-performance...Demonstration of a 30 GHz 16-bit Superconductor RSFQ Microprocessor Report Title The major objective of the project was to design and demonstrate operation
Architecture of a Message-Driven Processor,
1987-11-01
Jon Kaplan, Paul Song, Brian Totty, and Scott Wills Artifcial Intelligence Laboratory -4 Laboratory for Computer Science Massachusetts Institute of...Information Dally, Chao, Chien, Hassoun, Horwat, Kaplan, Song, Totty & Wills: Artificial Intelligence i Laboratory and Laboratory for Computer Science, MIT...applied to a problem if we could are 36 bits long (32 data bits + 4 tag bits) and are used to hold efficiently run programs with a granularity of 5s
Application of NASA General-Purpose Solver to Large-Scale Computations in Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, Willie R.; Storaasli, Olaf O.
2004-01-01
Of several iterative and direct equation solvers evaluated previously for computations in aeroacoustics, the most promising was the NASA-developed General-Purpose Solver (winner of NASA's 1999 software of the year award). This paper presents detailed, single-processor statistics of the performance of this solver, which has been tailored and optimized for large-scale aeroacoustic computations. The statistics, compiled using an SGI ORIGIN 2000 computer with 12 Gb available memory (RAM) and eight available processors, are the central processing unit time, RAM requirements, and solution error. The equation solver is capable of solving 10 thousand complex unknowns in as little as 0.01 sec using 0.02 Gb RAM, and 8.4 million complex unknowns in slightly less than 3 hours using all 12 Gb. This latter solution is the largest aeroacoustics problem solved to date with this technique. The study was unable to detect any noticeable error in the solution, since noise levels predicted from these solution vectors are in excellent agreement with the noise levels computed from the exact solution. The equation solver provides a means for obtaining numerical solutions to aeroacoustics problems in three dimensions.
Programming scheme based optimization of hybrid 4T-2R OxRAM NVSRAM
NASA Astrophysics Data System (ADS)
Majumdar, Swatilekha; Kingra, Sandeep Kaur; Suri, Manan
2017-09-01
In this paper, we present a novel single-cycle programming scheme for 4T-2R NVSRAM, exploiting pulse engineered input signals. OxRAM devices based on 3 nm thick bi-layer active switching oxide and 90 nm CMOS technology node were used for all simulations. The cell design is implemented for real-time non-volatility rather than last-bit, or power-down non-volatility. Detailed analysis of the proposed single-cycle, parallel RRAM device programming scheme is presented in comparison to the two-cycle sequential RRAM programming used for similar 4T-2R NVSRAM bit-cells. The proposed single-cycle programming scheme coupled with the 4T-2R architecture leads to several benefits such as- possibility of unconventional transistor sizing, 50% lower latency, 20% improvement in SNM and ∼20× reduced energy requirements, when compared against two-cycle programming approach.
NbN A/D Conversion of IR Focal Plane Sensor Signal at 10 K
NASA Technical Reports Server (NTRS)
Eaton, L.; Durand, D.; Sandell, R.; Spargo, J.; Krabach, T.
1994-01-01
We are implementing a 12 bit SFQ counting ADC with parallel-to-serial readout using our established 10 K NbN capability. This circuit provides a key element of the analog signal processor (ASP) used in large infrared focal plane arrays. The circuit processes the signal data stream from a Si:As BIB detector array. A 10 mega samples per second (MSPS) pixel data stream flows from the chip at a 120 megabit bit rate in a format that is compatible with other superconductive time dependent processor (TDP) circuits being developed. We will discuss our planned ASP demonstration, the circuit design, and test results.
A 50Mbit/Sec. CMOS Video Linestore System
NASA Astrophysics Data System (ADS)
Jeung, Yeun C.
1988-10-01
This paper reports the architecture, design and test results of a CMOS single chip programmable video linestore system which has 16-bit data words with 1024 bit depth. The delay is fully programmable from 9 to 1033 samples by a 10 bit binary control word. The large 16 bit data word width makes the chip useful for a wide variety of digital video signal processing applications such as DPCM coding, High-Definition TV, and Video scramblers/descramblers etc. For those applications, the conventional large fixed-length shift register or static RAM scheme is not very popular because of its lack of versatility, high power consumption, and required support circuitry. The very high throughput of 50Mbit/sec is made possible by a highly parallel, pipelined dynamic memory architecture implemented in a 2-um N-well CMOS technology. The basic cell of the programmable video linestore chip is an four transistor dynamic RAM element. This cell comprises the majority of the chip's real estate, consumes no static power, and gives good noise immunity to the simply designed sense amplifier. The chip design was done using Bellcore's version of the MULGA virtual grid symbolic layout system. The chip contains approximately 90,000 transistors in an area of 6.5 x 7.5 square mm and the I/Os are TTL compatible. The chip is packaged in a 68-pin leadless ceramic chip carrier package.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate
NASA Astrophysics Data System (ADS)
Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.
2004-10-01
Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
Tactical Operations Analysis Support Facility.
1981-05-01
Punch/Reader 2 DMC-11AR DDCMP Micro Processor 2 DMC-11DA Network Link Line Unit 2 DL-11E Async Serial Line Interface 4 Intel IN-1670 448K Words MOS Memory...86 5.3 VIRTUAL PROCESSORS - VAX-11/750 ........................... 89 5.4 A RELATIONAL DATA MANAGEMENT SYSTEM - ORACLE...Central Processing Unit (CPU) is a 16 bit processor for high-speed, real time applications, and for large multi-user, multi- task, time shared
Sparse distributed memory: Principles and operation
NASA Technical Reports Server (NTRS)
Flynn, M. J.; Kanerva, P.; Bhadkamkar, N.
1989-01-01
Sparse distributed memory is a generalized random access memory (RAM) for long (1000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech recognition and scene analysis, in signal detection and verification, and in adaptive control of automated equipment, in general, in dealing with real world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. Major design issues were resolved which were faced in building the memories. The design is described of a prototype memory with 256 bit addresses and from 8 to 128 K locations for 256 bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components.
A Future Accelerated Cognitive Distributed Hybrid Testbed for Big Data Science Analytics
NASA Astrophysics Data System (ADS)
Halem, M.; Prathapan, S.; Golpayegani, N.; Huang, Y.; Blattner, T.; Dorband, J. E.
2016-12-01
As increased sensor spectral data volumes from current and future Earth Observing satellites are assimilated into high-resolution climate models, intensive cognitive machine learning technologies are needed to data mine, extract and intercompare model outputs. It is clear today that the next generation of computers and storage, beyond petascale cluster architectures, will be data centric. They will manage data movement and process data in place. Future cluster nodes have been announced that integrate multiple CPUs with high-speed links to GPUs and MICS on their backplanes with massive non-volatile RAM and access to active flash RAM disk storage. Active Ethernet connected key value store disk storage drives with 10Ge or higher are now available through the Kinetic Open Storage Alliance. At the UMBC Center for Hybrid Multicore Productivity Research, a future state-of-the-art Accelerated Cognitive Computer System (ACCS) for Big Data science is being integrated into the current IBM iDataplex computational system `bluewave'. Based on the next gen IBM 200 PF Sierra processor, an interim two node IBM Power S822 testbed is being integrated with dual Power 8 processors with 10 cores, 1TB Ram, a PCIe to a K80 GPU and an FPGA Coherent Accelerated Processor Interface card to 20TB Flash Ram. This system is to be updated to the Power 8+, an NVlink 1.0 with the Pascal GPU late in 2016. Moreover, the Seagate 96TB Kinetic Disk system with 24 Ethernet connected active disks is integrated into the ACCS storage system. A Lightweight Virtual File System developed at the NASA GSFC is installed on bluewave. Since remote access to publicly available quantum annealing computers is available at several govt labs, the ACCS will offer an in-line Restricted Boltzmann Machine optimization capability to the D-Wave 2X quantum annealing processor over the campus high speed 100 Gb network to Internet 2 for large files. As an evaluation test of the cognitive functionality of the architecture, the following studies utilizing all the system components will be presented; (i) a near real time climate change study generating CO2 fluxes and (ii) a deep dive capability into an 8000 x8000 pixel image pyramid display and (iii) Large dense and sparse eigenvalue decomposition.
Systolic array IC for genetic computation
NASA Technical Reports Server (NTRS)
Anderson, D.
1991-01-01
Measuring similarities between large sequences of genetic information is a formidable task requiring enormous amounts of computer time. Geneticists claim that nearly two months of CRAY-2 time are required to run a single comparison of the known database against the new bases that will be found this year, and more than a CRAY-2 year for next year's genetic discoveries, and so on. The DNA IC, designed at HP-ICBD in cooperation with the California Institute of Technology and the Jet Propulsion Laboratory, is being implemented in order to move the task of genetic comparison onto workstations and personal computers, while vastly improving performance. The chip is a systolic (pumped) array comprised of 16 processors, control logic, and global RAM, totaling 400,000 FETS. At 12 MHz, each chip performs 2.7 billion 16 bit operations per second. Using 35 of these chips in series on one PC board (performing nearly 100 billion operations per second), a sequence of 560 bases can be compared against the eventual total genome of 3 billion bases, in minutes--on a personal computer. While the designed purpose of the DNA chip is for genetic research, other disciplines requiring similarity measurements between strings of 7 bit encoded data could make use of this chip as well. Cryptography and speech recognition are two examples. A mix of full custom design and standard cells, in CMOS34, were used to achieve these goals. Innovative test methods were developed to enhance controllability and observability in the array. This paper describes these techniques as well as the chip's functionality. This chip was designed in the 1989-90 timeframe.
Associative architecture for image processing
NASA Astrophysics Data System (ADS)
Adar, Rutie; Akerib, Avidan
1997-09-01
This article presents a new generation in parallel processing architecture for real-time image processing. The approach is implemented in a real time image processor chip, called the XiumTM-2, based on combining a fully associative array which provides the parallel engine with a serial RISC core on the same die. The architecture is fully programmable and can be programmed to implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on patented pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz and 0.6 micron manufacturing technology process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the XiumTM-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.
An FPGA computing demo core for space charge simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Jinyuan; Huang, Yifei; /Fermilab
2009-01-01
In accelerator physics, space charge simulation requires large amount of computing power. In a particle system, each calculation requires time/resource consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computedmore » using a memory look-up table addressed with nine to ten most significant non-zero bits. At 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.« less
NASA Astrophysics Data System (ADS)
Wang, Yongjun; Liu, Xinyu; Tian, Qinghua; Wang, Lina; Xin, Xiangjun
2018-03-01
Basic configurations of various all-optical clocked flip-flops (FFs) and optical random access memory (RAM) based on the nonlinear polarization rotation (NPR) effect of low-polarization-dependent semiconductor optical amplifiers (SOA) are proposed. As the constituent elements, all-optical logic gates and all-optical SR latches are constructed by taking advantage of the SOA's NPR switch. Different all-optical FFs (AOFFs), including SR-, D-, T-, and JK-types as well as an optical RAM cell were obtained by the combination of the proposed all-optical SR latches and logic gates. The effectiveness of the proposed schemes were verified by simulation results and demonstrated by a D-FF and 1-bit RAM cell experimental system. The proposed all-optical clocked FFs and RAM cell are significant to all-optical signal processing.
Bit-serial neuroprocessor architecture
NASA Technical Reports Server (NTRS)
Tawel, Raoul (Inventor)
2001-01-01
A neuroprocessor architecture employs a combination of bit-serial and serial-parallel techniques for implementing the neurons of the neuroprocessor. The neuroprocessor architecture includes a neural module containing a pool of neurons, a global controller, a sigmoid activation ROM look-up-table, a plurality of neuron state registers, and a synaptic weight RAM. The neuroprocessor reduces the number of neurons required to perform the task by time multiplexing groups of neurons from a fixed pool of neurons to achieve the successive hidden layers of a recurrent network topology.
The Single Event Upset (SEU) response to 590 MeV protons
NASA Technical Reports Server (NTRS)
Nichols, D. K.; Price, W. E.; Smith, L. S.; Soli, G. A.
1984-01-01
The presence of high-energy protons in cosmic rays, solar flares, and trapped radiation belts around Jupiter poses a threat to the Galileo project. Results of a test of 10 device types (including 1K RAM, 4-bit microP sequencer, 4-bit slice, 9-bit data register, 4-bit shift register, octal flip-flop, and 4-bit counter) exposed to 590 MeV protons at the Swiss Institute of Nuclear Research are presented to clarify the picture of SEU response to the high-energy proton environment of Jupiter. It is concluded that the data obtained should remove the concern that nuclear reaction products generated by protons external to the device can cause significant alteration in the device SEU response. The data also show only modest increases in SEU cross section as proton energies are increased up to the upper limits of energy for both the terrestrial and Jovian trapped proton belts.
A sparse matrix algorithm on the Boolean vector machine
NASA Technical Reports Server (NTRS)
Wagner, Robert A.; Patrick, Merrell L.
1988-01-01
VLSI technology is being used to implement a prototype Boolean Vector Machine (BVM), which is a large network of very small processors with equally small memories that operate in SIMD mode; these use bit-serial arithmetic, and communicate via cube-connected cycles network. The BVM's bit-serial arithmetic and the small memories of individual processors are noted to compromise the system's effectiveness in large numerical problem applications. Attention is presently given to the implementation of a basic matrix-vector iteration algorithm for space matrices of the BVM, in order to generate over 1 billion useful floating-point operations/sec for this iteration algorithm. The algorithm is expressed in a novel language designated 'BVM'.
A microprocessor based high speed packet switch for satellite communications
NASA Technical Reports Server (NTRS)
Arozullah, M.; Crist, S. C.
1980-01-01
The architectures of a single processor, a three processor, and a multiple processor system are described. The hardware circuits, and software routines required for implementing the three and multiple processor designs are presented. A bit-slice microprocessor was designed and microprogrammed. Maximum throughput was calculated for all three designs. Queue theoretic models for these three designs were developed and utilized to obtain analytical expressions for the average waiting times, overall average response times and average queue sizes. From these expressions, graphs were obtained showing the effect on the system performance of a number of design parameters.
Cognitive Medical Wireless Testbed System (COMWITS)
2016-11-01
Number: ...... ...... Sub Contractors (DD882) Names of other research staff Inventions (DD882) Scientific Progress This testbed merges two ARO grants...bit 64 bit CPU Intel Xeon Processor E5-1650v3 (6C, 3.5 GHz, Turbo, HT , 15M, 140W) Intel Core i7-3770 (3.4 GHz Quad Core, 77W) Dual Intel Xeon
ERIC Educational Resources Information Center
Perez, Ernest
1997-01-01
Examines the practical realities of upgrading Intel personal computers in libraries, considering budgets and technical personnel availability. Highlights include adding RAM; putting in faster processor chips, including clock multipliers; new hard disks; CD-ROM speed; motherboards and interface cards; cost limits and economic factors; and…
Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP
NASA Astrophysics Data System (ADS)
Brooks, Geoffrey W.
1996-03-01
Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
Control mechanism of double-rotator-structure ternary optical computer
NASA Astrophysics Data System (ADS)
Kai, SONG; Liping, YAN
2017-03-01
Double-rotator-structure ternary optical processor (DRSTOP) has two characteristics, namely, giant data-bits parallel computing and reconfigurable processor, which can handle thousands of data bits in parallel, and can run much faster than computers and other optical computer systems so far. In order to put DRSTOP into practical application, this paper established a series of methods, namely, task classification method, data-bits allocation method, control information generation method, control information formatting and sending method, and decoded results obtaining method and so on. These methods form the control mechanism of DRSTOP. This control mechanism makes DRSTOP become an automated computing platform. Compared with the traditional calculation tools, DRSTOP computing platform can ease the contradiction between high energy consumption and big data computing due to greatly reducing the cost of communications and I/O. Finally, the paper designed a set of experiments for DRSTOP control mechanism to verify its feasibility and correctness. Experimental results showed that the control mechanism is correct, feasible and efficient.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Szabo, Levente; Koniorczyk, Matyas; Adam, Peter
We consider the entanglement manipulation capabilities of the universal covariant quantum cloner or quantum processor circuit for quantum bits. We investigate its use for cloning a member of a bipartite or a genuine tripartite entangled state of quantum bits. We find that for bipartite pure entangled states a nontrivial behavior of concurrence appears, while for GHZ entangled states a possibility of the partial extraction of bipartite entanglement can be achieved.
Fundamental physics issues of multilevel logic in developing a parallel processor.
NASA Astrophysics Data System (ADS)
Bandyopadhyay, Anirban; Miki, Kazushi
2007-06-01
In the last century, On and Off physical switches, were equated with two decisions 0 and 1 to express every information in terms of binary digits and physically realize it in terms of switches connected in a circuit. Apart from memory-density increase significantly, more possible choices in particular space enables pattern-logic a reality, and manipulation of pattern would allow controlling logic, generating a new kind of processor. Neumann's computer is based on sequential logic, processing bits one by one. But as pattern-logic is generated on a surface, viewing whole pattern at a time is a truly parallel processing. Following Neumann's and Shannons fundamental thermodynamical approaches we have built compatible model based on series of single molecule based multibit logic systems of 4-12 bits in an UHV-STM. On their monolayer multilevel communication and pattern formation is experimentally verified. Furthermore, the developed intelligent monolayer is trained by Artificial Neural Network. Therefore fundamental weak interactions for the building of truly parallel processor are explored here physically and theoretically.
Direct match data flow machine apparatus and process for data driven computing
Davidson, G.S.; Grafe, V.G.
1997-08-12
A data flow computer and method of computing are disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a ``fire`` signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor. 11 figs.
Data flow machine for data driven computing
Davidson, George S.; Grafe, Victor G.
1995-01-01
A data flow computer which of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Direct match data flow machine apparatus and process for data driven computing
Davidson, George S.; Grafe, Victor Gerald
1997-01-01
A data flow computer and method of computing is disclosed which utilizes a data driven processor node architecture. The apparatus in a preferred embodiment includes a plurality of First-In-First-Out (FIFO) registers, a plurality of related data flow memories, and a processor. The processor makes the necessary calculations and includes a control unit to generate signals to enable the appropriate FIFO register receiving the result. In a particular embodiment, there are three FIFO registers per node: an input FIFO register to receive input information form an outside source and provide it to the data flow memories; an output FIFO register to provide output information from the processor to an outside recipient; and an internal FIFO register to provide information from the processor back to the data flow memories. The data flow memories are comprised of four commonly addressed memories. A parameter memory holds the A and B parameters used in the calculations; an opcode memory holds the instruction; a target memory holds the output address; and a tag memory contains status bits for each parameter. One status bit indicates whether the corresponding parameter is in the parameter memory and one status but to indicate whether the stored information in the corresponding data parameter is to be reused. The tag memory outputs a "fire" signal (signal R VALID) when all of the necessary information has been stored in the data flow memories, and thus when the instruction is ready to be fired to the processor.
Reconfigurable pipelined processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saccardi, R.J.
1989-09-19
This patent describes a reconfigurable pipelined processor for processing data. It comprises: a plurality of memory devices for storing bits of data; a plurality of arithmetic units for performing arithmetic functions with the data; cross bar means for connecting the memory devices with the arithmetic units for transferring data therebetween; at least one counter connected with the cross bar means for providing a source of addresses to the memory devices; at least one variable tick delay device connected with each of the memory devices and arithmetic units; and means for providing control bits to the variable tick delay device formore » variably controlling the input and output operations thereof to selectively delay the memory devices and arithmetic units to align the data for processing in a selected sequence.« less
Programmable architecture for pixel level processing tasks in lightweight strapdown IR seekers
NASA Astrophysics Data System (ADS)
Coates, James L.
1993-06-01
Typical processing tasks associated with missile IR seeker applications are described, and a straw man suite of algorithms is presented. A fully programmable multiprocessor architecture is realized on a multimedia video processor (MVP) developed by Texas Instruments. The MVP combines the elements of RISC, floating point, advanced DSPs, graphics processors, display and acquisition control, RAM, and external memory. Front end pixel level tasks typical of missile interceptor applications, operating on 256 x 256 sensor imagery, can be processed at frame rates exceeding 100 Hz in a single MVP chip.
Giovannetti, Vittorio; Lloyd, Seth; Maccone, Lorenzo
2008-04-25
A random access memory (RAM) uses n bits to randomly address N=2(n) distinct memory cells. A quantum random access memory (QRAM) uses n qubits to address any quantum superposition of N memory cells. We present an architecture that exponentially reduces the requirements for a memory call: O(logN) switches need be thrown instead of the N used in conventional (classical or quantum) RAM designs. This yields a more robust QRAM algorithm, as it in general requires entanglement among exponentially less gates, and leads to an exponential decrease in the power needed for addressing. A quantum optical implementation is presented.
Enhancement of Speed Margins for 16× Digital Versatile Disc-Random Access Memory
NASA Astrophysics Data System (ADS)
Watanabe, Koichi; Minemura, Hiroyuki; Miyamoto, Makoto; Iimura, Makoto
2006-02-01
We have evaluated the speed margins of write/read 16× digital versatile disc-random access memory (DVD-RAM) test discs using write strategies for 6--16× constant angular velocity (CAV) control. Our approach is to determine the writing parameters for the middle zones by interpolating the zone numbers. Using this interpolation strategy, we successfully obtained overwrite jitter values of less than 8% and bit error rates of less than 10-5 in 6--16× DVD-RAM. Moreover, we confirmed that the speed margins were ± 20% for a 6--16× CAV.
Method and system for selecting data sampling phase for self timed interface logic
Hoke, Joseph Michael; Ferraiolo, Frank D.; Lo, Tin-Chee; Yarolin, John Michael
2005-01-04
An exemplary embodiment of the present invention is a method for transmitting data among processors over a plurality of parallel data lines and a clock signal line. A receiver processor receives both data and a clock signal from a sender processor. At the receiver processor a bit of the data is phased aligned with the transmitted clock signal. The phase aligning includes selecting a data phase from a plurality of data phases in a delay chain and then adjusting the selected data phase to compensate for a round-off error. Additional embodiments include a system and storage medium for transmitting data among processors over a plurality of parallel data lines and a clock signal line.
Cross-Layer Resilience Exploration
2015-03-31
complex 563 server-class systems) and any arbitrary fault model (permanent, transient, multi-bit, etc.) System Design Analysis Using flip- flop ...level fault injection, we rank the vulnerability of each flip- flop in the processor in terms of its likelihood to propagate faults [3]. This allows the...hardened flip- flops , which are flip- flops designed to uphold the bit representation of their output circuit even under particle strikes [1, 6, 10
Searching for New Double Stars with a Computer
NASA Astrophysics Data System (ADS)
Bryant, T. V.
2015-04-01
The advent of computers with large amounts of RAM memory and fast processors, as well as easy internet access to large online astronomical databases, has made computer searches based on astrometric data practicable for most researchers. This paper describes one such search that has uncovered hitherto unrecognized double stars.
Concurrent error detecting codes for arithmetic processors
NASA Technical Reports Server (NTRS)
Lim, R. S.
1979-01-01
A method of concurrent error detection for arithmetic processors is described. Low-cost residue codes with check-length l and checkbase m = 2 to the l power - 1 are described for checking arithmetic operations of addition, subtraction, multiplication, division complement, shift, and rotate. Of the three number representations, the signed-magnitude representation is preferred for residue checking. Two methods of residue generation are described: the standard method of using modulo m adders and the method of using a self-testing residue tree. A simple single-bit parity-check code is described for checking the logical operations of XOR, OR, and AND, and also the arithmetic operations of complement, shift, and rotate. For checking complement, shift, and rotate, the single-bit parity-check code is simpler to implement than the residue codes.
IMAGEP - A FORTRAN ALGORITHM FOR DIGITAL IMAGE PROCESSING
NASA Technical Reports Server (NTRS)
Roth, D. J.
1994-01-01
IMAGEP is a FORTRAN computer algorithm containing various image processing, analysis, and enhancement functions. It is a keyboard-driven program organized into nine subroutines. Within the subroutines are other routines, also, selected via keyboard. Some of the functions performed by IMAGEP include digitization, storage and retrieval of images; image enhancement by contrast expansion, addition and subtraction, magnification, inversion, and bit shifting; display and movement of cursor; display of grey level histogram of image; and display of the variation of grey level intensity as a function of image position. This algorithm has possible scientific, industrial, and biomedical applications in material flaw studies, steel and ore analysis, and pathology, respectively. IMAGEP is written in VAX FORTRAN for DEC VAX series computers running VMS. The program requires the use of a Grinnell 274 image processor which can be obtained from Mark McCloud Associates, Campbell, CA. An object library of the required GMR series software is included on the distribution media. IMAGEP requires 1Mb of RAM for execution. The standard distribution medium for this program is a 1600 BPI 9track magnetic tape in VAX FILES-11 format. It is also available on a TK50 tape cartridge in VAX FILES-11 format. This program was developed in 1991. DEC, VAX, VMS, and TK50 are trademarks of Digital Equipment Corporation.
Data storage technology comparisons
NASA Technical Reports Server (NTRS)
Katti, Romney R.
1990-01-01
The role of data storage and data storage technology is an integral, though conceptually often underestimated, portion of data processing technology. Data storage is important in the mass storage mode in which generated data is buffered for later use. But data storage technology is also important in the data flow mode when data are manipulated and hence required to flow between databases, datasets and processors. This latter mode is commonly associated with memory hierarchies which support computation. VLSI devices can reasonably be defined as electronic circuit devices such as channel and control electronics as well as highly integrated, solid-state devices that are fabricated using thin film deposition technology. VLSI devices in both capacities play an important role in data storage technology. In addition to random access memories (RAM), read-only memories (ROM), and other silicon-based variations such as PROM's, EPROM's, and EEPROM's, integrated devices find their way into a variety of memory technologies which offer significant performance advantages. These memory technologies include magnetic tape, magnetic disk, magneto-optic disk, and vertical Bloch line memory. In this paper, some comparison between selected technologies will be made to demonstrate why more than one memory technology exists today, based for example on access time and storage density at the active bit and system levels.
QCDOC: A 10-teraflops scale computer for lattice QCD
NASA Astrophysics Data System (ADS)
Chen, D.; Christ, N. H.; Cristian, C.; Dong, Z.; Gara, A.; Garg, K.; Joo, B.; Kim, C.; Levkova, L.; Liao, X.; Mawhinney, R. D.; Ohta, S.; Wettig, T.
2001-03-01
The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from "QCD On a Chip".
Computer-Aided Fabrication of Integrated Circuits
1989-09-30
baseline CMOS process. One result of this effort was the identification of several residual bugs in the PATRAN graphics processor . The vendor promises...virtual memory. The internal Nubus architecture uses a 32-bit LISP processor running at 10 megahertz (100 ns clock period). An ethernet controller is...For different patterns, we need different masks for the photo step, and for dif- ferent micro -structures of the wafers, we need different etching
Sparse distributed memory prototype: Principles of operation
NASA Technical Reports Server (NTRS)
Flynn, Michael J.; Kanerva, Pentti; Ahanin, Bahram; Bhadkamkar, Neal; Flaherty, Paul; Hickey, Philip
1988-01-01
Sparse distributed memory is a generalized random access memory (RAM) for long binary words. Such words can be written into and read from the memory, and they can be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original right address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. The research is aimed at resolving major design issues that have to be faced in building the memories. The design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words is described. A key aspect of the design is extensive use of dynamic RAM and other standard components.
Read disturb errors in a CMOS static RAM chip. [radiation hardened for spacedraft
NASA Technical Reports Server (NTRS)
Wood, Steven H.; Marr, James C., IV; Nguyen, Tien T.; Padgett, Dwayne J.; Tran, Joe C.; Griswold, Thomas W.; Lebowitz, Daniel C.
1989-01-01
Results are reported from an extensive investigation into pattern-sensitive soft errors (read disturb errors) in the TCC244 CMOS static RAM chip. The TCC244, also known as the SA2838, is a radiation-hard single-event-upset-resistant 4 x 256 memory chip. This device is being used by the Jet Propulsion Laboratory in the Galileo and Magellan spacecraft, which will have encounters with Jupiter and Venus, respectively. Two aspects of the part's design are shown to result in the occurrence of read disturb errors: the transparence of the signal path from the address pins to the array of cells, and the large resistance in the Vdd and Vss lines of the cells in the center of the array. Probe measurements taken during a read disturb failure illustrate how address skews and the data pattern in the chip combine to produce a bit flip. A capacitive charge pump formed by the individual cell capacitances and the resistance in the supply lines pumps down both the internal cell voltage and the local supply voltage until a bit flip occurs.
NASA Astrophysics Data System (ADS)
Tseng, Po-Hao; Hsu, Kai-Chieh; Lin, Yu-Yu; Lee, Feng-Min; Lee, Ming-Hsiu; Lung, Hsiang-Lan; Hsieh, Kuang-Yeu; Chung Wang, Keh; Lu, Chih-Yuan
2018-04-01
A high performance physically unclonable function (PUF) implemented with WO3 resistive random access memory (ReRAM) is presented in this paper. This robust ReRAM-PUF can eliminated bit flipping problem at very high temperature (up to 250 °C) due to plentiful read margin by using initial resistance state and set resistance state. It is also promised 10 years retention at the temperature range of 210 °C. These two stable resistance states enable stable operation at automotive environments from -40 to 125 °C without need of temperature compensation circuit. The high uniqueness of PUF can be achieved by implementing a proposed identification (ID)-generation method. Optimized forming condition can move 50% of the cells to low resistance state and the remaining 50% remain at initial high resistance state. The inter- and intra-PUF evaluations with unlimited separation of hamming distance (HD) are successfully demonstrated even under the corner condition. The number of reproduction was measured to exceed 107 times with 0% bit error rate (BER) at read voltage from 0.4 to 0.7 V.
Long sequence correlation coprocessor
NASA Astrophysics Data System (ADS)
Gage, Douglas W.
1994-09-01
A long sequence correlation coprocessor (LSCC) accelerates the bitwise correlation of arbitrarily long digital sequences by calculating in parallel the correlation score for 16, for example, adjacent bit alignments between two binary sequences. The LSCC integrated circuit is incorporated into a computer system with memory storage buffers and a separate general purpose computer processor which serves as its controller. Each of the LSCC's set of sequential counters simultaneously tallies a separate correlation coefficient. During each LSCC clock cycle, computer enable logic associated with each counter compares one bit of a first sequence with one bit of a second sequence to increment the counter if the bits are the same. A shift register assures that the same bit of the first sequence is simultaneously compared to different bits of the second sequence to simultaneously calculate the correlation coefficient by the different counters to represent different alignments of the two sequences.
Stream network and stream segment temperature models software
Bartholow, John
2010-01-01
This set of programs simulates steady-state stream temperatures throughout a dendritic stream network handling multiple time periods per year. The software requires a math co-processor and 384K RAM. Also included is a program (SSTEMP) designed to predict the steady state stream temperature within a single stream segment for a single time period.
Evaluation of an Adaptive Automation Trigger Based on Task Performance, Priority, and Frequency
2013-06-01
with dual Intel ® Xeon ® CPU x5550 processors @ 2.67 GHz each, 12.0 GB RAM, and a 1.5 GB PCIe nVidia Quadro FX 4800 graphics card (Microsoft...Cole Publishing Company . Miller, C. A., & Parasuraman, R. (2007). Designing for flexible interaction between humans and automation: Delegation
Multiple Embedded Processors for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
An inexpensive digital tape recorder suitable for neurophysiological signals.
Lamb, T D
1985-10-01
Modifications are described which convert an inexpensive 'Digital Audio Processor' (Sony PCM-701ES), together with a video cassette recorder, into a high performance digital tape recorder, with two analog channels of 16 bit resolution and DC-20 kHz bandwidth. A further modification is described which optionally provides four additional 1-bit digital channels by sacrificing the least significant four bits of one analog channel. If required two additional high quality analog channels may be obtained by use of one of the new video cassette recorders (such as the Sony SL-HF100) which incorporate a pair of FM tracks.
NASA Astrophysics Data System (ADS)
Chen, Ming-Chih; Hsiao, Shen-Fu
In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.
CoNNeCT Baseband Processor Module
NASA Technical Reports Server (NTRS)
Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.
2011-01-01
A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
Neville, R S; Stonham, T J; Glover, R J
2000-01-01
In this article we present a methodology that partially pre-calculates the weight updates of the backpropagation learning regime and obtains high accuracy function mapping. The paper shows how to implement neural units in a digital formulation which enables the weights to be quantised to 8-bits and the activations to 9-bits. A novel methodology is introduced to enable the accuracy of sigma-pi units to be increased by expanding their internal state space. We, also, introduce a novel means of implementing bit-streams in ring memories instead of utilising shift registers. The investigation utilises digital "Higher Order" sigma-pi nodes and studies continuous input RAM-based sigma-pi units. The units are trained with the backpropagation learning regime to learn functions to a high accuracy. The neural model is the sigma-pi units which can be implemented in digital microelectronic technology. The ability to perform tasks that require the input of real-valued information, is one of the central requirements of any cognitive system that utilises artificial neural network methodologies. In this article we present recent research which investigates a technique that can be used for mapping accurate real-valued functions to RAM-nets. One of our goals was to achieve accuracies of better than 1% for target output functions in the range Y epsilon [0,1], this is equivalent to an average Mean Square Error (MSE) over all training vectors of 0.0001 or an error modulus of 0.01. We present a development of the sigma-pi node which enables the provision of high accuracy outputs. The sigma-pi neural model was initially developed by Gurney (Learning in nets of structured hypercubes. PhD Thesis, Department of Electrical Engineering, Brunel University, Middlessex, UK, 1989; available as Technical Memo CN/R/144). Gurney's neuron models, the Time Integration Node (TIN), utilises an activation that was derived from a bit-stream. In this article we present a new methodology for storing sigma-pi node's activations as single values which are averages. In the course of the article we state what we define as a real number; how we represent real numbers and input of continuous values in our neural system. We show how to utilise the bounded quantised site-values (weights) of sigma-pi nodes to make training of these neurocomputing systems simple, using pre-calculated look-up tables to train the nets. In order to meet our accuracy goal, we introduce a means of increasing the bandwidth capability of sigma-pi units by expanding their internal state-space. In our implementation we utilise bit-streams when we calculate the real-valued outputs of the net. To simplify the hardware implementation of bit-streams we present a method of mapping them to RAM-based hardware using 'ring memories'. Finally, we study the sigma-pi units' ability to generalise once they are trained to map real-valued, high accuracy, continuous functions. We use sigma-pi units as they have been shown to have shorter training times than their analogue counterparts and can also overcome some of the drawbacks of semi-linear units (Gurney, 1992. Neural Networks, 5, 289-303).
Memory-based frame synchronizer. [for digital communication systems
NASA Technical Reports Server (NTRS)
Stattel, R. J.; Niswander, J. K. (Inventor)
1981-01-01
A frame synchronizer for use in digital communications systems wherein data formats can be easily and dynamically changed is described. The use of memory array elements provide increased flexibility in format selection and sync word selection in addition to real time reconfiguration ability. The frame synchronizer comprises a serial-to-parallel converter which converts a serial input data stream to a constantly changing parallel data output. This parallel data output is supplied to programmable sync word recognizers each consisting of a multiplexer and a random access memory (RAM). The multiplexer is connected to both the parallel data output and an address bus which may be connected to a microprocessor or computer for purposes of programming the sync word recognizer. The RAM is used as an associative memory or decorder and is programmed to identify a specific sync word. Additional programmable RAMs are used as counter decoders to define word bit length, frame word length, and paragraph frame length.
Second Report of the Multirate Processor (MRP) for Digital Voice Communications.
1982-09-30
machine are: * two arithmetic logic units (ALUs)-one for data processing, and the other for address generation, * two memorys -6144 words (70 bits per word...of program memory , and 6094 words (16 bits per word) of data memory , q * input/output through modem and teletype, -15 .9 S-;. KANG AND FRANSEN Table...provides a measure of intelligibility and allows one to evaluate the discriminability of six distinctive features: voicing, nasality, sustention
Video Bandwidth Compression System.
1980-08-01
scaling function, located between the inverse DPCM and inverse transform , on the decoder matrix multiplier chips. 1"V1 T.. ---- i.13 SECURITY...Bit Unpacker and Inverse DPCM Slave Sync Board 15 e. Inverse DPCM Loop Boards 15 f. Inverse Transform Board 16 g. Composite Video Output Board 16...36 a. Display Refresh Memory 36 (1) Memory Section 37 (2) Timing and Control 39 b. Bit Unpacker and Inverse DPCM 40 c. Inverse Transform Processor 43
Command system output bit verification
NASA Technical Reports Server (NTRS)
Odd, C. W.; Abbate, S. F.
1981-01-01
An automatic test was developed to test the ability of the deep space station (DSS) command subsystem and exciter to generate and radiate, from the exciter, the correct idle bit sequence for a given flight project or to store and radiate received command data elements and files without alteration. This test, called the command system output bit verification test, is an extension of the command system performance test (SPT) and can be selected as an SPT option. The test compares the bit stream radiated from the DSS exciter with reference sequences generated by the SPT software program. The command subsystem and exciter are verified when the bit stream and reference sequences are identical. It is a key element of the acceptance testing conducted on the command processor assembly (CPA) operational program (DMC-0584-OP-G) prior to its transfer from development to operations.
NASA Technical Reports Server (NTRS)
Psiaki, Mark L. (Inventor); Kintner, Jr., Paul M. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor)
2007-01-01
A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.
NASA Technical Reports Server (NTRS)
Psiaki, Mark L. (Inventor); Ledvina, Brent M. (Inventor); Powell, Steven P. (Inventor); Kintner, Jr., Paul M. (Inventor)
2006-01-01
A real-time software receiver that executes on a general purpose processor. The software receiver includes data acquisition and correlator modules that perform, in place of hardware correlation, baseband mixing and PRN code correlation using bit-wise parallelism.
Evaluation of Magnetoresistive RAM for Space Applications
NASA Technical Reports Server (NTRS)
Heidecker, Jason
2014-01-01
Magnetoresistive random-access memory (MRAM) is a non-volatile memory that exploits electronic spin, rather than charge, to store data. Instead of moving charge on and off a floating gate to alter the threshold voltage of a CMOS transistor (creating different bit states), MRAM uses magnetic fields to flip the polarization of a ferromagnetic material thus switching its resistance and bit state. These polarized states are immune to radiation-induced upset, thus making MRAM very attractive for space application. These magnetic memory elements also have infinite data retention and erase/program endurance. Presented here are results of reliability testing of two space-qualified MRAM products from Aeroflex and Honeywell.
The special radiation-hardened processors for new highly informative experiments in space
NASA Astrophysics Data System (ADS)
Serdin, O. V.; Antonov, A. A.; Dubrovsky, A. G.; Novogilov, E. A.; Zuev, A. L.
2017-01-01
The article provides a detailed description of the series of special radiation-hardened microprocessor developed by SRISA for use in space technology. The microprocessors have 32-bit and 64-bit KOMDIV architecture with embedded SpaceWire, RapidIO, Ethernet and MIL-STD-1553B interfaces. These devices are used in space telescope GAMMA-400 data acquisition system, and may also be applied to other experiments in space (such as observatory “Millimetron” etc.).
Cache Energy Optimization Techniques For Modern Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
2013-01-01
Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In thismore » book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems. This book is intended to be a valuable guide for both newcomers and veterans in the field of cache power management. It will help graduate students, CAD tool developers and designers in understanding the need of energy efficiency in modern computing systems. Further, it will be useful for researchers in gaining insights into algorithms and techniques for micro-architectural and system-level energy optimization using dynamic cache reconfiguration. We sincerely believe that the ``food for thought'' presented in this book will inspire the readers to develop even better ideas for designing ``green'' processors of tomorrow.« less
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which Texas Instruments TMS32010 has been selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 has been compared with AR&T Bell Laboratories DSP I and Nippon Electric Co. PD7720 and was found to be most suitable for a single chip implementation of APC. A preliminary design system based on TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based of the TSM32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
NASA Astrophysics Data System (ADS)
Giusi, Giovanni; Liu, Scige J.; Galli, Emanuele; Di Giorgio, Anna M.; Farina, Maria; Vertolli, Nello; Di Lellis, Andrea M.
2016-07-01
In this paper we present the results of a series of performance tests carried out on a prototype board mounting the Cobham Gaisler GR712RC Dual Core LEON3FT processor. The aim was the characterization of the performances of the dual core processor when used for executing a highly demanding lossless compression task, acting on data segments continuously copied from the static memory to the processor RAM. The selection of the compression activity to evaluate the performances was driven by the possibility of a comparison with previously executed tests on the Cobham/Aeroflex Gaisler UT699 LEON3FT SPARC™ V8. The results of the test activity have shown a factor 1.6 of improvement with respect to the previous tests, which can easily be improved by adopting a faster onboard board clock, and provided indications on the best size of the data chunks to be used in the compression activity.
Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
Dandass, Yoginder S; Burgess, Shane C; Lawrence, Mark; Bridges, Susan M
2008-01-01
Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation. PMID:18412963
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top- down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometers CMOS technology, and its hardware is about 200 K gates.
Blaettler, M; Bruegger, A; Forster, I C; Lehareinger, Y
1988-03-01
The design of an analog interface to a digital audio signal processor (DASP)-video cassette recorder (VCR) system is described. The complete system represents a low-cost alternative to both FM instrumentation tape recorders and multi-channel chart recorders. The interface or DASP input-output unit described in this paper enables the recording and playback of up to 12 analog channels with a maximum of 12 bit resolution and a bandwidth of 2 kHz per channel. Internal control and timing in the recording component of the interface is performed using ROMs which can be reprogrammed to suit different analog-to-digital converter hardware. Improvement in the bandwidth specifications is possible by connecting channels in parallel. A parallel 16 bit data output port is provided for direct transfer of the digitized data to a computer.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA
NASA Astrophysics Data System (ADS)
Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei
2013-03-01
With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.
Update on radiation-hardened microcomputers for robotics and teleoperated systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sias, F.R. Jr.; Tulenko, J.S.
1993-12-31
Since many programs sponsored by the Department of Defense are being canceled, it is important to select carefully radiation-hardened microprocessors for projects that will mature (or will require continued support) several years in the future. At the present time there are seven candidate 32-bit processors that should be considered for long-range planning for high-performance radiation-hardened computer systems. For Department of Energy applications it is also important to consider efforts at standardization that require the use of the VxWorks operating system and hardware based on the VMEbus. Of the seven processors, one has been delivered and is operating and other systemsmore » are scheduled to be delivered late in 1993 or early in 1994. At the present time the Honeywell-developed RH32, the Harris RH-3000 and the Harris RHC-3000 are leading contenders for meeting DOE requirements for a radiation-hardened advanced 32-bit microprocessor. These are all either compatible with or are derivatives of the MIPS R3000 Reduced Instruction Set Computer. It is anticipated that as few as two of the seven radiation-hardened processors will be supported by the space program in the long run.« less
Digital system for structural dynamics simulation
NASA Technical Reports Server (NTRS)
Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.
1982-01-01
State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
A 32-bit NMOS microprocessor with a large register file
NASA Astrophysics Data System (ADS)
Sherburne, R. W., Jr.; Katevenis, M. G. H.; Patterson, D. A.; Sequin, C. H.
1984-10-01
Two scaled versions of a 32-bit NMOS reduced instruction set computer CPU, called RISC II, have been implemented on two different processing lines using the simple Mead and Conway layout rules with lambda values of 2 and 1.5 microns (corresponding to drawn gate lengths of 4 and 3 microns), respectively. The design utilizes a small set of simple instructions in conjunction with a large register file in order to provide high performance. This approach has resulted in two surprisingly powerful single-chip processors.
120-MHz BiCMOS superscalar RISC processor
NASA Astrophysics Data System (ADS)
Tanaka, Shigeya; Hotta, Takashi; Murabayashi, Fumio; Yamada, Hiromichi; Yoshida, Shoji; Shimamura, Kotaro; Katsura, Koyo; Bandoh, Tadaaki; Ikeda, Koichi; Matsubara, Kenji
1994-04-01
A superscalar RISC processor contains 2.8 million transistors in a die size of 16.2 mm x 16.5 mm, and utilizes 3.3 V/0.5 micron BiCMOS technology. In order to take advantage of superscalar performance without incurring penalties from a slower clock or a longer pipeline, a tag bit is implemented in the instruction cache to indicate dependency between two instructions. A performance gain of up to 37% is obtained with only a 3.5% area overhead from our superscalar design.
Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor
NASA Technical Reports Server (NTRS)
Moore, J. Strother
1992-01-01
Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.
An FPGA Architecture for Extracting Real-Time Zernike Coefficients from Measured Phase Gradients
NASA Astrophysics Data System (ADS)
Moser, Steven; Lee, Peter; Podoleanu, Adrian
2015-04-01
Zernike modes are commonly used in adaptive optics systems to represent optical wavefronts. However, real-time calculation of Zernike modes is time consuming due to two factors: the large factorial components in the radial polynomials used to define them and the large inverse matrix calculation needed for the linear fit. This paper presents an efficient parallel method for calculating Zernike coefficients from phase gradients produced by a Shack-Hartman sensor and its real-time implementation using an FPGA by pre-calculation and storage of subsections of the large inverse matrix. The architecture exploits symmetries within the Zernike modes to achieve a significant reduction in memory requirements and a speed-up of 2.9 when compared to published results utilising a 2D-FFT method for a grid size of 8×8. Analysis of processor element internal word length requirements show that 24-bit precision in precalculated values of the Zernike mode partial derivatives ensures less than 0.5% error per Zernike coefficient and an overall error of <1%. The design has been synthesized on a Xilinx Spartan-6 XC6SLX45 FPGA. The resource utilisation on this device is <3% of slice registers, <15% of slice LUTs, and approximately 48% of available DSP blocks independent of the Shack-Hartmann grid size. Block RAM usage is <16% for Shack-Hartmann grid sizes up to 32×32.
Multi-channel time-reversal receivers for multi and 1-bit implementations
Candy, James V.; Chambers, David H.; Guidry, Brian L.; Poggio, Andrew J.; Robbins, Christopher L.
2008-12-09
A communication system for transmitting a signal through a channel medium comprising digitizing the signal, time-reversing the digitized signal, and transmitting the signal through the channel medium. In one embodiment a transmitter is adapted to transmit the signal, a multiplicity of receivers are adapted to receive the signal, a digitizer digitizes the signal, and a time-reversal signal processor is adapted to time-reverse the digitized signal. An embodiment of the present invention includes multi bit implementations. Another embodiment of the present invention includes 1-bit implementations. Another embodiment of the present invention includes a multiplicity of receivers used in the step of transmitting the signal through the channel medium.
NASA Astrophysics Data System (ADS)
Selker, Ted
1983-05-01
Lens focusing using a hardware model of a retina (Reticon RL256 light sensitive array) with a low cost processor (8085 with 512 bytes of ROM and 512 bytes of RAM) was built. This system was developed and tested on a variety of visual stimuli to demonstrate that: a)an algorithm which moves a lens to maximize the sum of the difference of light level on adjacent light sensors will converge to best focus in all but contrived situations. This is a simpler algorithm than any previously suggested; b) it is feasible to use unmodified video sensor arrays with in-expensive processors to aid video camera use. In the future, software could be developed to extend the processor's usefulness, possibly to track an actor by panning and zooming to give a earners operator increased ease of framing; c) lateral inhibition is an adequate basis for determining best focus. This supports a simple anatomically motivated model of how our brain focuses our eyes.
Loran-C digital word generator for use with a KIM-1 microprocessor system
NASA Technical Reports Server (NTRS)
Nickum, J. D.
1977-01-01
The problem of translating the time of occurrence of received Loran-C pulses into a time, referenced to a particular period of occurrence is addressed and applied to the design of a digital word generator for a Loran-C sensor processor package. The digital information from this word generator is processed in a KIM-1 microprocessor system which is based on the MOS 6502 CPU. This final system will consist of a complete time difference sensor processor for determining position information using Loran-C charts. The system consists of the KIM-1 microprocessor module, a 4K RAM memory board, a user interface, and the Loran-C word generator.
A Low-Power Instruction Issue Queue for Microprocessors
NASA Astrophysics Data System (ADS)
Watanabe, Shingo; Chiyonobu, Akihiro; Sato, Toshinori
Instruction issue queue is a key component which extracts instruction level parallelism (ILP) in modern out-of-order microprocessors. In order to exploit ILP for improving processor performance, instruction queue size should be increased. However, it is difficult to increase the size, since instruction queue is implemented by a content addressable memory (CAM) whose power and delay are much large. This paper introduces a low power and scalable instruction queue that replaces the CAM with a RAM. In this queue, instructions are explicitly woken up. Evaluation results show that the proposed instruction queue decreases processor performance by only 1.9% on average. Furthermore, the total energy consumption is reduced by 54% on average.
Man-Machine Impact of Technology on Coast Guard Missions and Systems
1979-12-01
t Cost of Rar~dom, Acce~ss eoy~mAlr 97 f-Al 1000 MOS RAM-(409 BITS/CHIP) . 100 _ I• z LLJ 10 (I) UI 1.04 I.-I- ’ YEAR ii A .. I. FiueA-.oecs pedo ...of these advances will iTOSt likely be accomplished through focal plane arrays of detectors, charge coupled device readout techniques for the video
Parallel processor for real-time structural control
NASA Astrophysics Data System (ADS)
Tise, Bert L.
1993-07-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Learning to Decode Nonverbal Cues in Cross-Cultural Interactions
2009-06-01
iPhones support Mac OS X v10.4.10 or later operating system, as well as Windows Vista and XP, and iTunes 7.5 or later. Apple has designed the iPhones to be...Processor; 1G RAM, 1G HD, Direct X9/ATI Radeon 9800 card with dedicated memory; Noise-canceling headset w/ microphone. Apple video iPod (can be
Large-Constraint-Length, Fast Viterbi Decoder
NASA Technical Reports Server (NTRS)
Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.
1990-01-01
Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.
Implementation of an FIR Band Pass Filter Using a Bit-Slice Processor.
1987-06-01
R14 S IMPLEMENTATION OF AN FIR BND PASS FILTER USING A 1 /2 S" IT-SLICE PROCESSOR(U) NAVAL POSTGRADUATE SCOOL N UTEREY CA D W PURDY JUN 97...WILRSIFIED F/I 12/6mmhhmmhhmhhhhl EIIIIIIEIIIII IIIIIIEEIIIEI IIIIIIIIIIIIIl IIIIIIIIIIIIIu EIIIIIIIIIIIII - U ~1I.25 1 .41.6 -qwr- -qw qw wV vw- .W sw...Ae 10 .w w ’ 1 - w* lo % % q* NAVAL POSTGRADUATE SCHOOL 0 Monterey, California 0 oi a ’: L t ",, I-’ ; OCT 0 ." THESIS IMPLEMENTATION OF AN FIR
SIGPROC: Pulsar Signal Processing Programs
NASA Astrophysics Data System (ADS)
Lorimer, D. R.
2011-07-01
SIGPROC is a package designed to standardize the initial analysis of the many types of fast-sampled pulsar data. Currently recognized machines are the Wide Band Arecibo Pulsar Processor (WAPP), the Penn State Pulsar Machine (PSPM), the Arecibo Observatory Fourier Transform Machine (AOFTM), the Berkeley Pulsar Processors (BPP), the Parkes/Jodrell 1-bit filterbanks (SCAMP) and the filterbank at the Ooty radio telescope (OOTY). The SIGPROC tools should help users look at their data quickly, without the need to write (yet) another routine to read data or worry about big/little endian compatibility (byte swapping is handled automatically).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.
Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whethermore » global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.« less
Hardware accuracy counters for application precision and quality feedback
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Paula Rosa Piga, Leonardo; Majumdar, Abhinandan; Paul, Indrani
Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be outputmore » to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.« less
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Studies Of Single-Event-Upset Models
NASA Technical Reports Server (NTRS)
Zoutendyk, J. A.; Smith, L. S.; Soli, G. A.
1988-01-01
Report presents latest in series of investigations of "soft" bit errors known as single-event upsets (SEU). In this investigation, SEU response of low-power, Schottky-diode-clamped, transistor/transistor-logic (TTL) static random-access memory (RAM) observed during irradiation by Br and O ions in ranges of 100 to 240 and 20 to 100 MeV, respectively. Experimental data complete verification of computer model used to simulate SEU in this circuit.
NASA Astrophysics Data System (ADS)
Sanford, James L.; Schlig, Eugene S.; Prache, Olivier; Dove, Derek B.; Ali, Tariq A.; Howard, Webster E.
2002-02-01
The IBM Research Division and eMagin Corp. jointly have developed a low-power VGA direct view active matrix OLED display, fabricated on a crystalline silicon CMOS chip. The display is incorporated in IBM prototype wristwatch computers running the Linus operating system. IBM designed the silicon chip and eMagin developed the organic stack and performed the back-end-of line processing and packaging. Each pixel is driven by a constant current source controlled by a CMOS RAM cell, and the display receives its data from the processor memory bus. This paper describes the OLED technology and packaging, and outlines the design of the pixel and display electronics and the processor interface. Experimental results are presented.
Implications of scaling on static RAM bit cell stability and reliability
NASA Astrophysics Data System (ADS)
Coones, Mary Ann; Herr, Norm; Bormann, Al; Erington, Kent; Soorholtz, Vince; Sweeney, John; Phillips, Michael
1993-01-01
In order to lower manufacturing costs and increase performance, static random access memory (SRAM) bit cells are scaled progressively toward submicron geometries. The reliability of an SRAM is highly dependent on the bit cell stability. Smaller memory cells with less capacitance and restoring current make the array more susceptible to failures from defectivity, alpha hits, and other instabilities and leakage mechanisms. Improving long term reliability while migrating to higher density devices makes the task of building in and improving reliability increasingly difficult. Reliability requirements for high density SRAMs are very demanding with failure rates of less than 100 failures per billion device hours (100 FITs) being a common criteria. Design techniques for increasing bit cell stability and manufacturability must be implemented in order to build in this level of reliability. Several types of analyses are performed to benchmark the performance of the SRAM device. Examples of these analysis techniques which are presented here include DC parametric measurements of test structures, functional bit mapping of the circuit used to characterize the entire distribution of bits, electrical microprobing of weak and/or failing bits, and system and accelerated soft error rate measurements. These tests allow process and design improvements to be evaluated prior to implementation on the final product. These results are used to provide comprehensive bit cell characterization which can then be compared to device models and adjusted accordingly to provide optimized cell stability versus cell size for a particular technology. The result is designed in reliability which can be accomplished during the early stages of product development.
Parallel processor for real-time structural control
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tise, B.L.
1992-01-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
The Transportable Measurements Facility (TMF) System Description.
1980-05-23
sites using different antennas, antenna/site characterization, ATCRBS-mode and DABS-mode processor evaluation, and DABS-based ATC and ATARS system...conditions met. 34 TABLE 3 TMF DATA RECORDED DURING EXPERIMENTS By Word Type 1) By Parameter Word No. of Bits Experiment Number 8 Physical Location Number of
Design and implementation of an optical Gaussian noise generator
NASA Astrophysics Data System (ADS)
Za~O, Leonardo; Loss, Gustavo; Coelho, Rosângela
2009-08-01
A design of a fast and accurate optical Gaussian noise generator is proposed and demonstrated. The noise sample generation is based on the Box-Muller algorithm. The functions implementation was performed on a high-speed Altera Stratix EP1S25 field-programmable gate array (FPGA) development kit. It enabled the generation of 150 million 16-bit noise samples per second. The Gaussian noise generator required only 7.4% of the FPGA logic elements, 1.2% of the RAM memory, 0.04% of the ROM memory, and a laser source. The optical pulses were generated by a laser source externally modulated by the data bit samples using the frequency-shift keying technique. The accuracy of the noise samples was evaluated for different sequences size and confidence intervals. The noise sample pattern was validated by the Bhattacharyya distance (Bd) and the autocorrelation function. The results showed that the proposed design of the optical Gaussian noise generator is very promising to evaluate the performance of optical communications channels with very low bit-error-rate values.
Servicing a globally broadcast interrupt signal in a multi-threaded computer
Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.; Satterfield, David L.
2015-12-29
Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whether global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.
Design, processing, and testing of LSI arrays for space station
NASA Technical Reports Server (NTRS)
Schneider, W. C.
1974-01-01
At wafer probe, units of the TA6567 circuit, a beam leaded COS/MOS/SOS 256-bit RAM, were demonstrated to be functionally perfect. An aluminum gate current-sense version and a silicon-gate voltage-sense version of this memory were developed. Initial base line data for the beam lead SOS process using the TA5388 circuit show the stability of the dc device characteristics through the beam lead processing.
A High-Throughput Processor for Flight Control Research Using Small UAVs
NASA Technical Reports Server (NTRS)
Klenke, Robert H.; Sleeman, W. C., IV; Motter, Mark A.
2006-01-01
There are numerous autopilot systems that are commercially available for small (<100 lbs) UAVs. However, they all share several key disadvantages for conducting aerodynamic research, chief amongst which is the fact that most utilize older, slower, 8- or 16-bit microcontroller technologies. This paper describes the development and testing of a flight control system (FCS) for small UAV s based on a modern, high throughput, embedded processor. In addition, this FCS platform contains user-configurable hardware resources in the form of a Field Programmable Gate Array (FPGA) that can be used to implement custom, application-specific hardware. This hardware can be used to off-load routine tasks such as sensor data collection, from the FCS processor thereby further increasing the computational throughput of the system.
ACE: Automatic Centroid Extractor for real time target tracking
NASA Technical Reports Server (NTRS)
Cameron, K.; Whitaker, S.; Canaris, J.
1990-01-01
A high performance video image processor has been implemented which is capable of grouping contiguous pixels from a raster scan image into groups and then calculating centroid information for each object in a frame. The algorithm employed to group pixels is very efficient and is guaranteed to work properly for all convex shapes as well as most concave shapes. Processing speeds are adequate for real time processing of video images having a pixel rate of up to 20 million pixels per second. Pixels may be up to 8 bits wide. The processor is designed to interface directly to a transputer serial link communications channel with no additional hardware. The full custom VLSI processor was implemented in a 1.6 mu m CMOS process and measures 7200 mu m on a side.
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Negative base encoding in optical linear algebra processors
NASA Technical Reports Server (NTRS)
Perlee, C.; Casasent, D.
1986-01-01
In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
Optical Neural Classification Of Binary Patterns
NASA Astrophysics Data System (ADS)
Gustafson, Steven C.; Little, Gordon R.
1988-05-01
Binary pattern classification that may be implemented using optical hardware and neural network algorithms is considered. Pattern classification problems that have no concise description (as in classifying handwritten characters) or no concise computation (as in NP-complete problems) are expected to be particularly amenable to this approach. For example, optical processors that efficiently classify binary patterns in accordance with their Boolean function complexity might be designed. As a candidate for such a design, an optical neural network model is discussed that is designed for binary pattern classification and that consists of an optical resonator with a dynamic multiplex-recorded reflection hologram and a phase conjugate mirror with thresholding and gain. In this model, learning or training examples of binary patterns may be recorded on the hologram such that one bit in each pattern marks the pattern class. Any input pattern, including one with an unknown class or marker bit, will be modified by a large number of parallel interactions with the reflection hologram and nonlinear mirror. After perhaps several seconds and 100 billion interactions, a steady-state pattern may develop with a marker bit that represents a minimum-Boolean-complexity classification of the input pattern. Computer simulations are presented that illustrate progress in understanding the behavior of this model and in developing a processor design that could have commanding and enduring performance advantages compared to current pattern classification techniques.
NSC 800, 8-bit CMOS microprocessor
NASA Technical Reports Server (NTRS)
Suszko, S. F.
1984-01-01
The NSC 800 is an 8-bit CMOS microprocessor manufactured by National Semiconductor Corp., Santa Clara, California. The 8-bit microprocessor chip with 40-pad pin-terminals has eight address buffers (A8-A15), eight data address -- I/O buffers (AD(sub 0)-AD(sub 7)), six interrupt controls and sixteen timing controls with a chip clock generator and an 8-bit dynamic RAM refresh circuit. The 22 internal registers have the capability of addressing 64K bytes of memory and 256 I/O devices. The chip is fabricated on N-type (100) silicon using self-aligned polysilicon gates and local oxidation process technology. The chip interconnect consists of four levels: Aluminum, Polysi 2, Polysi 1, and P(+) and N(+) diffusions. The four levels, except for contact interface, are isolated by interlevel oxide. The chip is packaged in a 40-pin dual-in-line (DIP), side brazed, hermetically sealed, ceramic package with a metal lid. The operating voltage for the device is 5 V. It is available in three operating temperature ranges: 0 to +70 C, -40 to +85 C, and -55 to +125 C. Two devices were submitted for product evaluation by F. Stott, MTS, JPL Microprocessor Specialist. The devices were pencil-marked and photographed for identification.
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer
NASA Technical Reports Server (NTRS)
Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw
2000-01-01
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Single-event effects in avionics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Normand, E.
1996-04-01
The occurrence of single-event upset (SEU) in aircraft electronics has evolved from a series of interesting anecdotal incidents to accepted fact. A study completed in 1992 demonstrated that SEU`s are real, that the measured in-flight rates correlate with the atmospheric neutron flux, and that the rates can be calculated using laboratory SEU data. Once avionics DEU was shown to be an actual effect, it had to be dealt with in avionics designs. The major concern is in random access memories (RAM`s), both static (SRAM`s) and dynamic (DRAM`s), because these microelectronic devices contain the largest number of bits, but other parts,more » such as microprocessors, are also potentially susceptible to upset. In addition, other single-event effects (SEE`s), specifically latch-up and burnout, can also be induced by atmospheric neutrons.« less
Comparisons of single event vulnerability of GaAs SRAMS
NASA Astrophysics Data System (ADS)
Weatherford, T. R.; Hauser, J. R.; Diehl, S. E.
1986-12-01
A GaAs MESFET/JFET model incorporated into SPICE has been used to accurately describe C-EJFET, E/D MESFET and D MESFET/resistor GaAs memory technologies. These cells have been evaluated for critical charges due to gate-to-drain and drain-to-source charge collection. Low gate-to-drain critical charges limit conventional GaAs SRAM soft error rates to approximately 1E-6 errors/bit-day. SEU hardening approaches including decoupling resistors, diodes, and FETs have been investigated. Results predict GaAs RAM cell critical charges can be increased to over 0.1 pC. Soft error rates in such hardened memories may approach 1E-7 errors/bit-day without significantly reducing memory speed. Tradeoffs between hardening level, performance and fabrication complexity are discussed.
Wójcik, J.; Kujawska, T.; Nowicki, A.; Lewin, P.A.
2008-01-01
The primary goal of this work was to verify experimentally the applicability of the recently introduced Time-Averaged Wave Envelope (TAWE) method [1] as a tool for fast prediction of four dimensional (4D) pulsed nonlinear pressure fields from arbitrarily shaped acoustic sources in attenuating media. The experiments were performed in water at the fundamental frequency of 2.8 MHz for spherically focused (focal length F = 80 mm) square (20 × 20 mm) and rectangular (10 × 25 mm) sources similar to those used in the design of 1D linear arrays operating with ultrasonic imaging systems. The experimental results obtained with 10-cycle tone bursts at three different excitation levels corresponding to linear, moderately nonlinear and highly nonlinear propagation conditions (0.045, 0.225 and 0.45 MPa on-source pressure amplitude, respectively) were compared with those yielded using the TAWE approach [1]. The comparison of the experimental results and numerical simulations has shown that the TAWE approach is well suited to predict (to within ± 1 dB) both the spatial-temporal and spatial-spectral pressure variations in the pulsed nonlinear acoustic beams. The obtained results indicated that implementation of the TAWE approach enabled shortening of computation time in comparison with the time needed for prediction of the full 4D pulsed nonlinear acoustic fields using a conventional (Fourier-series) approach [2]. The reduction in computation time depends on several parameters, including the source geometry, dimensions, fundamental resonance frequency, excitation level as well as the strength of the medium nonlinearity. For the non-axisymmetric focused transducers mentioned above and excited by a tone burst corresponding to moderately nonlinear and highly nonlinear conditions the execution time of computations was 3 and 12 hours, respectively, when using a 1.5 GHz clock frequency, 32-bit processor PC laptop with 2 GB RAM memory, only. Such prediction of the full 4D pulsed field is not possible when using conventional, Fourier-series scheme as it would require increasing the RAM memory by at least 2 orders of magnitude. PMID:18474387
Precision studies of the NNLO DGLAP evolution at the LHC with Candia
NASA Astrophysics Data System (ADS)
Cafarella, Alessandro; Corianò, Claudio; Guzzi, Marco
2008-11-01
We summarize the theoretical approach to the solution of the NNLO DGLAP equations using methods based on the logarithmic expansions in x-space and their implementation into the C program CANDIA 1.0. We present the various options implemented in the program and discuss the different solutions. The user can choose the order of the evolution, the type of the solution, which can be either exact or truncated, and the evolution either with a fixed or a varying flavor number, implemented in the varying-flavor-number scheme (VFNS). The renormalization and factorization scale dependencies are treated separately. In the non-singlet sector the program implements an exact NNLO solution. Program summaryProgram title: CANDIA Catalogue identifier: AEBK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 101 376 No. of bytes in distributed program, including test data, etc.: 5 865 234 Distribution format: tar.gz Programming language: C and Fortran Computer: All Operating system: Linux RAM: In the given examples, it ranges from 4 to 490 MB Classification: 11.1, 11.5 Nature of problem: The program provided here solves the DGLAP evolution equations for the parton distribution functions up to NNLO. Solution method: The algorithm implemented is based on the theory of the logarithmic expansions in Bjorken x-space. Additional comments: To be sure of getting the latest version of the program, the authors suggest downloading the code from their official CANDIA website ( http://www.le.infn.it/candia). Running time: In the given examples, it ranges from 1 to 40 minutes. The jobs have been executed on an Intel Core 2 Duo T7250 CPU at 2 GHz with a 64 bit Linux kernel. The test run script included in the package contains 5 sample runs and may take a number of hours to process, depending on the speed of the processor used and the size of the available RAM. http://www.le.infn.it/candia.
Onboard Experiment Data Support Facility
NASA Technical Reports Server (NTRS)
1976-01-01
An onboard array structure has been devised for end to end processing of data from multiple spaceborne sensors. The array constitutes sets of programmable pipeline processors whose elements perform each assigned function in 0.25 microseconds. This space shuttle computer system can handle data rates from a few bits to over 100 megabits per second.
Interface Circuits for Self-Checking Microprocessors
NASA Technical Reports Server (NTRS)
Rennels, D. A.; Chandramouli, R.
1986-01-01
Fault-tolerant-microcomputer concept based on enhancing "simple" computer with redundancy and self-checking logic circuits detect hardware faults. Interface and checking logic and redundant processors confer on 16-bit microcomputer ability to check itself for hardware faults. Checking circuitry also checks itself. Concept of self-checking complementary pairs (SCCP's) employed throughout ICL unit.
ATCA digital controller hardware for vertical stabilization of plasmas in tokamaks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Batista, A. J. N.; Sousa, J.; Varandas, C. A. F.
2006-10-15
The efficient vertical stabilization (VS) of plasmas in tokamaks requires a fast reaction of the VS controller, for example, after detection of edge localized modes (ELM). For controlling the effects of very large ELMs a new digital control hardware, based on the Advanced Telecommunications Computing Architecture trade mark sign (ATCA), is being developed aiming to reduce the VS digital control loop cycle (down to an optimal value of 10 {mu}s) and improve the algorithm performance. The system has 1 ATCA trade mark sign processor module and up to 12 ATCA trade mark sign control modules, each one with 32 analogmore » input channels (12 bit resolution), 4 analog output channels (12 bit resolution), and 8 digital input/output channels. The Aurora trade mark sign and PCI Express trade mark sign communication protocols will be used for data transport, between modules, with expected latencies below 2 {mu}s. Control algorithms are implemented on a ix86 based processor with 6 Gflops and on field programmable gate arrays with 80 GMACS, interconnected by serial gigabit links in a full mesh topology.« less
The Effect of NUMA Tunings on CPU Performance
NASA Astrophysics Data System (ADS)
Hollowell, Christopher; Caramarcu, Costin; Strecker-Kellogg, William; Wong, Antonio; Zaytsev, Alexandr
2015-12-01
Non-Uniform Memory Access (NUMA) is a memory architecture for symmetric multiprocessing (SMP) systems where each processor is directly connected to separate memory. Indirect access to other CPU's (remote) RAM is still possible, but such requests are slower as they must also pass through that memory's controlling CPU. In concert with a NUMA-aware operating system, the NUMA hardware architecture can help eliminate the memory performance reductions generally seen in SMP systems when multiple processors simultaneously attempt to access memory. The x86 CPU architecture has supported NUMA for a number of years. Modern operating systems such as Linux support NUMA-aware scheduling, where the OS attempts to schedule a process to the CPU directly attached to the majority of its RAM. In Linux, it is possible to further manually tune the NUMA subsystem using the numactl utility. With the release of Red Hat Enterprise Linux (RHEL) 6.3, the numad daemon became available in this distribution. This daemon monitors a system's NUMA topology and utilization, and automatically makes adjustments to optimize locality. As the number of cores in x86 servers continues to grow, efficient NUMA mappings of processes to CPUs/memory will become increasingly important. This paper gives a brief overview of NUMA, and discusses the effects of manual tunings and numad on the performance of the HEPSPEC06 benchmark, and ATLAS software.
Embedded System Implementation on FPGA System With μCLinux OS
NASA Astrophysics Data System (ADS)
Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna
2011-02-01
Embedded systems are taking on more complicated tasks as the processors involved become more powerful. The embedded systems have been widely used in many areas such as in industries, automotives, medical imaging, communications, speech recognition and computer vision. The complexity requirements in hardware and software nowadays need a flexibility system for further enhancement in any design without adding new hardware. Therefore, any changes in the design system will affect the processor that need to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using Field Programmable Gate Array (FPGA). A softcore processor, NIOS II 32-bit RISC, which is the microprocessor core was utilized in FPGA system together with the embedded operating system(OS), μClinux. In this paper, an example of web server is explained and demonstrated
Spline-based high-accuracy piecewise-polynomial phase-to-sinusoid amplitude converters.
Petrinović, Davor; Brezović, Marko
2011-04-01
We propose a method for direct digital frequency synthesis (DDS) using a cubic spline piecewise-polynomial model for a phase-to-sinusoid amplitude converter (PSAC). This method offers maximum smoothness of the output signal. Closed-form expressions for the cubic polynomial coefficients are derived in the spectral domain and the performance analysis of the model is given in the time and frequency domains. We derive the closed-form performance bounds of such DDS using conventional metrics: rms and maximum absolute errors (MAE) and maximum spurious free dynamic range (SFDR) measured in the discrete time domain. The main advantages of the proposed PSAC are its simplicity, analytical tractability, and inherent numerical stability for high table resolutions. Detailed guidelines for a fixed-point implementation are given, based on the algebraic analysis of all quantization effects. The results are verified on 81 PSAC configurations with the output resolutions from 5 to 41 bits by using a bit-exact simulation. The VHDL implementation of a high-accuracy DDS based on the proposed PSAC with 28-bit input phase word and 32-bit output value achieves SFDR of its digital output signal between 180 and 207 dB, with a signal-to-noise ratio of 192 dB. Its implementation requires only one 18 kB block RAM and three 18-bit embedded multipliers in a typical field-programmable gate array (FPGA) device. © 2011 IEEE
Design and Test of a 65nm CMOS Front-End with Zero Dead Time for Next Generation Pixel Detectors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaioni, L.; Braga, D.; Christian, D.
This work is concerned with the experimental characterization of a synchronous analog processor with zero dead time developed in a 65 nm CMOS technology, conceived for pixel detectors at the HL-LHC experiment upgrades. It includes a low noise, fast charge sensitive amplifier with detector leakage compensation circuit, and a compact, single ended comparator able to correctly process hits belonging to two consecutive bunch crossing periods. A 2-bit Flash ADC is exploited for digital conversion immediately after the preamplifier. A description of the circuits integrated in the front-end processor and the initial characterization results are provided
Eight-Channel Digital Signal Processor and Universal Trigger Module
NASA Astrophysics Data System (ADS)
Skulski, Wojtek; Wolfs, Frank
2003-04-01
A 10-bit, 8-channel, 40 megasamples per second digital signal processor and waveform digitizer DDC-8 (nicknamed Universal Trigger Module) is presented. The digitizer features 8 analog inputs, 1 analog output for a reconstructed analog waveform, 16 NIM logic inputs, 8 NIM logic outputs, and a pool of 16 TTL logic lines which can be individually configured as either inputs or outputs. The first application of this device is to enhance the present trigger electronics for PHOBOS at RHIC. The status of the development and the first results are presented. Possible applications of the new device are discussed. Supported by the NSF grant PHY-0072204.
Hybrid quantum processors: molecular ensembles as quantum memory for solid state circuits.
Rabl, P; DeMille, D; Doyle, J M; Lukin, M D; Schoelkopf, R J; Zoller, P
2006-07-21
We investigate a hybrid quantum circuit where ensembles of cold polar molecules serve as long-lived quantum memories and optical interfaces for solid state quantum processors. The quantum memory realized by collective spin states (ensemble qubit) is coupled to a high-Q stripline cavity via microwave Raman processes. We show that, for convenient trap-surface distances of a few microm, strong coupling between the cavity and ensemble qubit can be achieved. We discuss basic quantum information protocols, including a swap from the cavity photon bus to the molecular quantum memory, and a deterministic two qubit gate. Finally, we investigate coherence properties of molecular ensemble quantum bits.
Pulse width modulation inverter with battery charger
Slicker, James M.
1985-01-01
An inverter is connected between a source of DC power and a three-phase AC induction motor, and a microprocessor-based circuit controls the inverter using pulse width modulation techniques. In the disclosed method of pulse width modulation, both edges of each pulse of a carrier pulse train are equally modulated by a time proportional to sin .theta., where .theta. is the angular displacement of the pulse center at the motor stator frequency from a fixed reference point on the carrier waveform. The carrier waveform frequency is a multiple of the motor stator frequency. The modulated pulse train is then applied to each of the motor phase inputs with respective phase shifts of 120.degree. at the stator frequency. Switching control commands for electronic switches in the inverter are stored in a random access memory (RAM) and the locations of the RAM are successively read out in a cyclic manner, each bit of a given RAM location controlling a respective phase input of the motor. The DC power source preferably comprises rechargeable batteries and all but one of the electronic switches in the inverter can be disabled, the remaining electronic switch being part of a "flyback" DC-DC converter circuit for recharging the battery.
Pulse width modulation inverter with battery charger
NASA Technical Reports Server (NTRS)
Slicker, James M. (Inventor)
1985-01-01
An inverter is connected between a source of DC power and a three-phase AC induction motor, and a microprocessor-based circuit controls the inverter using pulse width modulation techniques. In the disclosed method of pulse width modulation, both edges of each pulse of a carrier pulse train are equally modulated by a time proportional to sin .theta., where .theta. is the angular displacement of the pulse center at the motor stator frequency from a fixed reference point on the carrier waveform. The carrier waveform frequency is a multiple of the motor stator frequency. The modulated pulse train is then applied to each of the motor phase inputs with respective phase shifts of 120.degree. at the stator frequency. Switching control commands for electronic switches in the inverter are stored in a random access memory (RAM) and the locations of the RAM are successively read out in a cyclic manner, each bit of a given RAM location controlling a respective phase input of the motor. The DC power source preferably comprises rechargeable batteries and all but one of the electronic switches in the inverter can be disabled, the remaining electronic switch being part of a flyback DC-DC converter circuit for recharging the battery.
Underwater Inspection of Navigation Structures with an Acoustic Camera
2013-08-01
the camera with a slow angular speed while recording the images. 5. After the scanning has been performed, review recorded data to determine the...Core x86) or newer 2GB RAM 120GB disc space Operating system requirements Windows XP, Vista, Windows 7, 32/64 bit Java requirements Sun... Java JDK, Version 1.6, Update 16 or newer, for installation Limitations and tips for proper scanning Best results are achieved when scanning in
Combined methods of tolerance increasing for embedded SRAM
NASA Astrophysics Data System (ADS)
Shchigorev, L. A.; Shagurin, I. I.
2016-10-01
The abilities of combined use of different methods of fault tolerance increasing for SRAM such as error detection and correction codes, parity bits, and redundant elements are considered. Area penalties due to using combinations of these methods are investigated. Estimation is made for different configurations of 4K x 128 RAM memory block for 28 nm manufacturing process. Evaluation of the effectiveness of the proposed combinations is also reported. The results of these investigations can be useful for designing fault-tolerant “system on chips”.
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
Suzuki, Hajime; Kasahara, Masahiro
2018-02-19
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
Command/response protocols and concurrent software
NASA Technical Reports Server (NTRS)
Bynum, W. L.
1987-01-01
A version of the program to control the parallel jaw gripper is documented. The parallel jaw end-effector hardware and the Intel 8031 processor that is used to control the end-effector are briefly described. A general overview of the controller program is given and a complete description of the program's structure and design are contained. There are three appendices: a memory map of the on-chip RAM, a cross-reference listing of the self-scheduling routines, and a summary of the top-level and monitor commands.
Spectrophotometry (by Barbara Sawrey and Gabriele Wienhausen)
NASA Astrophysics Data System (ADS)
Pringle, David L.
1998-08-01
Science Media: San Diego, 1997. 1-10 copies, 99 each; 11-20 copies, 69 each; 21+ copies, $49 each. (Note: CD operates with both Mac and PC.) Spectrophotometry is an interactive CD-ROM which introduces the basics of UV-visible spectrophotometry with some mention of infrared and other forms of spectrophotometry. A Macintosh System 7.5 or higher, CPU 68040 or Power PC processor, 6 megabytes of free RAM, 2.6 megabytes of free disk space, and 4X CD-ROM or faster are required.
Stochastic receding horizon control: application to an octopedal robot
NASA Astrophysics Data System (ADS)
Shah, Shridhar K.; Tanner, Herbert G.
2013-06-01
Miniature autonomous systems are being developed under ARL's Micro Autonomous Systems and Technology (MAST). These systems can only be fitted with a small-size processor, and their motion behavior is inherently uncertain due to manufacturing and platform-ground interactions. One way to capture this uncertainty is through a stochastic model. This paper deals with stochastic motion control design and implementation for MAST- specific eight-legged miniature crawling robots, which have been kinematically modeled as systems exhibiting the behavior of a Dubin's car with stochastic noise. The control design takes the form of stochastic receding horizon control, and is implemented on a Gumstix Overo Fire COM with 720 MHz processor and 512 MB RAM, weighing 5.5 g. The experimental results show the effectiveness of this control law for miniature autonomous systems perturbed by stochastic noise.
MacWilliams Identity for M-Spotty Weight Enumerator
NASA Astrophysics Data System (ADS)
Suzuki, Kazuyoshi; Fujiwara, Eiji
M-spotty byte error control codes are very effective for correcting/detecting errors in semiconductor memory systems that employ recent high-density RAM chips with wide I/O data (e.g., 8, 16, or 32bits). In this case, the width of the I/O data is one byte. A spotty byte error is defined as random t-bit errors within a byte of length b bits, where 1 le t ≤ b. Then, an error is called an m-spotty byte error if at least one spotty byte error is present in a byte. M-spotty byte error control codes are characterized by the m-spotty distance, which includes the Hamming distance as a special case for t =1 or t = b. The MacWilliams identity provides the relationship between the weight distribution of a code and that of its dual code. The present paper presents the MacWilliams identity for the m-spotty weight enumerator of m-spotty byte error control codes. In addition, the present paper clarifies that the indicated identity includes the MacWilliams identity for the Hamming weight enumerator as a special case.
Critical issues regarding SEU in avionics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Normand, E.; McNulty, P.J.
1993-01-01
The energetic neutrons in the atmosphere cause microelectronics in avionic system to malfunction through a mechanism called single-event upsets (SEUs), and single-event latchup is a potential threat. Data from military and experimental flights as well as laboratory testing indicate that typical non-radiation-hardened 64K and 256K static random access memories (SRAMs) can experience a significant SEU rate at aircraft altitudes. Microelectronics in avionics systems have been demonstrated to be susceptible to SEU. Of all device types, RAMs are the most sensitive because they have the largest number of bits on a chip (e.g., an SRAM may have from 64K to 1Mmore » bits, a microprocessor 3K to 10K bits, and a logic device like an analog-to-digital converter, 12 bits). Avionics designers will need to take this susceptibility into account in current and future designs. A number of techniques are available for dealing with SEU: EDAC, redundancy, use of SEU-hard parts, reset and/or watchdog timer capability, etc. Specifications should be developed to guide avionics vendors in the analysis, prevention, and verification of neutron-induced SEU. Areas for additional research include better definition of the atmospheric neutrons and protons, development of better calculational models (e.g., those used for protons[sup 11]), and better characterization of neutron-induced latchup.« less
NASA Astrophysics Data System (ADS)
Morrison, R. E.; Robinson, S. H.
A continuous wave Doppler radar system has been designed which is portable, easily deployed, and remotely controlled. The heart of this system is a DSP/control board using Analog Devices ADSP-21020 40-bit floating point digital signal processor (DSP) microprocessor. Two 18-bit audio A/D converters provide digital input to the DSP/controller board for near real time target detection. Program memory for the DSP is dual ported with an Intel 87C51 microcontroller allowing DSP code to be up-loaded or down-loaded from a central controlling computer. The 87C51 provides overall system control for the remote radar and includes a time-of-day/day-of-year real time clock, system identification (ID) switches, and input/output (I/O) expansion by an Intel 82C55 I/O expander.
A microprocessor based on a two-dimensional semiconductor.
Wachter, Stefan; Polyushkin, Dmitry K; Bethge, Ole; Mueller, Thomas
2017-04-11
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor-molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.
A microprocessor based on a two-dimensional semiconductor
NASA Astrophysics Data System (ADS)
Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas
2017-04-01
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III-V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor--molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material.
Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm
Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; ...
2018-02-12
Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. Here, we use a superconducting-qubit-based processor to apply the QSE approach to the H 2 molecule, extracting both groundmore » and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.« less
Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm
NASA Astrophysics Data System (ADS)
Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; Blok, M. S.; Kimchi-Schwartz, M. E.; McClean, J. R.; Carter, J.; de Jong, W. A.; Siddiqi, I.
2018-02-01
Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. We use a superconducting-qubit-based processor to apply the QSE approach to the H2 molecule, extracting both ground and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.
Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Colless, J. I.; Ramasesh, V. V.; Dahlen, D.
Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. Here, we use a superconducting-qubit-based processor to apply the QSE approach to the H 2 molecule, extracting both groundmore » and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.« less
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
FPGA wavelet processor design using language for instruction-set architectures (LISA)
NASA Astrophysics Data System (ADS)
Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios
2007-04-01
The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.
NASA Astrophysics Data System (ADS)
Sourbier, F.; Operto, S.; Virieux, J.
2006-12-01
We present a distributed-memory parallel algorithm for 2D visco-acoustic full-waveform inversion of wide-angle seismic data. Our code is written in fortran90 and use MPI for parallelism. The algorithm was applied to real wide-angle data set recorded by 100 OBSs with a 1-km spacing in the eastern-Nankai trough (Japan) to image the deep structure of the subduction zone. Full-waveform inversion is applied sequentially to discrete frequencies by proceeding from the low to the high frequencies. The inverse problem is solved with a classic gradient method. Full-waveform modeling is performed with a frequency-domain finite-difference method. In the frequency-domain, solving the wave equation requires resolution of a large unsymmetric system of linear equations. We use the massively parallel direct solver MUMPS (http://www.enseeiht.fr/irit/apo/MUMPS) for distributed-memory computer to solve this system. The MUMPS solver is based on a multifrontal method for the parallel factorization. The MUMPS algorithm is subdivided in 3 main steps: a symbolic analysis step that performs re-ordering of the matrix coefficients to minimize the fill-in of the matrix during the subsequent factorization and an estimation of the assembly tree of the matrix. Second, the factorization is performed with dynamic scheduling to accomodate numerical pivoting and provides the LU factors distributed over all the processors. Third, the resolution is performed for multiple sources. To compute the gradient of the cost function, 2 simulations per shot are required (one to compute the forward wavefield and one to back-propagate residuals). The multi-source resolutions can be performed in parallel with MUMPS. In the end, each processor stores in core a sub-domain of all the solutions. These distributed solutions can be exploited to compute in parallel the gradient of the cost function. Since the gradient of the cost function is a weighted stack of the shot and residual solutions of MUMPS, each processor computes the corresponding sub-domain of the gradient. In the end, the gradient is centralized on the master processor using a collective communation. The gradient is scaled by the diagonal elements of the Hessian matrix. This scaling is computed only once per frequency before the first iteration of the inversion. Estimation of the diagonal terms of the Hessian requires performing one simulation per non redondant shot and receiver position. The same strategy that the one used for the gradient is used to compute the diagonal Hessian in parallel. This algorithm was applied to a dense wide-angle data set recorded by 100 OBSs in the eastern Nankai trough, offshore Japan. Thirteen frequencies ranging from 3 and 15 Hz were inverted. Tweny iterations per frequency were computed leading to 260 tomographic velocity models of increasing resolution. The velocity model dimensions are 105 km x 25 km corresponding to a finite-difference grid of 4201 x 1001 grid with a 25-m grid interval. The number of shot was 1005 and the number of inverted OBS gathers was 93. The inversion requires 20 days on 6 32-bits bi-processor nodes with 4 Gbytes of RAM memory per node when only the LU factorization is performed in parallel. Preliminary estimations of the time required to perform the inversion with the fully-parallelized code is 6 and 4 days using 20 and 50 processors respectively.
NASA Astrophysics Data System (ADS)
Kim, Seung-Tae; Cho, Won-Ju
2018-01-01
We fabricated a resistive random access memory (ReRAM) device on a Ti/AlO x /Pt structure with solution-processed AlO x switching layer using microwave irradiation (MWI), and demonstrated multi-level cell (MLC) operation. To investigate the effect of MWI power on the MLC characteristics, post-deposition annealing was performed at 600-3000 W after AlO x switching layer deposition, and the MLC operation was compared with as-deposited (as-dep) and conventional thermally annealing (CTA) treated devices. All solution-processed AlO x -based ReRAM devices exhibited bipolar resistive switching (BRS) behavior. We found that these devices have four-resistance states (2 bits) of MLC operation according to the modulation of the high-resistance state (HRSs) through reset voltage control. Particularly, compared to the as-dep and CTA ReRAM devices, the MWI-treated ReRAM devices showed a significant increase in the memory window and stable endurance for multi-level operation. Moreover, as the MWI power increased, excellent MLC characteristics were exhibited because the resistance ratio between each resistance state was increased. In addition, it exhibited reliable retention characteristics without deterioration at 25 °C and 85 °C for 10 000 s. Finally, the relationship between the chemical characteristics of the solution-processed AlO x switching layer and BRS-based multi-level operation according to the annealing method and MWI power was investigated using x-ray photoelectron spectroscopy.
On VLSI Design of Rank-Order Filtering using DCRAM Architecture
Lin, Meng-Chun; Dung, Lan-Rong
2009-01-01
This paper addresses on VLSI design of rank-order filtering (ROF) with a maskable memory for real-time speech and image processing applications. Based on a generic bit-sliced ROF algorithm, the proposed design uses a special-defined memory, called the dual-cell random-access memory (DCRAM), to realize major operations of ROF: threshold decomposition and polarization. Using the memory-oriented architecture, the proposed ROF processor can benefit from high flexibility, low cost and high speed. The DCRAM can perform the bit-sliced read, partial write, and pipelined processing. The bit-sliced read and partial write are driven by maskable registers. With recursive execution of the bit-slicing read and partial write, the DCRAM can effectively realize ROF in terms of cost and speed. The proposed design has been implemented using TSMC 0.18 μm 1P6M technology. As shown in the result of physical implementation, the core size is 356.1 × 427.7μm2 and the VLSI implementation of ROF can operate at 256 MHz for 1.8V supply. PMID:19865599
A low power biomedical signal processor ASIC based on hardware software codesign.
Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T
2009-01-01
A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.
Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors
NASA Astrophysics Data System (ADS)
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.
ERIC Educational Resources Information Center
Marcovitz, Alan B., Ed.
This paper describes an introductory course in microprocessors and microcomputers implemented at Grossmont College. The current state-of-the-art in the microprocessor field is discussed, with special emphasis on the 8-bit MOS single-chip processors which are the most commonly used devices. Objectives and guidelines for the course are presented,…
2008-07-31
Unlike the Lyrtech, each DSP on a Bittware board offers 3 MB of on-chip memory and 3 GFLOPs of 32-bit peak processing power. Based on the performance...Each NVIDIA 8800 Ultra features 576 GFLOPS on 128 612-MHz single-precision floating-point SIMD processors, arranged in 16 clusters of eight. Each
Mark IVA microprocessor support
NASA Technical Reports Server (NTRS)
Burford, A. L.
1982-01-01
The requirements and plans for the maintenance support of microprocessor-based controllers in the Deep Space Network Mark IVA System are discussed. Additional new interfaces and 16-bit processors have introduced problems not present in the Mark III System. The need for continuous training of maintenance personnel to maintain a level of expertise consistent with the sophistication of the required tools is also emphasized.
Fault-Tolerant Computing: An Overview
1991-06-01
Addison Wesley:, Reading, MA) 1984. [8] J. Wakerly , Error Detecting Codes, Self-Checking Circuits and Applications , (Elsevier North Holland, Inc.- New York... applicable to bit-sliced organi- zations of hardware. In the first time step, the normal computation is performed on the operands and the results...for error detection and fault tolerance in parallel processor systems while perform- ing specific computation-intensive applications [111. Contrary to
NASA Astrophysics Data System (ADS)
Odinokov, S. B.; Petrov, A. V.
1995-10-01
Mathematical models of components of a vector-matrix optoelectronic multiplier are considered. Perturbing factors influencing a real optoelectronic system — noise and errors of radiation sources and detectors, nonlinearity of an analogue—digital converter, nonideal optical systems — are taken into account. Analytic expressions are obtained for relating the precision of such a multiplier to the probability of an error amounting to one bit, to the parameters describing the quality of the multiplier components, and to the quality of the optical system of the processor. Various methods of increasing the dynamic range of a multiplier are considered at the technical systems level.
Integrated optical circuits for numerical computation
NASA Technical Reports Server (NTRS)
Verber, C. M.; Kenan, R. P.
1983-01-01
The development of integrated optical circuits (IOC) for numerical-computation applications is reviewed, with a focus on the use of systolic architectures. The basic architecture criteria for optical processors are shown to be the same as those proposed by Kung (1982) for VLSI design, and the advantages of IOCs over bulk techniques are indicated. The operation and fabrication of electrooptic grating structures are outlined, and the application of IOCs of this type to an existing 32-bit, 32-Mbit/sec digital correlator, a proposed matrix multiplier, and a proposed pipeline processor for polynomial evaluation is discussed. The problems arising from the inherent nonlinearity of electrooptic gratings are considered. Diagrams and drawings of the application concepts are provided.
Experience with custom processors in space flight applications
NASA Technical Reports Server (NTRS)
Fraeman, M. E.; Hayes, J. R.; Lohr, D. A.; Ballard, B. W.; Williams, R. L.; Henshaw, R. M.
1991-01-01
The Applied Physics Laboratory (APL) has developed a magnetometer instrument for a swedish satellite named Freja with launch scheduled for August 1992 on a Chinese Long March rocket. The magnetometer controller utilized a custom microprocessor designed at APL with the Genesil silicon compiler. The processor evolved from our experience with an older bit-slice design and two prior single chip efforts. The architecture of our microprocessor greatly lowered software development costs because it was optimized to provide an interactive and extensible programming environment hosted by the target hardware. Radiation tolerance of the microprocessor was also tested and was adequate for Freja's mission -- 20 kRad(Si) total dose and very infrequent latch-up and single event upset events.
Embedded neural recording with TinyOS-based wireless-enabled processor modules.
Farshchi, Shahin; Pesterev, Aleksey; Nuyujukian, Paul; Guenterberg, Eric; Mody, Istvan; Judy, Jack W
2010-04-01
To create a wireless neural recording system that can benefit from the continuous advancements being made in embedded microcontroller and communications technologies, an embedded-system-based architecture for wireless neural recording has been designed, fabricated, and tested. The system consists of commercial-off-the-shelf wireless-enabled processor modules (motes) for communicating the neural signals, and a back-end database server and client application for archiving and browsing the neural signals. A neural-signal-acquisition application has been developed to enable the mote to either acquire neural signals at a rate of 4000 12-bit samples per second, or detect and transmit spike heights and widths sampled at a rate of 16670 12-bit samples per second on a single channel. The motes acquire neural signals via a custom low-noise neural-signal amplifier with adjustable gain and high-pass corner frequency that has been designed, and fabricated in a 1.5-microm CMOS process. In addition to browsing acquired neural data, the client application enables the user to remotely toggle modes of operation (real-time or spike-only), as well as amplifier gain and high-pass corner frequency.
NASA Astrophysics Data System (ADS)
Passas, Georgios; Freear, Steven; Fawcett, Darren
2010-08-01
Orthogonal frequency division multiplexing (OFDM)-based feed-forward space-time trellis code (FFSTTC) encoders can be synthesised as very high speed integrated circuit hardware description language (VHDL) designs. Evaluation of their FPGA implementation can lead to conclusions that help a designer to decide the optimum implementation, given the encoder structural parameters. VLSI architectures based on 1-bit multipliers and look-up tables (LUTs) are compared in terms of FPGA slices and block RAMs (area), as well as in terms of minimum clock period (speed). Area and speed graphs versus encoder memory order are provided for quadrature phase shift keying (QPSK) and 8 phase shift keying (8-PSK) modulation and two transmit antennas, revealing best implementation under these conditions. The effect of number of modulation bits and transmit antennas on the encoder implementation complexity is also investigated.
Ultra-Reliable Digital Avionics (URDA) processor
NASA Astrophysics Data System (ADS)
Branstetter, Reagan; Ruszczyk, William; Miville, Frank
1994-10-01
Texas Instruments Incorporated (TI) developed the URDA processor design under contract with the U.S. Air Force Wright Laboratory and the U.S. Army Night Vision and Electro-Sensors Directorate. TI's approach couples advanced packaging solutions with advanced integrated circuit (IC) technology to provide a high-performance (200 MIPS/800 MFLOPS) modular avionics processor module for a wide range of avionics applications. TI's processor design integrates two Ada-programmable, URDA basic processor modules (BPM's) with a JIAWG-compatible PiBus and TMBus on a single F-22 common integrated processor-compatible form-factor SEM-E avionics card. A separate, high-speed (25-MWord/second 32-bit word) input/output bus is provided for sensor data. Each BPM provides a peak throughput of 100 MIPS scalar concurrent with 400-MFLOPS vector processing in a removable multichip module (MCM) mounted to a liquid-flowthrough (LFT) core and interfacing to a processor interface module printed wiring board (PWB). Commercial RISC technology coupled with TI's advanced bipolar complementary metal oxide semiconductor (BiCMOS) application specific integrated circuit (ASIC) and silicon-on-silicon packaging technologies are used to achieve the high performance in a miniaturized package. A Mips R4000-family reduced instruction set computer (RISC) processor and a TI 100-MHz BiCMOS vector coprocessor (VCP) ASIC provide, respectively, the 100 MIPS of a scalar processor throughput and 400 MFLOPS of vector processing throughput for each BPM. The TI Aladdim ASIC chipset was developed on the TI Aladdin Program under contract with the U.S. Army Communications and Electronics Command and was sponsored by the Advanced Research Projects Agency with technical direction from the U.S. Army Night Vision and Electro-Sensors Directorate.
STAR: FPGA-based software defined satellite transponder
NASA Astrophysics Data System (ADS)
Davalle, Daniele; Cassettari, Riccardo; Saponara, Sergio; Fanucci, Luca; Cucchi, Luca; Bigongiari, Franco; Errico, Walter
2013-05-01
This paper presents STAR, a flexible Telemetry, Tracking & Command (TT&C) transponder for Earth Observation (EO) small satellites, developed in collaboration with INTECS and SITAEL companies. With respect to state-of-the-art EO transponders, STAR includes the possibility of scientific data transfer thanks to the 40 Mbps downlink data-rate. This feature represents an important optimization in terms of hardware mass, which is important for EO small satellites. Furthermore, in-flight re-configurability of communication parameters via telecommand is important for in-orbit link optimization, which is especially useful for low orbit satellites where visibility can be as short as few hundreds of seconds. STAR exploits the principles of digital radio to minimize the analog section of the transceiver. 70MHz intermediate frequency (IF) is the interface with an external S/X band radio-frequency front-end. The system is composed of a dedicated configurable high-speed digital signal processing part, the Signal Processor (SP), described in technology-independent VHDL working with a clock frequency of 184.32MHz and a low speed control part, the Control Processor (CP), based on the 32-bit Gaisler LEON3 processor clocked at 32 MHz, with SpaceWire and CAN interfaces. The quantization parameters were fine-tailored to reach a trade-off between hardware complexity and implementation loss which is less than 0.5 dB at BER = 10-5 for the RX chain. The IF ports require 8-bit precision. The system prototype is fitted on the Xilinx Virtex 6 VLX75T-FF484 FPGA of which a space-qualified version has been announced. The total device occupation is 82 %.
QCA Gray Code Converter Circuits Using LTEx Methodology
NASA Astrophysics Data System (ADS)
Mukherjee, Chiradeep; Panda, Saradindu; Mukhopadhyay, Asish Kumar; Maji, Bansibadan
2018-07-01
The Quantum-dot Cellular Automata (QCA) is the prominent paradigm of nanotechnology considered to continue the computation at deep sub-micron regime. The QCA realizations of several multilevel circuit of arithmetic logic unit have been introduced in the recent years. However, as high fan-in Binary to Gray (B2G) and Gray to Binary (G2B) Converters exist in the processor based architecture, no attention has been paid towards the QCA instantiation of the Gray Code Converters which are anticipated to be used in 8-bit, 16-bit, 32-bit or even more bit addressable machines of Gray Code Addressing schemes. In this work the two-input Layered T module is presented to exploit the operation of an Exclusive-OR Gate (namely LTEx module) as an elemental block. The "defect-tolerant analysis" of the two-input LTEx module has been analyzed to establish the scalability and reproducibility of the LTEx module in the complex circuits. The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes. Moreover this work formulates the QCA design metrics such as O-Cost, Effective area, Delay and Cost α for the n-bit converter layouts.
QCA Gray Code Converter Circuits Using LTEx Methodology
NASA Astrophysics Data System (ADS)
Mukherjee, Chiradeep; Panda, Saradindu; Mukhopadhyay, Asish Kumar; Maji, Bansibadan
2018-04-01
The Quantum-dot Cellular Automata (QCA) is the prominent paradigm of nanotechnology considered to continue the computation at deep sub-micron regime. The QCA realizations of several multilevel circuit of arithmetic logic unit have been introduced in the recent years. However, as high fan-in Binary to Gray (B2G) and Gray to Binary (G2B) Converters exist in the processor based architecture, no attention has been paid towards the QCA instantiation of the Gray Code Converters which are anticipated to be used in 8-bit, 16-bit, 32-bit or even more bit addressable machines of Gray Code Addressing schemes. In this work the two-input Layered T module is presented to exploit the operation of an Exclusive-OR Gate (namely LTEx module) as an elemental block. The "defect-tolerant analysis" of the two-input LTEx module has been analyzed to establish the scalability and reproducibility of the LTEx module in the complex circuits. The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes. Moreover this work formulates the QCA design metrics such as O-Cost, Effective area, Delay and Cost α for the n-bit converter layouts.
Chowkidar: A Health Monitor for Wireless Sensor Network Testbeds
2006-02-01
a new set of parameters, likely to give better results, Device type XSM TelosB Stargate Processor 4MHz 8MHz 400MHz RAM 4KB 10KB 32MB OS TinyOS TinyOS...mote being unavailable for user experimentation. However, the fail-stop of a Stargate has much more impact since a Stargate is used by Kansei to...results in loss of wired connectivity to all of its attached Stargates and in turn their attached motes. Since the wired network is used by Kansei for
Real-Time Acquisition and Processing System (RTAPS) Version 1.1 Installation and User’s Manual.
1986-08-01
The language is incrementally compiled and procedure-oriented. It is run on an 8088 processor with 56K of available user RAM. The master board features...RTAPS/PC computers. The wiring configuration is shown in figure 10. Switch Modem Port MAC P5 or P6* 2, B4 3 B8 1%7 1 B10 *P6 recommended Figure 10. $MAC...activated switch. The AXAC output port is physically connected to the modem input on the switch. The subchannels are the labeled terminal connections
NASA Technical Reports Server (NTRS)
Green, D. M.
1978-01-01
Software programs are described, one which implements a voltage regulation function, and one which implements a charger function with peak-power tracking of its input. The software, written in modular fashion, is intended as a vehicle for further experimentation with the P-3 system. A control teleprinter allows an operator to make parameter modifications to the control algorithm during experiments. The programs require 3K ROM and 2K ram each. User manuals for each system are included as well as a third program for simple I/O control.
Advanced Physiological Estimation of Cognitive Status. Part 2
2011-05-24
Neurofeedback Algorithms and Gaze Controller EEG Sensor System g.USBamp *, ** • internal 24-bit ADC and digital signal processor • 16 channels (expandable...SUBJECT TERMS EEG eye-tracking mental state estimation machine learning Leonard J. Trejo Pacific Development and Technology LLC 999 Commercial St. Palo...fatigue, overload) Technology Transfer Opportunity Technology from PDT – Methods to acquire various physiological signals ( EEG , EOG, EMG, ECG, etc
MOBS - A modular on-board switching system
NASA Astrophysics Data System (ADS)
Berner, W.; Grassmann, W.; Piontek, M.
The authors describe a multibeam satellite system that is designed for business services and for communications at a high bit rate. The repeater is regenerative with a modular onboard switching system. It acts not only as baseband switch but also as the central node of the network, performing network control and protocol evaluation. The hardware is based on a modular bus/memory architecture with associated processors.
GR712RC- Dual-Core Processor- Product Status
NASA Astrophysics Data System (ADS)
Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri
2012-08-01
The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.
A high-speed, large-capacity, 'jukebox' optical disk system
NASA Technical Reports Server (NTRS)
Ammon, G. J.; Calabria, J. A.; Thomas, D. T.
1985-01-01
Two optical disk 'jukebox' mass storage systems which provide access to any data in a store of 10 to the 13th bits (1250G bytes) within six seconds have been developed. The optical disk jukebox system is divided into two units, including a hardware/software controller and a disk drive. The controller provides flexibility and adaptability, through a ROM-based microcode-driven data processor and a ROM-based software-driven control processor. The cartridge storage module contains 125 optical disks housed in protective cartridges. Attention is given to a conceptual view of the disk drive unit, the NASA optical disk system, the NASA database management system configuration, the NASA optical disk system interface, and an open systems interconnect reference model.
High-performance multiprocessor architecture for a 3-D lattice gas model
NASA Technical Reports Server (NTRS)
Lee, F.; Flynn, M.; Morf, M.
1991-01-01
The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
Radio Astronomy Software Defined Receiver Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vacaliuc, Bogdan; Leech, Marcus; Oxley, Paul
The paper describes a Radio Astronomy Software Defined Receiver (RASDR) that is currently under development. RASDR is targeted for use by amateurs and small institutions where cost is a primary consideration. The receiver will operate from HF thru 2.8 GHz. Front-end components such as preamps, block down-converters and pre-select bandpass filters are outside the scope of this development and will be provided by the user. The receiver includes RF amplifiers and attenuators, synthesized LOs, quadrature down converters, dual 8 bit ADCs and a Signal Processor that provides firmware processing of the digital bit stream. RASDR will interface to a usermore » s PC via a USB or higher speed Ethernet LAN connection. The PC will run software that provides processing of the bit stream, a graphical user interface, as well as data analysis and storage. Software should support MAC OS, Windows and Linux platforms and will focus on such radio astronomy applications as total power measurements, pulsar detection, and spectral line studies.« less
A microprocessor based on a two-dimensional semiconductor
Wachter, Stefan; Polyushkin, Dmitry K.; Bethge, Ole; Mueller, Thomas
2017-01-01
The advent of microcomputers in the 1970s has dramatically changed our society. Since then, microprocessors have been made almost exclusively from silicon, but the ever-increasing demand for higher integration density and speed, lower power consumption and better integrability with everyday goods has prompted the search for alternatives. Germanium and III–V compound semiconductors are being considered promising candidates for future high-performance processor generations and chips based on thin-film plastic technology or carbon nanotubes could allow for embedding electronic intelligence into arbitrary objects for the Internet-of-Things. Here, we present a 1-bit implementation of a microprocessor using a two-dimensional semiconductor—molybdenum disulfide. The device can execute user-defined programs stored in an external memory, perform logical operations and communicate with its periphery. Our 1-bit design is readily scalable to multi-bit data. The device consists of 115 transistors and constitutes the most complex circuitry so far made from a two-dimensional material. PMID:28398336
NASA Technical Reports Server (NTRS)
Phyne, J. R.; Nelson, M. D.
1975-01-01
The design and implementation of hardware and software systems involved in using a 40,000 bit/second communication line as the connecting link between an IMLAC PDS 1-D display computer and a Univac 1108 computer system were described. The IMLAC consists of two independent processors sharing a common memory. The display processor generates the deflection and beam control currents as it interprets a program contained in the memory; the minicomputer has a general instruction set and is responsible for starting and stopping the display processor and for communicating with the outside world through the keyboard, teletype, light pen, and communication line. The processing time associated with each data byte was minimized by designing the input and output processes as finite state machines which automatically sequence from each state to the next. Several tests of the communication link and the IMLAC software were made using a special low capacity computer grade cable between the IMLAC and the Univac.
High precision computing with charge domain devices and a pseudo-spectral method therefor
NASA Technical Reports Server (NTRS)
Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor); Fijany, Amir (Inventor); Zak, Michail (Inventor)
1997-01-01
The present invention enhances the bit resolution of a CCD/CID MVM processor by storing each bit of each matrix element as a separate CCD charge packet. The bits of each input vector are separately multiplied by each bit of each matrix element in massive parallelism and the resulting products are combined appropriately to synthesize the correct product. In another aspect of the invention, such arrays are employed in a pseudo-spectral method of the invention, in which partial differential equations are solved by expressing each derivative analytically as matrices, and the state function is updated at each computation cycle by multiplying it by the matrices. The matrices are treated as synaptic arrays of a neural network and the state function vector elements are treated as neurons. In a further aspect of the invention, moving target detection is performed by driving the soliton equation with a vector of detector outputs. The neural architecture consists of two synaptic arrays corresponding to the two differential terms of the soliton-equation and an adder connected to the output thereof and to the output of the detector array to drive the soliton equation.
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
This technical report contains the HOL listings of the specification of the design and major portions of the requirements for a commercially developed processor interface unit (or PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU specification as it currently exists. Section two of this report contains general-purpose HOL theories that support the PIU specification. These theories include definitions for the hardware components used in the PIU, our implementation of bit words, and our implementation of temporal logic. Section three contains the HOL listings for the PIU design specification. Aside from the PIU internal bus (I-Bus), this specification is complete. Section four contains the HOL listings for a major portion of the PIU requirements specification. Specifically, it contains most of the definition for the PIU behavior associated with memory accesses initiated by the local processor.
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories
1989-02-01
frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a
Microdot - A Four-Bit Microcontroller Designed for Distributed Low-End Computing in Satellites
NASA Astrophysics Data System (ADS)
2002-03-01
Many satellites are an integrated collection of sensors and actuators that require dedicated real-time control. For single processor systems, additional sensors require an increase in computing power and speed to provide the multi-tasking capability needed to service each sensor. Faster processors cost more and consume more power, which taxes a satellite's power resources and may lead to shorter satellite lifetimes. An alternative design approach is a distributed network of small and low power microcontrollers designed for space that handle the computing requirements of each individual sensor and actuator. The design of microdot, a four-bit microcontroller for distributed low-end computing, is presented. The design is based on previous research completed at the Space Electronics Branch, Air Force Research Laboratory (AFRL/VSSE) at Kirtland AFB, NM, and the Air Force Institute of Technology at Wright-Patterson AFB, OH. The Microdot has 29 instructions and a 1K x 4 instruction memory. The distributed computing architecture is based on the Philips Semiconductor I2C Serial Bus Protocol. A prototype was implemented and tested using an Altera Field Programmable Gate Array (FPGA). The prototype was operable to 9.1 MHz. The design was targeted for fabrication in a radiation-hardened-by-design gate-array cell library for the TSMC 0.35 micrometer CMOS process.
NASA Astrophysics Data System (ADS)
Coffey, Stephen; Connell, Joseph
2005-06-01
This paper presents a development platform for real-time image processing based on the ADSP-BF533 Blackfin processor and the MicroC/OS-II real-time operating system (RTOS). MicroC/OS-II is a completely portable, ROMable, pre-emptive, real-time kernel. The Blackfin Digital Signal Processors (DSPs), incorporating the Analog Devices/Intel Micro Signal Architecture (MSA), are a broad family of 16-bit fixed-point products with a dual Multiply Accumulate (MAC) core. In addition, they have a rich instruction set with variable instruction length and both DSP and MCU functionality thus making them ideal for media based applications. Using the MicroC/OS-II for task scheduling and management, the proposed system can capture and process raw RGB data from any standard 8-bit greyscale image sensor in soft real-time and then display the processed result using a simple PC graphical user interface (GUI). Additionally, the GUI allows configuration of the image capture rate and the system and core DSP clock rates thereby allowing connectivity to a selection of image sensors and memory devices. The GUI also allows selection from a set of image processing algorithms based in the embedded operating system.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Design of an anti-Rician-fading modem for mobile satellite communication systems
NASA Technical Reports Server (NTRS)
Kojima, Toshiharu; Ishizu, Fumio; Miyake, Makoto; Murakami, Keishi; Fujino, Tadashi
1995-01-01
To design a demodulator applicable to mobile satellite communication systems using differential phase shift keying modulation, we have developed key technologies including an anti-Rician-fading demodulation scheme, an initial acquisition scheme, automatic gain control (AGC), automatic frequency control (AFC), and bit timing recovery (BTR). Using these technologies, we have developed one-chip digital signal processor (DSP) modem for mobile terminal, which is compact, of light weight, and of low power consumption. Results of performance test show that the developed DSP modem achieves good performance in terms of bit error ratio in mobile satellite communication environment, i.e., Rician fading channel. It is also shown that the initial acquisition scheme acquires received signal rapidly even if the carrier-to-noise power ratio (CNR) of the received signal is considerably low.
The design of infrared information collection circuit based on embedded technology
NASA Astrophysics Data System (ADS)
Liu, Haoting; Zhang, Yicong
2013-07-01
S3C2410 processor is a 16/32 bit RISC embedded processor which based on ARM920T core and AMNA bus, and mainly for handheld devices, and high cost, low-power applications. This design introduces a design plan of the PIR sensor system, circuit and its assembling, debugging. The Application Circuit of the passive PIR alarm uses the invisibility of the infrared radiation well into the alarm system, and in order to achieve the anti-theft alarm and security purposes. When the body goes into the range of PIR sensor detection, sensors will detect heat sources and then the sensor will output a weak signal. The Signal should be amplified, compared and delayed; finally light emitting diodes emit light, playing the role of a police alarm.
Frequency domain laser velocimeter signal processor: A new signal processing scheme
NASA Technical Reports Server (NTRS)
Meyers, James F.; Clemmons, James I., Jr.
1987-01-01
A new scheme for processing signals from laser velocimeter systems is described. The technique utilizes the capabilities of advanced digital electronics to yield a smart instrument that is able to configure itself, based on the characteristics of the input signals, for optimum measurement accuracy. The signal processor is composed of a high-speed 2-bit transient recorder for signal capture and a combination of adaptive digital filters with energy and/or zero crossing detection signal processing. The system is designed to accept signals with frequencies up to 100 MHz with standard deviations up to 20 percent of the average signal frequency. Results from comparative simulation studies indicate measurement accuracies 2.5 times better than with a high-speed burst counter, from signals with as few as 150 photons per burst.
2012-06-14
Display 480 x 800 pixels (3.7 inches) CPU Qualcomm QSD8250 1GHz Memory (internal) 512MB RAM / 512 MB ROM Kernel version 2.6.35.7-ge0fb012 Figure 3.5: HTC...development and writing). The 34 MSM kernel provided by the AOSP and compatible with the HTC Nexus One’s motherboard and Qualcomm chipset, is used for this...building the kernel is having the prebuilt toolchains and the right kernel for the hardware. Many HTC products use Qualcomm processors which uses the
Advanced satellite communication system
NASA Technical Reports Server (NTRS)
Staples, Edward J.; Lie, Sen
1992-01-01
The objective of this research program was to develop an innovative advanced satellite receiver/demodulator utilizing surface acoustic wave (SAW) chirp transform processor and coherent BPSK demodulation. The algorithm of this SAW chirp Fourier transformer is of the Convolve - Multiply - Convolve (CMC) type, utilizing off-the-shelf reflective array compressor (RAC) chirp filters. This satellite receiver, if fully developed, was intended to be used as an on-board multichannel communications repeater. The Advanced Communications Receiver consists of four units: (1) CMC processor, (2) single sideband modulator, (3) demodulator, and (4) chirp waveform generator and individual channel processors. The input signal is composed of multiple user transmission frequencies operating independently from remotely located ground terminals. This signal is Fourier transformed by the CMC Processor into a unique time slot for each user frequency. The CMC processor is driven by a waveform generator through a single sideband (SSB) modulator. The output of the coherent demodulator is composed of positive and negative pulses, which are the envelopes of the chirp transform processor output. These pulses correspond to the data symbols. Following the demodulator, a logic circuit reconstructs the pulses into data, which are subsequently differentially decoded to form the transmitted data. The coherent demodulation and detection of BPSK signals derived from a CMC chirp transform processor were experimentally demonstrated and bit error rate (BER) testing was performed. To assess the feasibility of such advanced receiver, the results were compared with the theoretical analysis and plotted for an average BER as a function of signal-to-noise ratio. Another goal of this SBIR program was the development of a commercial product. The commercial product developed was an arbitrary waveform generator. The successful sales have begun with the delivery of the first arbitrary waveform generator.
Design and implementation of power efficient 10-bit dual port SRAM on 28 nm technology
NASA Astrophysics Data System (ADS)
Gulati, Anmol; Gupta, Ashutosh; Murgai, Shruti; Bhaskar, Lala
2016-03-01
In this paper, 10 bit synchronous clock gated Dual port RAM has been designed. The negative latch based clock gating technique has been employed to optimize the power of the design. The design has been implemented on XV7K70T device, -3 speed grade, and kintex 7 FPGA family on Xilinx ISE Design Suite 14.7 using 28 nm technology. The design has been synthesized using Verilog HDL. We have been successful in achieving approximately 55 % reduction in total clock power, 81.55% reduction in BRAM power, 82.65%, 0.07%, 1.04% and 11.31% reduction in static power, 72.32%, 38.60%, 68.74% and 71.97%, reduction in dynamic power and 72.44%, 16.96%, 60.88% and 71.06% reduction in total supply power at 1 THz, 1GHz, 100 GHz and 1000 GHz frequency respectively. The power of the device has been calculated using XPower Analyzer tool of Xilinx ISE Design Suite 14.7.
NASA Astrophysics Data System (ADS)
Glatter, Otto; Fuchs, Heribert; Jorde, Christian; Eigner, Wolf-Dieter
1987-03-01
The microprocessor of an 8-bit PC system is used as a central control unit for the acquisition and evaluation of data from quasi-elastic light scattering experiments. Data are sampled with a width of 8 bits under control of the CPU. This limits the minimum sample time to 20 μs. Shorter sample times would need a direct memory access channel. The 8-bit CPU can address a 64-kbyte RAM without additional paging. Up to 49 000 sample points can be measured without interruption. After storage, a correlation function or a power spectrum can be calculated from such a primary data set. Furthermore access is provided to the primary data for stability control, statistical tests, and for comparison of different evaluation methods for the same experiment. A detailed analysis of the signal (histogram) and of the effect of overflows is possible and shows that the number of pulses but not the number of overflows determines the error in the result. The correlation function can be computed with reasonable accuracy from data with a mean pulse rate greater than one, the power spectrum needs a three times higher pulse rate for convergence. The statistical accuracy of the results from 49 000 sample points is of the order of a few percent. Additional averages are necessary to improve their quality. The hardware extensions for the PC system are inexpensive. The main disadvantage of the present system is the high minimum sampling time of 20 μs and the fact that the correlogram or the power spectrum cannot be computed on-line as it can be done with hardware correlators or spectrum analyzers. These shortcomings and the storage size restrictions can be removed with a faster 16/32-bit CPU.
Development of the SEASIS instrument for SEDSAT
NASA Technical Reports Server (NTRS)
Maier, Mark W.
1996-01-01
Two SEASIS experiment objectives are key: take images that allow three axis attitude determination and take multi-spectral images of the earth. During the tether mission it is also desirable to capture images for the recoiling tether from the endmass perspective (which has never been observed). SEASIS must store all its imagery taken during the tether mission until the earth downlink can be established. SEASIS determines attitude with a panoramic camera and performs earth observation with a telephoto lens camera. Camera video is digitized, compressed, and stored in solid state memory. These objectives are addressed through the following architectural choices: (1) A camera system using a Panoramic Annular Lens (PAL). This lens has a 360 deg. azimuthal field of view by a +45 degree vertical field measured from a plan normal to the lens boresight axis. It has been shown in Mr. Mark Steadham's UAH M.S. thesis that his camera can determine three axis attitude anytime the earth and one other recognizable celestial object (for example, the sun) is in the field of view. This will be essentially all the time during tether deployment. (2) A second camera system using telephoto lens and filter wheel. The camera is a black and white standard video camera. The filters are chosen to cover the visible spectral bands of remote sensing interest. (3) A processor and mass memory arrangement linked to the cameras. Video signals from the cameras are digitized, compressed in the processor, and stored in a large static RAM bank. The processor is a multi-chip module consisting of a T800 Transputer and three Zoran floating point Digital Signal Processors. This processor module was supplied under ARPA contract by the Space Computer Corporation to demonstrate its use in space.
A new version of the CADNA library for estimating round-off error propagation in Fortran programs
NASA Astrophysics Data System (ADS)
Jézéquel, Fabienne; Chesneaux, Jean-Marie; Lamotte, Jean-Luc
2010-11-01
The CADNA library enables one to estimate, using a probabilistic approach, round-off error propagation in any simulation program. CADNA provides new numerical types, the so-called stochastic types, on which round-off errors can be estimated. Furthermore CADNA contains the definition of arithmetic and relational operators which are overloaded for stochastic variables and the definition of mathematical functions which can be used with stochastic arguments. On 64-bit processors, depending on the rounding mode chosen, the mathematical library associated with the GNU Fortran compiler may provide incorrect results or generate severe bugs. Therefore the CADNA library has been improved to enable the numerical validation of programs on 64-bit processors. New version program summaryProgram title: CADNA Catalogue identifier: AEAT_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEAT_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 28 488 No. of bytes in distributed program, including test data, etc.: 463 778 Distribution format: tar.gz Programming language: Fortran NOTE: A C++ version of this program is available in the Library as AEGQ_v1_0 Computer: PC running LINUX with an i686 or an ia64 processor, UNIX workstations including SUN, IBM Operating system: LINUX, UNIX Classification: 6.5 Catalogue identifier of previous version: AEAT_v1_0 Journal reference of previous version: Comput. Phys. Commun. 178 (2008) 933 Does the new version supersede the previous version?: Yes Nature of problem: A simulation program which uses floating-point arithmetic generates round-off errors, due to the rounding performed at each assignment and at each arithmetic operation. Round-off error propagation may invalidate the result of a program. The CADNA library enables one to estimate round-off error propagation in any simulation program and to detect all numerical instabilities that may occur at run time. Solution method: The CADNA library [1-3] implements Discrete Stochastic Arithmetic [4,5] which is based on a probabilistic model of round-off errors. The program is run several times with a random rounding mode generating different results each time. From this set of results, CADNA estimates the number of exact significant digits in the result that would have been computed with standard floating-point arithmetic. Reasons for new version: On 64-bit processors, the mathematical library associated with the GNU Fortran compiler may provide incorrect results or generate severe bugs with rounding towards -∞ and +∞, which the random rounding mode is based on. Therefore a particular definition of mathematical functions for stochastic arguments has been included in the CADNA library to enable its use with the GNU Fortran compiler on 64-bit processors. Summary of revisions: If CADNA is used on a 64-bit processor with the GNU Fortran compiler, mathematical functions are computed with rounding to the nearest, otherwise they are computed with the random rounding mode. It must be pointed out that the knowledge of the accuracy of the stochastic argument of a mathematical function is never lost. Restrictions: CADNA requires a Fortran 90 (or newer) compiler. In the program to be linked with the CADNA library, round-off errors on complex variables cannot be estimated. Furthermore array functions such as product or sum must not be used. Only the arithmetic operators and the abs, min, max and sqrt functions can be used for arrays. Additional comments: In the library archive, users are advised to read the INSTALL file first. The doc directory contains a user guide named ug.cadna.pdf which shows how to control the numerical accuracy of a program using CADNA, provides installation instructions and describes test runs. The source code, which is located in the src directory, consists of one assembly language file (cadna_rounding.s) and eighteen Fortran language files. cadna_rounding.s is a symbolic link to the assembly file corresponding to the processor and the Fortran compiler used. This assembly file contains routines which are frequently called in the CADNA Fortran files to change the rounding mode. The Fortran language files contain the definition of the stochastic types on which the control of accuracy can be performed, CADNA specific functions (for instance to enable or disable the detection of numerical instabilities), the definition of arithmetic and relational operators which are overloaded for stochastic variables and the definition of mathematical functions which can be used with stochastic arguments. The examples directory contains seven test runs which illustrate the use of the CADNA library and the benefits of Discrete Stochastic Arithmetic. Running time: The version of a code which uses CADNA runs at least three times slower than its floating-point version. This cost depends on the computer architecture and can be higher if the detection of numerical instabilities is enabled. In this case, the cost may be related to the number of instabilities detected.
A Digital Motion Control System for Large Telescopes
NASA Astrophysics Data System (ADS)
Hunter, T. R.; Wilson, R. W.; Kimberk, R.; Leiker, P. S.
2001-05-01
We have designed and programmed a digital motion control system for large telescopes, in particular, the 6-meter antennas of the Submillimeter Array on Mauna Kea. The system consists of a single robust, high-reliability microcontroller board which implements a two-axis velocity servo while monitoring and responding to critical safety parameters. Excellent tracking performance has been achieved with this system (0.3 arcsecond RMS at sidereal rate). The 24x24 centimeter four-layer printed circuit board contains a multitude of hardware devices: 40 digital inputs (for limit switches and fault indicators), 32 digital outputs (to enable/disable motor amplifiers and brakes), a quad 22-bit ADC (to read the motor tachometers), four 16-bit DACs (that provide torque signals to the motor amplifiers), a 32-LED status panel, a serial port to the LynxOS PowerPC antenna computer (RS422/460kbps), a serial port to the Palm Vx handpaddle (RS232/115kbps), and serial links to the low-resolution absolute encoders on the azimuth and elevation axes. Each section of the board employs independent ground planes and power supplies, with optical isolation on all I/O channels. The processor is an Intel 80C196KC 16-bit microcontroller running at 20MHz on an 8-bit bus. This processor executes an interrupt-driven, scheduler-based software system written in C and assembled into an EPROM with user-accessible variables stored in NVSRAM. Under normal operation, velocity update requests arrive at 100Hz from the position-loop servo process running independently on the antenna computer. A variety of telescope safety checks are performed at 279Hz including routine servicing of a 6 millisecond watchdog timer. Additional ADCs onboard the microcontroller monitor the winding temperature and current in the brushless three-phase drive motors. The PID servo gains can be dynamically changed in software. Calibration factors and software filters can be applied to the tachometer readings prior to the application of the servo gains in the torque computations. The Palm pilot handpaddle displays the complete status of the telescope and allows full local control of the drives in an intuitive, touchscreen user interface which is especially useful during reconfigurations of the antenna array.
ms2: A molecular simulation tool for thermodynamic properties
NASA Astrophysics Data System (ADS)
Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran
2011-11-01
This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for a fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work. Program summaryProgram title:ms2 Catalogue identifier: AEJF_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special Licence supplied by the authors No. of lines in distributed program, including test data, etc.: 82 794 No. of bytes in distributed program, including test data, etc.: 793 705 Distribution format: tar.gz Programming language: Fortran90 Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.) Operating system: Unix/Linux, Windows Has the code been vectorized or parallelized?: Yes. Message Passing Interface (MPI) protocol Scalability. Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations. RAM:ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules. Classification: 7.7, 7.9, 12 External routines: Message Passing Interface (MPI) Nature of problem: Calculation of application oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures. Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism. Restrictions: No. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less. Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories. Additional comments: Sample makefiles for multiple operation platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de. Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours, calculating transport properties between six and 24 hours.
Holo-Chidi video concentrator card
NASA Astrophysics Data System (ADS)
Nwodoh, Thomas A.; Prabhakar, Aditya; Benton, Stephen A.
2001-12-01
The Holo-Chidi Video Concentrator Card is a frame buffer for the Holo-Chidi holographic video processing system. Holo- Chidi is designed at the MIT Media Laboratory for real-time computation of computer generated holograms and the subsequent display of the holograms at video frame rates. The Holo-Chidi system is made of two sets of cards - the set of Processor cards and the set of Video Concentrator Cards (VCCs). The Processor cards are used for hologram computation, data archival/retrieval from a host system, and for higher-level control of the VCCs. The VCC formats computed holographic data from multiple hologram computing Processor cards, converting the digital data to analog form to feed the acousto-optic-modulators of the Media lab's Mark-II holographic display system. The Video Concentrator card is made of: a High-Speed I/O (HSIO) interface whence data is transferred from the hologram computing Processor cards, a set of FIFOs and video RAM used as buffer for data for the hololines being displayed, a one-chip integrated microprocessor and peripheral combination that handles communication with other VCCs and furnishes the card with a USB port, a co-processor which controls display data formatting, and D-to-A converters that convert digital fringes to analog form. The co-processor is implemented with an SRAM-based FPGA with over 500,000 gates and controls all the signals needed to format the data from the multiple Processor cards into the format required by Mark-II. A VCC has three HSIO ports through which up to 500 Megabytes of computed holographic data can flow from the Processor Cards to the VCC per second. A Holo-Chidi system with three VCCs has enough frame buffering capacity to hold up to thirty two 36Megabyte hologram frames at a time. Pre-computed holograms may also be loaded into the VCC from a host computer through the low- speed USB port. Both the microprocessor and the co- processor in the VCC can access the main system memory used to store control programs and data for the VCC. The Card also generates the control signals used by the scanning mirrors of Mark-II. In this paper we discuss the design of the VCC and its implementation in the Holo-Chidi system.
A Cost Effective System Design Approach for Critical Space Systems
NASA Technical Reports Server (NTRS)
Abbott, Larry Wayne; Cox, Gary; Nguyen, Hai
2000-01-01
NASA-JSC required an avionics platform capable of serving a wide range of applications in a cost-effective manner. In part, making the avionics platform cost effective means adhering to open standards and supporting the integration of COTS products with custom products. Inherently, operation in space requires low power, mass, and volume while retaining high performance, reconfigurability, scalability, and upgradability. The Universal Mini-Controller project is based on a modified PC/104-Plus architecture while maintaining full compatibility with standard COTS PC/104 products. The architecture consists of a library of building block modules, which can be mixed and matched to meet a specific application. A set of NASA developed core building blocks, processor card, analog input/output card, and a Mil-Std-1553 card, have been constructed to meet critical functions and unique interfaces. The design for the processor card is based on the PowerPC architecture. This architecture provides an excellent balance between power consumption and performance, and has an upgrade path to the forthcoming radiation hardened PowerPC processor. The processor card, which makes extensive use of surface mount technology, has a 166 MHz PowerPC 603e processor, 32 Mbytes of error detected and corrected RAM, 8 Mbytes of Flash, and I Mbytes of EPROM, on a single PC/104-Plus card. Similar densities have been achieved with the quad channel Mil-Std-1553 card and the analog input/output cards. The power management built into the processor and its peripheral chip allows the power and performance of the system to be adjusted to meet the requirements of the application, allowing another dimension to the flexibility of the Universal Mini-Controller. Unique mechanical packaging allows the Universal Mini-Controller to accommodate standard COTS and custom oversized PC/104-Plus cards. This mechanical packaging also provides thermal management via conductive cooling of COTS boards, which are typically designed for convection cooling methods.
Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels
NASA Astrophysics Data System (ADS)
Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.
1982-04-01
This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
NASA Technical Reports Server (NTRS)
Mejzak, R. S.
1980-01-01
The distributed processing concept is defined in terms of control primitives, variables, and structures and their use in performing a decomposed discrete Fourier transform (DET) application function. The design assumes interprocessor communications to be anonymous. In this scheme, all processors can access an entire common database by employing control primitives. Access to selected areas within the common database is random, enforced by a hardware lock, and determined by task and subtask pointers. This enables the number of processors to be varied in the configuration without any modifications to the control structure. Decompositional elements of the DFT application function in terms of tasks and subtasks are also described. The experimental hardware configuration consists of IMSAI 8080 chassis which are independent, 8 bit microcomputer units. These chassis are linked together to form a multiple processing system by means of a shared memory facility. This facility consists of hardware which provides a bus structure to enable up to six microcomputers to be interconnected. It provides polling and arbitration logic so that only one processor has access to shared memory at any one time.
Low-voltage analog front-end processor design for ISFET-based sensor and H+ sensing applications
NASA Astrophysics Data System (ADS)
Chung, Wen-Yaw; Yang, Chung-Huang; Peng, Kang-Chu; Yeh, M. H.
2003-04-01
This paper presents a modular-based low-voltage analog-front-end processor design in a 0.5mm double-poly double-metal CMOS technology for Ion Sensitive Field Effect Transistor (ISFET)-based sensor and H+ sensing applications. To meet the potentiometric response of the ISFET that is proportional to various H+ concentrations, the constant-voltage and constant current (CVCS) testing configuration has been used. Low-voltage design skills such as bulk-driven input pair, folded-cascode amplifier, bootstrap switch control circuits have been designed and integrated for 1.5V supply and nearly rail-to-rail analog to digital signal processing. Core modules consist of an 8-bit two-step analog-digital converter and bulk-driven pre-amplifiers have been developed in this research. The experimental results show that the proposed circuitry has an acceptable linearity to 0.1 pH-H+ sensing conversions with the buffer solution in the range of pH2 to pH12. The processor has a potential usage in battery-operated and portable healthcare devices and environmental monitoring applications.
G-cueing microcontroller (a microprocessor application in simulators)
NASA Technical Reports Server (NTRS)
Horattas, C. G.
1980-01-01
A g cueing microcontroller is described which consists of a tandem pair of microprocessors, dedicated to the task of simulating pilot sensed cues caused by gravity effects. This task includes execution of a g cueing model which drives actuators that alter the configuration of the pilot's seat. The g cueing microcontroller receives acceleration commands from the aerodynamics model in the main computer and creates the stimuli that produce physical acceleration effects of the aircraft seat on the pilots anatomy. One of the two microprocessors is a fixed instruction processor that performs all control and interface functions. The other, a specially designed bipolar bit slice microprocessor, is a microprogrammable processor dedicated to all arithmetic operations. The two processors communicate with each other by a shared memory. The g cueing microcontroller contains its own dedicated I/O conversion modules for interface with the seat actuators and controls, and a DMA controller for interfacing with the simulation computer. Any application which can be microcoded within the available memory, the available real time and the available I/O channels, could be implemented in the same controller.
NASA Astrophysics Data System (ADS)
Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.
2017-12-01
As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
Readout and DAQ for Pixel Detectors
NASA Astrophysics Data System (ADS)
Platkevic, Michal
2010-01-01
Data readout and acquisition control of pixel detectors demand the transfer of significantly a large amounts of bits between the detector and the computer. For this purpose dedicated interfaces are used which are designed with focus on features like speed, small dimensions or flexibility of use such as digital signal processors, field-programmable gate arrays (FPGA) and USB communication ports. This work summarizes the readout and DAQ system built for state-of-the-art pixel detectors of the Medipix family.
NASA Technical Reports Server (NTRS)
Kelly, G. L.; Berthold, G.; Abbott, L.
1982-01-01
A 5 MHZ single-board microprocessor system which incorporates an 8086 CPU and an 8087 Numeric Data Processor is used to implement the control laws for the NASA Drones for Aerodynamic and Structural Testing, Aeroelastic Research Wing II. The control laws program was executed in 7.02 msec, with initialization consuming 2.65 msec and the control law loop 4.38 msec. The software emulator execution times for these two tasks were 36.67 and 61.18, respectively, for a total of 97.68 msec. The space, weight and cost reductions achieved in the present, aircraft control application of this combination of a 16-bit microprocessor with an 80-bit floating point coprocessor may be obtainable in other real time control applications.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
A novel parallel architecture for local histogram equalization
NASA Astrophysics Data System (ADS)
Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan
2005-07-01
Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
Packaging of a large capacity magnetic bubble domain spacecraft recorder
NASA Technical Reports Server (NTRS)
Becker, F. J.; Stermer, R. L.
1977-01-01
A Solid State Spacecraft Data Recorder (SSDR), based on bubble domain technology, having a storage capacity of 10 to the 8th power bits, was designed and is being tested. The recorder consists of two memory modules each having 32 cells, each cell containing sixteen 100 kilobit serial bubble memory chips. The memory modules are interconnected to a Drive and Control Unit (DCU) module containing four microprocessors, 500 integrated circuits, a RAM core memory and two PROM's. The two memory modules and DCU are housed in individual machined aluminum frames, are stacked in brick fashion and through bolted to a base plate assembly which also houses the power supply.
Computer hardware for radiologists: Part I
Indrajit, IK; Alam, A
2010-01-01
Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology information system (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processor unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personnel computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called “buses”. The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute “programs”. A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration. PMID:21042437
Optimization of image processing algorithms on mobile platforms
NASA Astrophysics Data System (ADS)
Poudel, Pramod; Shirvaikar, Mukul
2011-03-01
This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, net-books and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND RAM and 256 MB SDRAM memory. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application for various template matching tasks such as face-recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the speedup obtained due to dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
Realtime multiprocessor for mobile ad hoc networks
NASA Astrophysics Data System (ADS)
Jungeblut, T.; Grünewald, M.; Porrmann, M.; Rückert, U.
2008-05-01
This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.
NASA Technical Reports Server (NTRS)
Wallace, D. A.
1980-01-01
A thermoelectrically temperature controlled quartz crystal microbalance (QCM) system was developed for the measurement of ion thrustor generated mercury contamination on spacecraft. Meaningful flux rate measurements dictated an accurately held sensing crystal temperature despite spacecraft surface temperature variations from -35 C to +60 C over the flight temperature range. An electronic control unit was developed with magentic amplifier transformer secondary power supply, thermal control electronics, crystal temperature analog conditioning and a multiplexed 16 bit frequency encoder.
Investigating the Naval Logistics Role in Humanitarian Assistance Activities
2015-03-01
transportation means. E. BASE CASE RESULTS The computations were executed on a MacBook Pro , 3 GHz Intel Core i7-4578U processor with 8 GB. The...MacBook Pro was partitioned to also contain a Windows 7, 64-bit operating system. The computations were run in the Windows 7 operating system using the...it impacts the types of metamodels that can be developed as a result of data farming (Lucas et al., 2015). Using a metamodel, one can closely
Universal sensor interface module (USIM)
NASA Astrophysics Data System (ADS)
King, Don; Torres, A.; Wynn, John
1999-01-01
A universal sensor interface model (USIM) is being developed by the Raytheon-TI Systems Company for use with fields of unattended distributed sensors. In its production configuration, the USIM will be a multichip module consisting of a set of common modules. The common module USIM set consists of (1) a sensor adapter interface (SAI) module, (2) digital signal processor (DSP) and associated memory module, and (3) a RF transceiver model. The multispectral sensor interface is designed around a low-power A/D converted, whose input/output interface consists of: -8 buffered, sampled inputs from various devices including environmental, acoustic seismic and magnetic sensors. The eight sensor inputs are each high-impedance, low- capacitance, differential amplifiers. The inputs are ideally suited for interface with discrete or MEMS sensors, since the differential input will allow direct connection with high-impedance bridge sensors and capacitance voltage sources. Each amplifier is connected to a 22-bit (Delta) (Sigma) A/D converter to enable simultaneous samples. The low power (Delta) (Sigma) converter provides 22-bit resolution at sample frequencies up to 142 hertz (used for magnetic sensors) and 16-bit resolution at frequencies up to 1168 hertz (used for acoustic and seismic sensors). The video interface module is based around the TMS320C5410 DSP. It can provide sensor array addressing, video data input, data calibration and correction. The processor module is based upon a MPC555. It will be used for mode control, synchronization of complex sensors, sensor signal processing, array processing, target classification and tracking. Many functions of the A/D, DSP and transceiver can be powered down by using variable clock speeds under software command or chip power switches. They can be returned to intermediate or full operation by DSP command. Power management may be based on the USIM's internal timer, command from the USIM transceiver, or by sleep mode processing management. The low power detection mode is implemented by monitoring any of the sensor analog outputs at lower sample rates for detection over a software controllable threshold.
The use of imprecise processing to improve accuracy in weather & climate prediction
NASA Astrophysics Data System (ADS)
Düben, Peter D.; McNamara, Hugh; Palmer, T. N.
2014-08-01
The use of stochastic processing hardware and low precision arithmetic in atmospheric models is investigated. Stochastic processors allow hardware-induced faults in calculations, sacrificing bit-reproducibility and precision in exchange for improvements in performance and potentially accuracy of forecasts, due to a reduction in power consumption that could allow higher resolution. A similar trade-off is achieved using low precision arithmetic, with improvements in computation and communication speed and savings in storage and memory requirements. As high-performance computing becomes more massively parallel and power intensive, these two approaches may be important stepping stones in the pursuit of global cloud-resolving atmospheric modelling. The impact of both hardware induced faults and low precision arithmetic is tested using the Lorenz '96 model and the dynamical core of a global atmosphere model. In the Lorenz '96 model there is a natural scale separation; the spectral discretisation used in the dynamical core also allows large and small scale dynamics to be treated separately within the code. Such scale separation allows the impact of lower-accuracy arithmetic to be restricted to components close to the truncation scales and hence close to the necessarily inexact parametrised representations of unresolved processes. By contrast, the larger scales are calculated using high precision deterministic arithmetic. Hardware faults from stochastic processors are emulated using a bit-flip model with different fault rates. Our simulations show that both approaches to inexact calculations do not substantially affect the large scale behaviour, provided they are restricted to act only on smaller scales. By contrast, results from the Lorenz '96 simulations are superior when small scales are calculated on an emulated stochastic processor than when those small scales are parametrised. This suggests that inexact calculations at the small scale could reduce computation and power costs without adversely affecting the quality of the simulations. This would allow higher resolution models to be run at the same computational cost.
Implementation of High Speed Distributed Data Acquisition System
NASA Astrophysics Data System (ADS)
Raju, Anju P.; Sekhar, Ambika
2012-09-01
This paper introduces a high speed distributed data acquisition system based on a field programmable gate array (FPGA). The aim is to develop a "distributed" data acquisition interface. The development of instruments such as personal computers and engineering workstations based on "standard" platforms is the motivation behind this effort. Using standard platforms as the controlling unit allows independence in hardware from a particular vendor and hardware platform. The distributed approach also has advantages from a functional point of view: acquisition resources become available to multiple instruments; the acquisition front-end can be physically remote from the rest of the instrument. High speed data acquisition system transmits data faster to a remote computer system through Ethernet interface. The data is acquired through 16 analog input channels. The input data commands are multiplexed and digitized and then the data is stored in 1K buffer for each input channel. The main control unit in this design is the 16 bit processor implemented in the FPGA. This 16 bit processor is used to set up and initialize the data source and the Ethernet controller, as well as control the flow of data from the memory element to the NIC. Using this processor we can initialize and control the different configuration registers in the Ethernet controller in a easy manner. Then these data packets are sending to the remote PC through the Ethernet interface. The main advantages of the using FPGA as standard platform are its flexibility, low power consumption, short design duration, fast time to market, programmability and high density. The main advantages of using Ethernet controller AX88796 over others are its non PCI interface, the presence of embedded SRAM where transmit and reception buffers are located and high-performance SRAM-like interface. The paper introduces the implementation of the distributed data acquisition using FPGA by VHDL. The main advantages of this system are high accuracy, high speed, real time monitoring.
Digital camera with apparatus for authentication of images produced from an image file
NASA Technical Reports Server (NTRS)
Friedman, Gary L. (Inventor)
1993-01-01
A digital camera equipped with a processor for authentication of images produced from an image file taken by the digital camera is provided. The digital camera processor has embedded therein a private key unique to it, and the camera housing has a public key that is so uniquely based upon the private key that digital data encrypted with the private key by the processor may be decrypted using the public key. The digital camera processor comprises means for calculating a hash of the image file using a predetermined algorithm, and second means for encrypting the image hash with the private key, thereby producing a digital signature. The image file and the digital signature are stored in suitable recording means so they will be available together. Apparatus for authenticating at any time the image file as being free of any alteration uses the public key for decrypting the digital signature, thereby deriving a secure image hash identical to the image hash produced by the digital camera and used to produce the digital signature. The apparatus calculates from the image file an image hash using the same algorithm as before. By comparing this last image hash with the secure image hash, authenticity of the image file is determined if they match, since even one bit change in the image hash will cause the image hash to be totally different from the secure hash.
Sequence information signal processor for local and global string comparisons
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1997-01-01
A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
Josephson 4 K-bit cache memory design for a prototype signal processor. I - General overview
NASA Astrophysics Data System (ADS)
Henkels, W. H.; Geppert, L. M.; Kadlec, J.; Epperlein, P. W.; Beha, H.
1985-09-01
In the early stages of thg Josephson computer project conducted at an American computer company, it was recognized that a very fast cache memory was needed to complement Josephson logic. A subnanosecond access time memory was implemented experimentally on the basis of a 2.5-micron Pb-alloy technology. It was then decided to switch over to a Nb-base-electrode technology with the objective to alleviate problems with the long-term reliability and aging of Pb-based junctions. The present paper provides a general overview of the status of a 4 x 1 K-bit Josephson cache design employing a 2.5-micron Nb-edge-junction technology. Attention is given to the fabrication process and its implications, aspects of circuit design methodology, an overview of system environment and chip components, design changes and status, and various difficulties and uncertainties.
Finite element computation on nearest neighbor connected machines
NASA Technical Reports Server (NTRS)
Mcaulay, A. D.
1984-01-01
Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...
2015-05-22
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluatemore » the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).« less
A cost-effective methodology for the design of massively-parallel VLSI functional units
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Sriram, G.; Desouza, J.
1993-01-01
In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
Microbeam mapping of single event latchups and single event upsets in CMOS SRAMs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barak, J.; Adler, E.; Fischer, B.E.
1998-06-01
The first simultaneous microbeam mapping of single event upset (SEU) and latchup (SEL) in the CMOS RAM HM65162 is presented. The authors found that the shapes of the sensitive areas depend on V{sub DD}, on the ions being used and on the site on the chip being hit by the ion. In particular, they found SEL sensitive sites close to the main power supply lines between the memory-bit-arrays by detecting the accompanying current surge. All these SELs were also accompanied by bit-flips elsewhere in the memory (which they call indirect SEUs in contrast to the well known SEUs induced inmore » the hit memory cell only). When identical SEL sensitive sites were hit farther away from the supply lines only indirect SEL sensitive sites could be detected. They interpret these events as latent latchups in contrast to the classical ones detected by their induced current surge. These latent SELs were probably decoupled from the main supply lines by the high resistivity of the local supply lines.« less
NASA Astrophysics Data System (ADS)
Fay, James A.; Sonwalkar, Nishikant
1996-05-01
This CD-ROM is designed to accompany James Fay's Introduction to Fluid Mechanics. An enhanced hypermedia version of the textbook, it offers a number of ways to explore the fluid mechanics domain. These include a complete hypertext version of the original book, physical-experiment video clips, excerpts from external references, audio annotations, colored graphics, review questions, and progressive hints for solving problems. Throughout, the authors provide expert guidance in navigating the typed links so that students do not get lost in the learning process. System requirements: Macintosh with 68030 or greater processor and with at least 16 Mb of RAM. Operating System 6.0.4 or later for 680x0 processor and System 7.1.2 or later for Power-PC. CD-ROM drive with 256- color capability. Preferred display 14 inches or above (SuperVGA with 1 megabyte of VRAM). Additional system font software: Computer Modern postscript fonts (CM/PS Screen Fonts, CMBSY10, and CMTT10) and Adobe Type Manager (ATM 3.0 or later). James A. Fay is Professor Emeritus and Senior Lecturer in the Department of Mechanical Engineering at MIT.
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.
NASA Astrophysics Data System (ADS)
Feldman, Michael Robert
Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
High-Precision Timing of Several Millisecond Pulsars
NASA Astrophysics Data System (ADS)
Ferdman, R. D.; Stairs, I. H.; Backer, D. C.; Ramachandran, R.; Demorest, P.; Nice, D. J.; Lyne, A. G.; Kramer, M.; Lorimer, D.; McLaughlin, M.; Manchester, D.; Camilo, F.; D'Amico, N.; Possenti, A.; Burgay, M.; Joshi, B. C.; Freire, P. C.
2004-12-01
The highest precision pulsar timing is achieved by reproducing as accurately as possible the pulse profile as emitted by the pulsar, in high signal-to-noise observations. The best profile reconstruction can be accomplished with several-bit voltage sampling and coherent removal of the dispersion suffered by pulsar signals as they traverse the interstellar medium. The Arecibo Signal Processor (ASP) and its counterpart the Green Bank Astronomical Signal Processor (GASP) are flexible, state-of-the-art wide-bandwidth observing systems, built primarily for high-precision long-term timing of millisecond and binary pulsars. ASP and GASP are in use at the 300-m Arecibo telescope in Puerto Rico and the 100-m Green Bank Telescope in Green Bank, West Virginia, respectively, taking advantage of the enormous sensitivities of these telescopes. These instruments result in high-precision science through 4 and 8-bit sampling and perform coherent dedispersion on the incoming data stream in real or near-real time. This is done using a network of personal computers, over an observing bandwidth of 64 to 128 MHz, in each of two polarizations. We present preliminary results of timing and polarimetric observations with ASP/GASP for several pulsars, including the recently-discovered relativistic double-pulsar binary J0737-3039. These data are compared to simultaneous observations with other pulsar instruments, such as the new "spigot card" spectrometer on the GBT and the Princeton Mark IV instrument at Arecibo, the precursor timing system to ASP. We also briefly discuss several upcoming observations with ASP/GASP.
VASP-4096: a very high performance programmable device for digital media processing applications
NASA Astrophysics Data System (ADS)
Krikelis, Argy
2001-03-01
Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovations in the VASP-4096 is the integration of thousands of processing units in a single chip that are capable of support software programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in VASP-4096 implements the ASProCore architecture, which is a proprietary implementation of SIMD processing, operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate), and a 64- bit 66 MHz PCI interface. VASP-4096, compared with other processors architectures that support media processing, offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be re- used in future chip generations.
Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS
NASA Astrophysics Data System (ADS)
Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.
2003-04-01
Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content places heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents a ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.
NASA Technical Reports Server (NTRS)
Dohi, Tomohiro; Nitta, Kazumasa; Ueda, Takashi
1993-01-01
This paper proposes a new type of coherent demodulator, the unique-word (UW)-reverse-modulation type demodulator, for burst signal controlled by voice operated transmitter (VOX) in mobile satellite communication channels. The demodulator has three individual circuits: a pre-detection signal combiner, a pre-detection UW detector, and a UW-reverse-modulation type demodulator. The pre-detection signal combiner combines signal sequences received by two antennas and improves bit energy-to-noise power density ratio (E(sub b)/N(sub 0)) 2.5 dB to yield 10(exp -3) average bit error rate (BER) when carrier power-to-multipath power ratio (CMR) is 15 dB. The pre-detection UW detector improves UW detection probability when the frequency offset is large. The UW-reverse-modulation type demodulator realizes a maximum pull-in frequency of 3.9 kHz, the pull-in time is 2.4 seconds and frequency error is less than 20 Hz. The performances of this demodulator are confirmed through computer simulations and its effect is clarified in real-time experiments at a bit rate of 16.8 kbps using a digital signal processor (DSP).
Design and test of a regenerative satellite transmultiplexer
NASA Astrophysics Data System (ADS)
Hung, Kenny King-Ming
1993-05-01
In a multiple access scheme for regenerative satellite communications, the bulk frequency division multiple access (FDMA) uplink signal is demodulated on board the satellite and then remodulated for time division multiplexing (TDM) downlink transmission. Conversion from frequency to time division multiplex format requires that the uplink signal be frequency demultiplexed and each individual carrier be subsequently demodulated. For thin-route application which consists of a large number of channels with fixed data rate, multicarrier demodulation can be accomplished efficiently by a digital transmultiplexer (TMUX) using a fast Fourier transform processor followed by a bank of per-channel processors. A time domain description of the TMUX algorithm is derived which elucidates how the TMUX functions. The per-channel processor performs timing and carrier recovery for optimum and coherent data detection. Timing recovery is necessarily achieved asynchronously by a filter coefficient interpolation. Carrier recovery is performed using an all-digital phase-locked loop. The combination of both timing and carrier loops is investigated for a multi-user system. The performance of the overall system is assessed over a multi-user, additive white Gaussian noise channel for a bit energy to noise power spectral density ratio down to zero dB.
Real-Time Data Processing in the muon system of the D0 detector.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neeti Parashar et al.
2001-07-03
This paper presents a real-time application of the 16-bit fixed point Digital Signal Processors (DSPs), in the Muon System of the D0 detector located at the Fermilab Tevatron, presently the world's highest-energy hadron collider. As part of the Upgrade for a run beginning in the year 2000, the system is required to process data at an input event rate of 10 KHz without incurring significant deadtime in readout. The ADSP21csp01 processor has high I/O bandwidth, single cycle instruction execution and fast task switching support to provide efficient multisignal processing. The processor's internal memory consists of 4K words of Program Memorymore » and 4K words of Data Memory. In addition there is an external memory of 32K words for general event buffering and 16K words of Dual port Memory for input data queuing. This DSP fulfills the requirement of the Muon subdetector systems for data readout. All error handling, buffering, formatting and transferring of the data to the various trigger levels of the data acquisition system is done in software. The algorithms developed for the system complete these tasks in about 20 {micro}s per event.« less
NASA Astrophysics Data System (ADS)
Hill, C.
2008-12-01
Low cost graphic cards today use many, relatively simple, compute cores to deliver support for memory bandwidth of more than 100GB/s and theoretical floating point performance of more than 500 GFlop/s. Right now this performance is, however, only accessible to highly parallel algorithm implementations that, (i) can use a hundred or more, 32-bit floating point, concurrently executing cores, (ii) can work with graphics memory that resides on the graphics card side of the graphics bus and (iii) can be partially expressed in a language that can be compiled by a graphics programming tool. In this talk we describe our experiences implementing a complete, but relatively simple, time dependent shallow-water equations simulation targeting a cluster of 30 computers each hosting one graphics card. The implementation takes into account the considerations (i), (ii) and (iii) listed previously. We code our algorithm as a series of numerical kernels. Each kernel is designed to be executed by multiple threads of a single process. Kernels are passed memory blocks to compute over which can be persistent blocks of memory on a graphics card. Each kernel is individually implemented using the NVidia CUDA language but driven from a higher level supervisory code that is almost identical to a standard model driver. The supervisory code controls the overall simulation timestepping, but is written to minimize data transfer between main memory and graphics memory (a massive performance bottle-neck on current systems). Using the recipe outlined we can boost the performance of our cluster by nearly an order of magnitude, relative to the same algorithm executing only on the cluster CPU's. Achieving this performance boost requires that many threads are available to each graphics processor for execution within each numerical kernel and that the simulations working set of data can fit into the graphics card memory. As we describe, this puts interesting upper and lower bounds on the problem sizes for which this technology is currently most useful. However, many interesting problems fit within this envelope. Looking forward, we extrapolate our experience to estimate full-scale ocean model performance and applicability. Finally we describe preliminary hybrid mixed 32-bit and 64-bit experiments with graphics cards that support 64-bit arithmetic, albeit at a lower performance.
Algorithms and Application of Sparse Matrix Assembly and Equation Solvers for Aeroacoustics
NASA Technical Reports Server (NTRS)
Watson, W. R.; Nguyen, D. T.; Reddy, C. J.; Vatsa, V. N.; Tang, W. H.
2001-01-01
An algorithm for symmetric sparse equation solutions on an unstructured grid is described. Efficient, sequential sparse algorithms for degree-of-freedom reordering, supernodes, symbolic/numerical factorization, and forward backward solution phases are reviewed. Three sparse algorithms for the generation and assembly of symmetric systems of matrix equations are presented. The accuracy and numerical performance of the sequential version of the sparse algorithms are evaluated over the frequency range of interest in a three-dimensional aeroacoustics application. Results show that the solver solutions are accurate using a discretization of 12 points per wavelength. Results also show that the first assembly algorithm is impractical for high-frequency noise calculations. The second and third assembly algorithms have nearly equal performance at low values of source frequencies, but at higher values of source frequencies the third algorithm saves CPU time and RAM. The CPU time and the RAM required by the second and third assembly algorithms are two orders of magnitude smaller than that required by the sparse equation solver. A sequential version of these sparse algorithms can, therefore, be conveniently incorporated into a substructuring for domain decomposition formulation to achieve parallel computation, where different substructures are handles by different parallel processors.
NASA Technical Reports Server (NTRS)
Hofman, L. B.; Erickson, W. K.; Donovan, W. E.
1984-01-01
Image Display and Analysis Systems (MIDAS) developed at NASA/Ames for the analysis of Landsat MSS images is described. The MIDAS computer power and memory, graphics, resource-sharing, expansion and upgrade, environment and maintenance, and software/user-interface requirements are outlined; the implementation hardware (including 32-bit microprocessor, 512K error-correcting RAM, 70 or 140-Mbyte formatted disk drive, 512 x 512 x 24 color frame buffer, and local-area-network transceiver) and applications software (ELAS, CIE, and P-EDITOR) are characterized; and implementation problems, performance data, and costs are examined. Planned improvements in MIDAS hardware and design goals and areas of exploration for MIDAS software are discussed.
Parallel implementation of Hartree-Fock and density functional theory analytical second derivatives
NASA Astrophysics Data System (ADS)
Baker, Jon; Wolinski, Krzysztof; Malagoli, Massimo; Pulay, Peter
2004-01-01
We present an efficient, parallel implementation for the calculation of Hartree-Fock and density functional theory analytical Hessian (force constant, nuclear second derivative) matrices. These are important for the determination of harmonic vibrational frequencies, and to classify stationary points on potential energy surfaces. Our program is designed for modest parallelism (4-16 CPUs) as exemplified by our standard eight-processor QuantumCube™. We can routinely handle systems with up to 100+ atoms and 1000+ basis functions using under 0.5 GB of RAM memory per CPU. Timings are presented for several systems, ranging in size from aspirin (C9H8O4) to nickel octaethylporphyrin (C36H44N4Ni).
Accelerating functional verification of an integrated circuit
Deindl, Michael; Ruedinger, Jeffrey Joseph; Zoellin, Christian G.
2015-10-27
Illustrative embodiments include a method, system, and computer program product for accelerating functional verification in simulation testing of an integrated circuit (IC). Using a processor and a memory, a serial operation is replaced with a direct register access operation, wherein the serial operation is configured to perform bit shifting operation using a register in a simulation of the IC. The serial operation is blocked from manipulating the register in the simulation of the IC. Using the register in the simulation of the IC, the direct register access operation is performed in place of the serial operation.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
A multicomputer simulation of the Galileo spacecraft command and data subsystem
NASA Technical Reports Server (NTRS)
Zipse, John E.; Yeung, Raymond Y.; Zimmerman, Barbara A.; Morillo, Ronald; Olster, Daniel B.; Flower, Jon W.; Mizuo, Thomas
1991-01-01
A detailed simulation of the command and data subsystem of the Galileo spacecraft on a distributed memory multicomputer is described. The simulation is based on an ensemble of Inmos Transputers for simulating, to the bit level, the execution of instruction sequences for the six RCA 1802 microcomputers and the intricate bus traffic between them and other components of the spacecraft. Expressions were developed to estimate the performance of the simulator on a distributed system given the processor clock speed, memory access time, and communication characteristics.
Scalable and portable visualization of large atomistic datasets
NASA Astrophysics Data System (ADS)
Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya
2004-10-01
A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summaryTitle of program: Atomsviewer Catalogue identifier: ADUM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5 Programming languages used: C++, C and OpenGL Memory required to execute with typical data: 1 gigabyte of RAM High speed storage required: 60 gigabytes No. of lines in the distributed program including test data, etc.: 550 241 No. of bytes in the distributed program including test data, etc.: 6 258 245 Number of bits in a word: Arbitrary Number of processors used: 1 Has the code been vectorized or parallelized: No Distribution format: tar gzip file Nature of physical problem: Scientific visualization of atomic systems Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data minimization, and levels-of-detail for minimal rendering Restrictions on the complexity of the problem: None Typical running time: The program is interactive in its execution Unusual features of the program: None References: The conceptual foundation and subsequent implementation of the algorithms are found in [A. Sharma, A. Nakano, R.K. Kalia, P. Vashishta, S. Kodiyalam, P. Miller, W. Zhao, X.L. Liu, T.J. Campbell, A. Haas, Presence—Teleoperators and Virtual Environments 12 (1) (2003)].
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response is facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling combined up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing opera-tions are presented.
CoNNeCT Baseband Processor Module Boot Code SoftWare (BCSW)
NASA Technical Reports Server (NTRS)
Yamamoto, Clifford K.; Orozco, David S.; Byrne, D. J.; Allen, Steven J.; Sahasrabudhe, Adit; Lang, Minh
2012-01-01
This software provides essential startup and initialization routines for the CoNNeCT baseband processor module (BPM) hardware upon power-up. A command and data handling (C&DH) interface is provided via 1553 and diagnostic serial interfaces to invoke operational, reconfiguration, and test commands within the code. The BCSW has features unique to the hardware it is responsible for managing. In this case, the CoNNeCT BPM is configured with an updated CPU (Atmel AT697 SPARC processor) and a unique set of memory and I/O peripherals that require customized software to operate. These features include configuration of new AT697 registers, interfacing to a new HouseKeeper with a flash controller interface, a new dual Xilinx configuration/scrub interface, and an updated 1553 remote terminal (RT) core. The BCSW is intended to provide a "safe" mode for the BPM when initially powered on or when an unexpected trap occurs, causing the processor to reset. The BCSW allows the 1553 bus controller in the spacecraft or payload controller to operate the BPM over 1553 to upload code; upload Xilinx bit files; perform rudimentary tests; read, write, and copy the non-volatile flash memory; and configure the Xilinx interface. Commands also exist over 1553 to cause the CPU to jump or call a specified address to begin execution of user-supplied code. This may be in the form of a real-time operating system, test routine, or specific application code to run on the BPM.
Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA
NASA Astrophysics Data System (ADS)
Sahib Omran, Safaa; Fouad Jumma, Laith
2018-05-01
Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.
Moment distributions of clusters and molecules in the adiabatic rotor model
NASA Astrophysics Data System (ADS)
Ballentine, G. E.; Bertsch, G. F.; Onishi, N.; Yabana, K.
2008-01-01
We present a Fortran program to compute the distribution of dipole moments of free particles for use in analyzing molecular beams experiments that measure moments by deflection in an inhomogeneous field. The theory is the same for magnetic and electric dipole moments, and is based on a thermal ensemble of classical particles that are free to rotate and that have moment vectors aligned along a principal axis of rotation. The theory has two parameters, the ratio of the magnetic (or electric) dipole energy to the thermal energy, and the ratio of moments of inertia of the rotor. Program summaryProgram title:AdiabaticRotor Catalogue identifier:ADZO_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADZO_v1_0.html Program obtainable from:CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions:Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.:479 No. of bytes in distributed program, including test data, etc.:4853 Distribution format:tar.gz Programming language:Fortran 90 Computer:Pentium-IV, Macintosh Power PC G4 Operating system:Linux, Mac OS X RAM:600 Kbytes Word size:64 bits Classification:2.3 Nature of problem:The system considered is a thermal ensemble of rotors having a magnetic or electric moment aligned along one of the principal axes. The ensemble is placed in an external field which is turned on adiabatically. The problem is to find the distribution of moments in the presence of the external field. Solution method:There are three adiabatic invariants. The only nontrivial one is the action associated with the polar angle of the rotor axis with respect to external field. It is found by Newton's method. Running time:3 min on a 3 GHz Pentium IV processor.
Efficient system interrupt concept design at the microprogramming level
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fakharzadeh, M.M.
1989-01-01
Over the past decade the demand for high speed super microcomputers has been tremendously increased. To satisfy this demand many high speed 32-bit microcomputers have been designed. However, the currently available 32-bit systems do not provide an adequate solution to many highly demanding problems such as in multitasking, and in interrupt driven applications, which both require context switching. Systems for these purposes usually incorporate sophisticated software. In order to be efficient, a high end microprocessor based system must satisfy stringent software demands. Although these microprocessors use the latest technology in the fabrication design and run at a very high speed,more » they still suffer from insufficient hardware support for such applications. All too often, this lack also is the premier cause of execution inefficiency. In this dissertation a micro-programmable control unit and operation unit is considered in an advanced design. An automaton controller is designed for high speed micro-level interrupt handling. Different stack models are designed for the single task and multitasking environment. The stacks are used for storage of various components of the processor during the interrupt calls, procedure calls, and task switching. A universal (as an example seven port) register file is designed for high speed parameter passing, and intertask communication in the multitasking environment. In addition, the register file provides a direct path between ALU and the peripheral data which is important in real-time control applications. The overall system is a highly parallel architecture, with no pipeline and internal cache memory, which allows the designer to be able to predict the processor's behavior during the critical times.« less
Malleable architecture generator for FPGA computing
NASA Astrophysics Data System (ADS)
Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang
1996-10-01
The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology- specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.
ISFET-based sensor signal processor chip design for environment monitoring applications
NASA Astrophysics Data System (ADS)
Chung, Wen-Yaw; Yang, Chung-Huang; Wang, Ming-Ga
2004-12-01
In recent years Ion-Sensitive Field Effect Transistor (ISFET) based transducers create valuable applications in physiological data acquisition and environment monitoring. This paper presents a mixed-mode ASIC design for potentiometric ISFET-based bio-chemical sensor applications including H+ sensing and hand-held pH meter. For battery power consideration, the proposed system consists of low voltage (3V) analog front-end readout circuits and digital processor has been developed and fabricated in a 0.5mm double-poly double-metal CMOS technology. To assure that the correct pH value can be measured, the two-point calibration circuitry based on the response of standard pH4 and pH7 buffer solution has been implemented by using algorithmic state machine hardware algorithms. The measurement accuracy of the chip is 10 bits and the measured range between pH 2 to pH 12 compared to ideal values is within the accuracy of 0.1pH. For homeland environmental applications, the system provide rapid, easy to use, and cost-effective on-site testing on the quality of water, such as drinking water, ground water and river water. The processor has a potential usage in battery-operated and portable devices in environmental monitoring applications compared to commercial hand-held pH meter.
Pawlowski, Marcin Piotr; Jara, Antonio; Ogorzalek, Maciej
2015-01-01
Entropy in computer security is associated with the unpredictability of a source of randomness. The random source with high entropy tends to achieve a uniform distribution of random values. Random number generators are one of the most important building blocks of cryptosystems. In constrained devices of the Internet of Things ecosystem, high entropy random number generators are hard to achieve due to hardware limitations. For the purpose of the random number generation in constrained devices, this work proposes a solution based on the least-significant bits concatenation entropy harvesting method. As a potential source of entropy, on-board integrated sensors (i.e., temperature, humidity and two different light sensors) have been analyzed. Additionally, the costs (i.e., time and memory consumption) of the presented approach have been measured. The results obtained from the proposed method with statistical fine tuning achieved a Shannon entropy of around 7.9 bits per byte of data for temperature and humidity sensors. The results showed that sensor-based random number generators are a valuable source of entropy with very small RAM and Flash memory requirements for constrained devices of the Internet of Things. PMID:26506357
Pawlowski, Marcin Piotr; Jara, Antonio; Ogorzalek, Maciej
2015-10-22
Entropy in computer security is associated with the unpredictability of a source of randomness. The random source with high entropy tends to achieve a uniform distribution of random values. Random number generators are one of the most important building blocks of cryptosystems. In constrained devices of the Internet of Things ecosystem, high entropy random number generators are hard to achieve due to hardware limitations. For the purpose of the random number generation in constrained devices, this work proposes a solution based on the least-significant bits concatenation entropy harvesting method. As a potential source of entropy, on-board integrated sensors (i.e., temperature, humidity and two different light sensors) have been analyzed. Additionally, the costs (i.e., time and memory consumption) of the presented approach have been measured. The results obtained from the proposed method with statistical fine tuning achieved a Shannon entropy of around 7.9 bits per byte of data for temperature and humidity sensors. The results showed that sensor-based random number generators are a valuable source of entropy with very small RAM and Flash memory requirements for constrained devices of the Internet of Things.
NASA Astrophysics Data System (ADS)
Studenny, John; Johnstone, Eric
1991-01-01
The acousto-optic spectrum analyzer has undergone a theoretical design review and a basic parameter tradeoff analysis has been performed. The main conclusion is that for the given scenario of a 55 dB dynamic range and for a one-second temporal resolution, a 3.9 MHz resolution is a reasonable compromise with respect to current technology. Additional configurations are suggested. Noise testing of the signal detection processor algorithm was conducted. Additive white Gaussian noise was introduced to pure data. As expected, the tradeoff was between algorithm sensitivity and false alarms. No additional algorithm improvements could be made. The algorithm was observed to be robust, provided that the noise floor was set at a proper level. The digitization scheme was mainly driven by hardware constraints. To implement an analog to digital conversion scheme that linearly covers a 55 dB dynamic range would require a minimum of 17 bits. The general consensus was that 17 bits would be untenable for very large scale integration.
NASA Technical Reports Server (NTRS)
Glover, R. D.
1983-01-01
The NASA Dryden Flight Research Facility has developed a microprocessor-based, user-programmable, general-purpose aircraft interrogation and display system (AIDS). The hardware and software of this ground-support equipment have been designed to permit diverse applications in support of aircraft digital flight-control systems and simulation facilities. AIDS is often employed to provide engineering-units display of internal digital system parameters during development and qualification testing. Such visibility into the system under test has proved to be a key element in the final qualification testing of aircraft digital flight-control systems. Three first-generation 8-bit units are now in service in support of several research aircraft projects, and user acceptance has been high. A second-generation design, extended AIDS (XAIDS), incorporating multiple 16-bit processors, is now being developed to support the forward swept wing aircraft project (X-29A). This paper outlines the AIDS concept, summarizes AIDS operational experience, and describes the planned XAIDS design and mechanization.
"Observation Obscurer" - Time Series Viewer, Editor and Processor
NASA Astrophysics Data System (ADS)
Andronov, I. L.
The program is described, which contains a set of subroutines suitable for East viewing and interactive filtering and processing of regularly and irregularly spaced time series. Being a 32-bit DOS application, it may be used as a default fast viewer/editor of time series in any compute shell ("commander") or in Windows. It allows to view the data in the "time" or "phase" mode, to remove ("obscure") or filter outstanding bad points; to make scale transformations and smoothing using few methods (e.g. mean with phase binning, determination of the statistically opti- mal number of phase bins; "running parabola" (Andronov, 1997, As. Ap. Suppl, 125, 207) fit and to make time series analysis using some methods, e.g. correlation, autocorrelation and histogram analysis: determination of extrema etc. Some features have been developed specially for variable star observers, e.g. the barycentric correction, the creation and fast analysis of "OC" diagrams etc. The manual for "hot keys" is presented. The computer code was compiled with a 32-bit Free Pascal (www.freepascal.org).
Gigaflop architecture, a hardware perspective
NASA Technical Reports Server (NTRS)
Feierbach, G. F.
1978-01-01
Any super computer built in the early 1980s will use components that are available by fall 1978. The architecture of such a system cannot depart radically from current super computers if the software experience painfully acquired from these computers in the 70's is to apply. Given the above constraints, 10 billion floating point operations per second (BFLOPS) are attainable and a problem memory of 512 million (64 bit) words could be supported by the technology of the time. In contrast to this, industry is likely to respond with commercially available machines with a performance of less than 150 MFLOPS. This is due to self-imposed constraints on the manufacturers to provide upward compatible architectures (same instruction set) and systems which can be sold in significant volumes. Since this computing speed is inadequate to meet the demands of computational fluid dynamics, a special processor is required. Issues which are felt to be significant in the pursuit of maximum compute capability in this special processor are discussed.
Blackcomb: Hardware-Software Co-design for Non-Volatile Memory in Exascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schreiber, Robert
Summary of technical results of Blackcomb Memory Devices We explored various different memory technologies (STTRAM, PCRAM, FeRAM, and ReRAM). The progress can be classified into three categories, below. Modeling and Tool Releases Various modeling tools have been developed over the last decade to help in the design of SRAM or DRAM-based memory hierarchies. To explore new design opportunities that NVM technologies can bring to the designers, we have developed similar high-level models for NVM, including PCRAMsim [Dong 2009], NVSim [Dong 2012], and NVMain [Poremba 2012]. NVSim is a circuit-level model for NVM performance, energy, and area estimation, which supports variousmore » NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies. On the other side, NVMain is a cycle accurate main memory simulator designed to simulate emerging nonvolatile memories at the architectural level. We have released these models as open source tools and provided contiguous support to them. We also proposed PS3-RAM, which is a fast, portable and scalable statistical STT-RAM reliability analysis model [Wen 2012]. Design Space Exploration and Optimization With the support of these models, we explore different device/circuit optimization techniques. For example, in [Niu 2012a] we studied the power reduction technique for the application of ECC scheme in ReRAM designs and proposed to use ECC code to relax the BER (Bit Error Rate) requirement of a single memory to improve the write energy consumption and latency for both 1T1R and cross-point ReRAM designs. In [Xu 2011], we proposed a methodology to design STT-RAM for different optimization goals such as read performance, write performance and write energy by leveraging the trade-off between write current and write time of MTJ. We also studied the tradeoffs in building a reliable crosspoint ReRAM array [Niu 2012b]. We have conducted an in depth analysis of the circuit and system level design implications of multi-layer cross-point Resistive RAM (MLCReRAM) from performance, power and reliability perspectives [Xu 2013]. The objective of this study is to understand the design trade-offs of this technology with respect to the MLC Phase Change Memory (MLCPCM).Our MLC ReRAM design at the circuit and system levels indicates that different resistance allocation schemes, programming strategies, peripheral designs, and material selections profoundly affect the area, latency, power, and reliability of MLC ReRAM. Based on this analysis, we conduct two case studies: first we compare MLC ReRAM design against MLC phase-change memory (PCM) and multi-layer cross-point ReRAM design, and point out why multi-level ReRAM is appealing; second we further explore the design space for MLC ReRAM. Architecture and Application We explored hybrid checkpointing using phase-change memory for future exascale systems [Dong 2011] and showed that the use of nonvolatile memory for local checkpointing significantly increases the number of faults covered by local checkpoints and reduces the probability of a global failure in the middle of a global checkpoint to less than 1%. We also proposed a technique called i2WAP to mitigate the write variations in NVM-based last-level cache for the improvement of the NVM lifetime [Wang 2013]. Our wear leveling technique attempts to work around the limitations of write endurance by arranging data access so that write operations can be distributed evenly across all the storage cells. During our intensive research on fault-tolerant NVM design, we found that ECC cannot effectively tolerate hard errors from limited write endurance and process imperfection. Therefore, we devised a novel Point and Discard (PAD) architecture in in [ 2012] as a hard-error-tolerant architecture for ReRAM-based Last Level Caches. PAD improves the lifetime of ReRAM caches by 1.6X-440X under different process variations without performance overhead in the system's early life. We have investigated the applicability of NVM for persistent memory design [Zhao 2013]. New byte addressable NVM enables fast persistent memory that allows in-memory persistent data objects to be updated with much higher throughput. Despite the significant improvement, the performance of these designs is only 50% of the native system with no persistence support, due to the logging or copy-on-write mechanisms used to update the persistent memory. A challenge in this approach is therefore how to efficiently enable atomic, consistent, and durable updates to ensure data persistence that survives application and/or system failures. We have designed a persistent memory system, called Klin, that can provide performance as close as that of the native system. The Klin design adopts a non-volatile cache and a non-volatile main memory for constructing a multi-versioned durable memory system, enabling atomic updates without logging or copy-on-write. Our evaluation shows that the proposed Kiln mechanism can achieve up to 2X of performance improvement to NVRAM-based persistent memory employing write-ahead logging. In addition, our design has numerous practical advantages: a simple and intuitive abstract interface, microarchitecture-level optimizations, fast recovery from failures, and no redundant writes to slow non-volatile storage media. The work was published in MICRO 2013 and received Best Paper Honorable Mentioned Award.« less
Monitoring complex detectors: the uSOP approach in the Belle II experiment
NASA Astrophysics Data System (ADS)
Di Capua, F.; Aloisio, A.; Ameli, F.; Anastasio, A.; Branchini, P.; Giordano, R.; Izzo, V.; Tortone, G.
2017-08-01
uSOP is a general purpose single board computer designed for deep embedded applications in control and monitoring of detectors, sensors and complex laboratory equipments. It is based on the AM3358 (1 GHz ARM Cortex A8 processor), equipped with USB and Ethernet interfaces. On-board RAM and solid state storage allows hosting a full LINUX distribution. In this paper we discuss the main aspects of the hardware and software design and the expandable peripheral architecture built around field busses. We report on several applications of uSOP system in the Belle II experiment, presently under construction at KEK (Tsukuba, Japan). In particular we will report the deployment of uSOP in the monitoring system framework of the endcap electromagnetic calorimeter.
Environment induced anomalies on the TDRS and the role of spacecraft charging
NASA Technical Reports Server (NTRS)
Garrett, H. B.; Whittlesey, A.; Daughtridge, S.
1990-01-01
The NASA Tracking and Data Relay Satellites (TDRS) have experienced several classes of anomalies that appear to be related to the natural environment. The most serious of these have been anomalies in the Attitude Control System control processor electronics which resulted in check sum errors that were ultimately traced to high-energy, particle-induced single event upsets in the RAM memory. Three other types of anomalies on TDRS have also been correlated with environmental effects. This paper briefly documents the occurrences of these anomalies and describes the nature of each. These events are correlated with various environmental factors. For all cases, there appears to be a causal relationship between spacecraft charging events and the engineering anomalies.
Jeon, Ji Hoon; Joo, Ho-Young; Kim, Young-Min; Lee, Duk Hyun; Kim, Jin-Soo; Kim, Yeon Soo; Choi, Taekjib; Park, Bae Ho
2016-01-01
Highly nonlinear bistable current-voltage (I–V) characteristics are necessary in order to realize high density resistive random access memory (ReRAM) devices that are compatible with cross-point stack structures. Up to now, such I–V characteristics have been achieved by introducing complex device structures consisting of selection elements (selectors) and memory elements which are connected in series. In this study, we report bipolar resistive switching (RS) behaviours of nano-crystalline BiFeO3 (BFO) nano-islands grown on Nb-doped SrTiO3 substrates, with large ON/OFF ratio of 4,420. In addition, the BFO nano-islands exhibit asymmetric I–V characteristics with high nonlinearity factor of 1,100 in a low resistance state. Such selector-free RS behaviours are enabled by the mosaic structures and pinned downward ferroelectric polarization in the BFO nano-islands. The high resistance ratio and nonlinearity factor suggest that our BFO nano-islands can be extended to an N × N array of N = 3,740 corresponding to ~107 bits. Therefore, our BFO nano-island showing both high resistance ratio and nonlinearity factor offers a simple and promising building block of high density ReRAM. PMID:27001415
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
NASA Astrophysics Data System (ADS)
Marinella, M.
In the not too distant future, the traditional memory and storage hierarchy of may be replaced by a single Storage Class Memory (SCM) device integrated on or near the logic processor. Traditional magnetic hard drives, NAND flash, DRAM, and higher level caches (L2 and up) will be replaced with a single high performance memory device. The Storage Class Memory paradigm will require high speed (< 100 ns read/write), excellent endurance (> 1012), nonvolatility (retention > 10 years), and low switching energies (< 10 pJ per switch). The International Technology Roadmap for Semiconductors (ITRS) has recently evaluated several potential candidates SCM technologies, including Resistive (or Redox) RAM, Spin Torque Transfer RAM (STT-MRAM), and phase change memory (PCM). All of these devices show potential well beyond that of current flash technologies and research efforts are underway to improve the endurance, write speeds, and scalabilities to be on-par with DRAM. This progress has interesting implications for space electronics: each of these emerging device technologies show excellent resistance to the types of radiation typically found in space applications. Commercially developed, high density storage class memory-based systems may include a memory that is physically radiation hard, and suitable for space applications without major shielding efforts. This paper reviews the Storage Class Memory concept, emerging memory devices, and possible applicability to radiation hardened electronics for space.
Feasibilty of a Multi-bit Cell Perpendicular Magnetic Tunnel Junction Device
NASA Astrophysics Data System (ADS)
Kim, Chang Soo
The ultimate objective of this research project was to explore the feasibility of making a multi-bit cell perpendicular magnetic tunnel junction (PMTJ) device to increase the storage density of spin-transfer-torque random access memory (STT-RAM). As a first step toward demonstrating a multi-bit cell device, this dissertation contributed a systematic and detailed study of developing a single cell PMTJ device using L10 FePt films. In the beginning of this research, 13 up-and-coming non-volatile memory (NVM) technologies were investigated and evaluated to see whether one of them might outperform NAND flash memories and even HDDs on a cost-per-TB basis in 2020. This evaluation showed that STT-RAM appears to potentially offer superior power efficiency, among other advantages. It is predicted that STTRAM's density could make it a promising candidate for replacing NAND flash memories and possibly HDDs if STTRAM could be improved to store multiple bits per cell. Ta/Mg0 under-layers were used first in order to develop (001) L1 0 ordering of FePt at a low temperature of below 400 °C. It was found that the tradeoff between surface roughness and (001) L10 ordering of FePt makes it difficult to achieve low surface roughness and good perpendicular magnetic properties simultaneously when Ta/Mg0 under-layers are used. It was, therefore, decided to investigate MgO/CrRu under-layers to simultaneously achieve smooth films with good ordering below 400°C. A well ordered 4 nm L10 FePt film with RMS surface roughness close to 0.4 nm, perpendicular coercivity of about 5 kOe, and perpendicular squareness near 1 was obtained at a deposition temperature of 390 °C on a thermally oxidized Si substrate when MgO/CrRu under-layers are used. A PMTJ device was developed by depositing a thin MgO tunnel barrier layer and a top L10 FePt film and then being postannealed at 450 °C for 30 minutes. It was found that the sputtering power needs to be minimized during the thin MgO tunnel barrier deposition because the high sputtering power can degrade perpendicular magnetic anisotropy of the bottom L1 0 FePt film and also increase RMS film surface roughness of the MgO tunnel barrier layer. From a lithographically unpatterned PMTJ sample, MR ratio and RA were measured at room temperature by the CIPT method and found to be 138% and 6.4 kOmicrom2, respectively. A completed PMTJ test pattern with a junction size of 80x40 microm2 was fabricated and showed a measured MR ratio and RA product of 108% and 4~6 kOmicrom 2, respectively. These values agree relatively well with the corresponding values of 138% and 6.4 kOmicrom2 obtained from the unpatterned PMTJ sample measured by a current-in-plane tunneling (CIPT) method.
Digital receiver study and implementation
NASA Technical Reports Server (NTRS)
Fogle, D. A.; Lee, G. M.; Massey, J. C.
1972-01-01
Computer software was developed which makes it possible to use any general purpose computer with A/D conversion capability as a PSK receiver for low data rate telemetry processing. Carrier tracking, bit synchronization, and matched filter detection are all performed digitally. To aid in the implementation of optimum computer processors, a study of general digital processing techniques was performed which emphasized various techniques for digitizing general analog systems. In particular, the phase-locked loop was extensively analyzed as a typical non-linear communication element. Bayesian estimation techniques for PSK demodulation were studied. A hardware implementation of the digital Costas loop was developed.
A DSP equipped digitizer for online analysis of nuclear detector signals
NASA Astrophysics Data System (ADS)
Pasquali, G.; Ciaranfi, R.; Bardelli, L.; Bini, M.; Boiano, A.; Giannelli, F.; Ordine, A.; Poggi, G.
2007-01-01
In the framework of the NUCL-EX collaboration, a DSP equipped fast digitizer has been implemented and it has now reached the production stage. Each sampling channel is implemented on a separate daughter-board to be plugged on a VME mother-board. Each channel features a 12-bit, 125 MSamples/s ADC and a Digital Signal Processor (DSP) for online analysis of detector signals. A few algorithms have been written and successfully tested on detectors of different types (scintillators, solid-state, gas-filled), implementing pulse shape discrimination, constant fraction timing, semi-Gaussian shaping, gated integration.
Sensor Authentication: Embedded Processor Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Svoboda, John
2012-09-25
Described is the c code running on the embedded Microchip 32bit PIC32MX575F256H located on the INL developed noise analysis circuit board. The code performs the following functions: Controls the noise analysis circuit board preamplifier voltage gains of 1, 10, 100, 000 Initializes the analog to digital conversion hardware, input channel selection, Fast Fourier Transform (FFT) function, USB communications interface, and internal memory allocations Initiates high resolution 4096 point 200 kHz data acquisition Computes complex 2048 point FFT and FFT magnitude. Services Host command set Transfers raw data to Host Transfers FFT result to host Communication error checking
Advanced flight computer. Special study
NASA Technical Reports Server (NTRS)
Coo, Dennis
1995-01-01
This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
Design and implementation of projects with Xilinx Zynq FPGA: a practical case
NASA Astrophysics Data System (ADS)
Travaglini, R.; D'Antone, I.; Meneghini, S.; Rignanese, L.; Zuffa, M.
The main advantage when using FPGAs with embedded processors is the availability of additional several high-performance resources in the same physical device. Moreover, the FPGA programmability allows for connect custom peripherals. Xilinx have designed a programmable device named Zynq-7000 (simply called Zynq in the following), which integrates programmable logic (identical to the other Xilinx "serie 7" devices) with a System on Chip (SOC) based on two embedded ARM processors. Since both parts are deeply connected, the designers benefit from performance of hardware SOC and flexibility of programmability as well. In this paper a design developed by the Electronic Design Department at the Bologna Division of INFN will be presented as a practical case of project based on Zynq device. It is developed by using a commercial board called ZedBoard hosting a FMC mezzanine with a 12-bit 500 MS/s ADC. The Zynq FPGA on the ZedBoard receives digital outputs from the ADC and send them to the acquisition PC, after proper formatting, through a Gigabit Ethernet link. The major focus of the paper will be about the methodology to develop a Zynq-based design with the Xilinx Vivado software, enlightening how to configure the SOC and connect it with the programmable logic. Firmware design techniques will be presented: in particular both VHDL and IP core based strategies will be discussed. Further, the procedure to develop software for the embedded processor will be presented. Finally, some debugging tools, like the embedded Logic Analyzer, will be shown. Advantages and disadvantages with respect to adopting FPGA without embedded processors will be discussed.
A bunch to bucket phase detector for the RHIC LLRF upgrade platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, K.S.; Harvey, M.; Hayes, T.
2011-03-28
As part of the overall development effort for the RHIC LLRF Upgrade Platform [1,2,3], a generic four channel 16 bit Analog-to-Digital Converter (ADC) daughter module was developed to provide high speed, wide dynamic range digitizing and processing of signals from DC to several hundred megahertz. The first operational use of this card was to implement the bunch to bucket phase detector for the RHIC LLRF beam control feedback loops. This paper will describe the design and performance features of this daughter module as a bunch to bucket phase detector, and also provide an overview of its place within the overallmore » LLRF platform architecture as a high performance digitizer and signal processing module suitable to a variety of applications. In modern digital control and signal processing systems, ADCs provide the interface between the analog and digital signal domains. Once digitized, signals are then typically processed using algorithms implemented in field programmable gate array (FPGA) logic, general purpose processors (GPPs), digital signal processors (DSPs) or a combination of these. For the recently developed and commissioned RHIC LLRF Upgrade Platform, we've developed a four channel ADC daughter module based on the Linear Technology LTC2209 16 bit, 160 MSPS ADC and the Xilinx V5FX70T FPGA. The module is designed to be relatively generic in application, and with minimal analog filtering on board, is capable of processing signals from DC to 500 MHz or more. The module's first application was to implement the bunch to bucket phase detector (BTB-PD) for the RHIC LLRF system. The same module also provides DC digitizing of analog processed BPM signals used by the LLRF system for radial feedback.« less
Yu, Jen-Shiang K; Hwang, Jenn-Kang; Tang, Chuan Yi; Yu, Chin-Hui
2004-01-01
A number of recently released numerical libraries including Automatically Tuned Linear Algebra Subroutines (ATLAS) library, Intel Math Kernel Library (MKL), GOTO numerical library, and AMD Core Math Library (ACML) for AMD Opteron processors, are linked against the executables of the Gaussian 98 electronic structure calculation package, which is compiled by updated versions of Fortran compilers such as Intel Fortran compiler (ifc/efc) 7.1 and PGI Fortran compiler (pgf77/pgf90) 5.0. The ifc 7.1 delivers about 3% of improvement on 32-bit machines compared to the former version 6.0. Performance improved from pgf77 3.3 to 5.0 is also around 3% when utilizing the original unmodified optimization options of the compiler enclosed in the software. Nevertheless, if extensive compiler tuning options are used, the speed can be further accelerated to about 25%. The performances of these fully optimized numerical libraries are similar. The double-precision floating-point (FP) instruction sets (SSE2) are also functional on AMD Opteron processors operated in 32-bit compilation, and Intel Fortran compiler has performed better optimization. Hardware-level tuning is able to improve memory bandwidth by adjusting the DRAM timing, and the efficiency in the CL2 mode is further accelerated by 2.6% compared to that of the CL2.5 mode. The FP throughput is measured by simultaneous execution of two identical copies of each of the test jobs. Resultant performance impact suggests that IA64 and AMD64 architectures are able to fulfill significantly higher throughput than the IA32, which is consistent with the SpecFPrate2000 benchmarks.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes
NASA Astrophysics Data System (ADS)
Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.
2017-01-01
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).
1988-09-16
IEEE Trans. Electron Dev., 30, pp 1295 (1983) /2/ K.K. Ng and W.T. Lynch, IEEE Trans. Electron Dev., 34, pp 503 (1987). /3/ P.A. van der Plas, W.C.E...300 20 F Ohm I FRC I 500 170 Ohm INPNICje F 65 F 28 fI Cjc F 100 j 45 I fF hFE I 80 j 80 II IBVEB I 6.5 F 6.5 V BVCE I 6.5 6.5 vI BVCB F 18 I 18 (V I...evaluated include a -Xb RAM, a 1.6GHz universal shift register and an 8 bit DAC with a 1.2ns settling time. A micograph of the DAC is shown in figure 6
NASA Technical Reports Server (NTRS)
Hetsch, J.
1983-01-01
Intensity distributions in nonoptical wave fields can be visualized and stored on photosensitive material. In the case of microwaves, temperature effects can be utilized with the aid of liquid crystals to visualize intensity distributions. Particular advantages for the study of intensity distributions in microwave fields presents a scanning procedure in which a microcomputer is employed for the control of a probe and the storage of the measured data. The present investigation is concerned with the employment of such a scanning procedure for the recording and the reproduction of microwave holograms. The scanning procedure makes use of an approach discussed by Farhat, et al. (1973). An eight-bit microprocessor with 64 kBytes of RAM is employed together with a diskette storage system.
Pioneer Venus Orbiter planar retarding potential analyzer plasma experiment
NASA Technical Reports Server (NTRS)
Knudsen, W. C.; Bakke, J.; Spenner, K.; Novak, V.
1980-01-01
The retarding potential analyzer (RPA) on the Pioneer Venus Orbiter Mission measures most of the thermal plasma parameters within and near the Venusian ionosphere. Parameters include total ion concentration, concentrations of the more abundant ions, ion temperatures, ion drift velocity, electron temperature, and low-energy (0-50 eV) electron distribution function. Several functions not previously used in RPA's were developed and incorporated into this instrument to accomplish these measurements on a spinning spacecraft with a small bit rate. The more significant functions include automatic electrometer ranging with background current compensation; digital, quadratic retarding potential step generation for the ion and low-energy electron scans; a current sampling interval of 2 ms throughout all scans; digital logic inflection point detection and data selection; and automatic ram direction detection.
Realtime photoacoustic microscopy in vivo with a 30-MHz ultrasound array transducer.
Zemp, Roger J; Song, Liang; Bitton, Rachel; Shung, K Kirk; Wang, Lihong V
2008-05-26
We present a novel high-frequency photoacoustic microscopy system capable of imaging the microvasculature of living subjects in realtime to depths of a few mm. The system consists of a high-repetition-rate Q-switched pump laser, a tunable dye laser, a 30-MHz linear ultrasound array transducer, a multichannel high-frequency data acquisition system, and a shared-RAM multi-core-processor computer. Data acquisition, beamforming, scan conversion, and display are implemented in realtime at 50 frames per second. Clearly resolvable images of 6-microm-diameter carbon fibers are experimentally demonstrated at 80 microm separation distances. Realtime imaging performance is demonstrated on phantoms and in vivo with absorbing structures identified to depths of 2.5-3 mm. This work represents the first high-frequency realtime photoacoustic imaging system to our knowledge.
Software/hardware distributed processing network supporting the Ada environment
NASA Astrophysics Data System (ADS)
Wood, Richard J.; Pryk, Zen
1993-09-01
A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 Risc for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada- implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
FLY MPI-2: a parallel tree code for LSS
NASA Astrophysics Data System (ADS)
Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.
2006-04-01
New version program summaryProgram title: FLY 3.1 Catalogue identifier: ADSC_v2_0 Licensing provisions: yes Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland No. of lines in distributed program, including test data, etc.: 158 172 No. of bytes in distributed program, including test data, etc.: 4 719 953 Distribution format: tar.gz Programming language: Fortran 90, C Computer: Beowulf cluster, PC, MPP systems Operating system: Linux, Aix RAM: 100M words Catalogue identifier of previous version: ADSC_v1_0 Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159 Does the new version supersede the previous version?: yes Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986) Reasons for the new version: The new version of FLY is implemented by using the MPI-2 standard: the distributed version 3.1 was developed by using the MPICH2 library on a PC Linux cluster. Today the FLY performance allows us to consider the FLY code among the most powerful parallel codes for tree N-body simulations. Another important new feature regards the availability of an interface with hydrodynamical Paramesh based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. The idea to build an interface between two codes, that have different and complementary cosmological tasks, allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, and a code specialized for hydrodynamical components that uses a Paramesh block structure. Summary of revisions: The parallel communication schema was totally changed. The new version adopts the MPICH2 library. Now FLY can be executed on all Unix systems having an MPI-2 standard library. The main data structure, is declared in a module procedure of FLY (fly_h.F90 routine). FLY creates the MPI Window object for one-sided communication for all the shared arrays, with a call like the following: CALL MPI_WIN_CREATE(POS, SIZE, REAL8, MPI_INFO_NULL, MPI_COMM_WORLD, WIN_POS, IERR) the following main window objects are created: win_pos, win_vel, win_acc: particles positions velocities and accelerations, win_pos_cell, win_mass_cell, win_quad, win_subp, win_grouping: cells positions, masses, quadrupole momenta, tree structure and grouping cells. Other windows are created for dynamic load balance and global counters. Restrictions: The program uses the leapfrog integrator schema, but could be changed by the user. Unusual features: FLY uses the MPI-2 standard: the MPICH2 library on Linux systems was adopted. To run this version of FLY the working directory must be shared among all the processors that execute FLY. Additional comments: Full documentation for the program is included in the distribution in the form of a README file, a User Guide and a Reference manuscript. Running time: IBM Linux Cluster 1350, 512 nodes with 2 processors for each node and 2 GB RAM for each processor, at Cineca, was adopted to make performance tests. Processor type: Intel Xeon Pentium IV 3.0 GHz and 512 KB cache (128 nodes have Nocona processors). Internal Network: Myricom LAN Card "C" Version and "D" Version. Operating System: Linux SuSE SLES 8. The code was compiled using the mpif90 compiler version 8.1 and with basic optimization options in order to have performances that could be useful compared with other generic clusters Processors
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
A programmable two-qubit quantum processor in silicon
NASA Astrophysics Data System (ADS)
Watson, T. F.; Philips, S. G. J.; Kawakami, E.; Ward, D. R.; Scarlino, P.; Veldhorst, M.; Savage, D. E.; Lagally, M. G.; Friesen, Mark; Coppersmith, S. N.; Eriksson, M. A.; Vandersypen, L. M. K.
2018-03-01
Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch–Josza algorithm and the Grover search algorithm—canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85–89 per cent and concurrences of 73–82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
A programmable two-qubit quantum processor in silicon.
Watson, T F; Philips, S G J; Kawakami, E; Ward, D R; Scarlino, P; Veldhorst, M; Savage, D E; Lagally, M G; Friesen, Mark; Coppersmith, S N; Eriksson, M A; Vandersypen, L M K
2018-03-29
Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch-Josza algorithm and the Grover search algorithm-canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85-89 per cent and concurrences of 73-82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
NASA Astrophysics Data System (ADS)
Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa
2017-08-01
The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing
(KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; ...
2013-01-01
Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determinedmore » the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.« less
Radiation effects and mitigation strategies for modern FPGAs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stettler, M. W.; Caffrey, M. P.; Graham, P. S.
2004-01-01
Field Programmable Gate Array devices have become the technology of choice in small volume modern instrumentation and control systems. These devices have always offered significant advantages in flexibility, and recent advances in fabrication have greatly increased logic capacity, substantially increasing the number of applications for this technology. Unfortunately, the increased density (and corresponding shrinkage of process geometry), has made these devices more susceptible to failure due to external radiation. This has been an issue for space based systems for some time, but is now becoming an issue for terrestrial systems in elevated radiation environments and commercial avionics as well. Characterizingmore » the failure modes of Xilinx FPGAs, and developing mitigation strategies is the subject of ongoing research by a consortium of academic, industrial, and governmental laboratories. This paper presents background information of radiation effects and failure modes, as well as current and future mitigation techniques. In particular, the availability of very large FPGA devices, complete with generous amounts of RAM and embedded processor(s), has led to the implementation of complete digital systems on a single device, bringing issues of system reliability and redundancy management to the chip level. Radiation effects on a single FPGA are increasingly likely to have system level consequences, and will need to be addressed in current and future designs.« less
Application of an onboard processor to the OAO C spacecraft
NASA Technical Reports Server (NTRS)
Stewart, W. N.; Hartenstein, R. G.; Trevathan, C.
1972-01-01
The design of a stored program computer for spacecraft use and its application on the fourth Orbiting Astronomical Observatory (OAO) is reported. The computer is a medium scale, parallel machine with a memory capacity of 16384 words of 18 bits each. It possesses a comprehensive instruction repertoire and operates on 45 W of power (including the dc-to-dc converter). The machine operates at a 500-kHz rate and executes an add instruction in 10 microseconds. Its primary functions on OAO C will be auxiliary command storage, spacecraft monitoring and malfunction reporting, data compression and status summary, and possible performance of emergency corrective action for certain anomalous situations.
A computational system for lattice QCD with overlap Dirac quarks
NASA Astrophysics Data System (ADS)
Chiu, Ting-Wai; Hsieh, Tung-Han; Huang, Chao-Hsi; Huang, Tsung-Ren
2003-05-01
We outline the essential features of a Linux PC cluster which is now being developed at National Taiwan University, and discuss how to optimize its hardware and software for lattice QCD with overlap Dirac quarks. At present, the cluster constitutes of 30 nodes, with each node consisting of one Pentium 4 processor (1.6/2.0 GHz), one Gbyte of PC800 RDRAM, one 40/80 Gbyte hard disk, and a network card. The speed of this system is estimated to be 30 Gflops, and its price/performance ratio is better than $1.0/Mflops for 64-bit (double precision) computations in quenched lattice QCD with overlap Dirac quarks.
Apparatus and method for implementing power saving techniques when processing floating point values
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Young Moon; Park, Sang Phill
An apparatus and method are described for reducing power when reading and writing graphics data. For example, one embodiment of an apparatus comprises: a graphics processor unit (GPU) to process graphics data including floating point data; a set of registers, at least one of the registers of the set partitioned to store the floating point data; and encode/decode logic to reduce a number of binary 1 values being read from the at least one register by causing a specified set of bit positions within the floating point data to be read out as 0s rather than 1s.
Hardware/Software Issues for Video Guidance Systems: The Coreco Frame Grabber
NASA Technical Reports Server (NTRS)
Bales, John W.
1996-01-01
The F64 frame grabber is a high performance video image acquisition and processing board utilizing the TMS320C40 and TMS34020 processors. The hardware is designed for the ISA 16 bit bus and supports multiple digital or analog cameras. It has an acquisition rate of 40 million pixels per second, with a variable sampling frequency of 510 kHz to MO MHz. The board has a 4MB frame buffer memory expandable to 32 MB, and has a simultaneous acquisition and processing capability. It supports both VGA and RGB displays, and accepts all analog and digital video input standards.
Optimizing TLB entries for mixed page size storage in contiguous memory
Chen, Dong; Gara, Alan; Giampapa, Mark E.; Heidelberger, Philip; Kriegel, Jon K.; Ohmacht, Martin; Steinmacher-Burow, Burkhard
2013-04-30
A system and method for accessing memory are provided. The system comprises a lookup buffer for storing one or more page table entries, wherein each of the one or more page table entries comprises at least a virtual page number and a physical page number; a logic circuit for receiving a virtual address from said processor, said logic circuit for matching the virtual address to the virtual page number in one of the page table entries to select the physical page number in the same page table entry, said page table entry having one or more bits set to exclude a memory range from a page.
Memory Network For Distributed Data Processors
NASA Technical Reports Server (NTRS)
Bolen, David; Jensen, Dean; Millard, ED; Robinson, Dave; Scanlon, George
1992-01-01
Universal Memory Network (UMN) is modular, digital data-communication system enabling computers with differing bus architectures to share 32-bit-wide data between locations up to 3 km apart with less than one millisecond of latency. Makes it possible to design sophisticated real-time and near-real-time data-processing systems without data-transfer "bottlenecks". This enterprise network permits transmission of volume of data equivalent to an encyclopedia each second. Facilities benefiting from Universal Memory Network include telemetry stations, simulation facilities, power-plants, and large laboratories or any facility sharing very large volumes of data. Main hub of UMN is reflection center including smaller hubs called Shared Memory Interfaces.
Experiments with a small behaviour controlled planetary rover
NASA Technical Reports Server (NTRS)
Miller, David P.; Desai, Rajiv S.; Gat, Erann; Ivlev, Robert; Loch, John
1993-01-01
A series of experiments that were performed on the Rocky 3 robot is described. Rocky 3 is a small autonomous rover capable of navigating through rough outdoor terrain to a predesignated area, searching that area for soft soil, acquiring a soil sample, and depositing the sample in a container at its home base. The robot is programmed according to a reactive behavior control paradigm using the ALFA programming language. This style of programming produces robust autonomous performance while requiring significantly less computational resources than more traditional mobile robot control systems. The code for Rocky 3 runs on an eight bit processor and uses about ten k of memory.
Chip-integrated optical power limiter based on an all-passive micro-ring resonator
NASA Astrophysics Data System (ADS)
Yan, Siqi; Dong, Jianji; Zheng, Aoling; Zhang, Xinliang
2014-10-01
Recent progress in silicon nanophotonics has dramatically advanced the possible realization of large-scale on-chip optical interconnects integration. Adopting photons as information carriers can break the performance bottleneck of electronic integrated circuit such as serious thermal losses and poor process rates. However, in integrated photonics circuits, few reported work can impose an upper limit of optical power therefore prevent the optical device from harm caused by high power. In this study, we experimentally demonstrate a feasible integrated scheme based on a single all-passive micro-ring resonator to realize the optical power limitation which has a similar function of current limiting circuit in electronics. Besides, we analyze the performance of optical power limiter at various signal bit rates. The results show that the proposed device can limit the signal power effectively at a bit rate up to 20 Gbit/s without deteriorating the signal. Meanwhile, this ultra-compact silicon device can be completely compatible with the electronic technology (typically complementary metal-oxide semiconductor technology), which may pave the way of very large scale integrated photonic circuits for all-optical information processors and artificial intelligence systems.
Dolphin Sounds-Inspired Covert Underwater Acoustic Communication and Micro-Modem
Qiao, Gang; Liu, Songzuo; Bilal, Muhammad
2017-01-01
A novel portable underwater acoustic modem is proposed in this paper for covert communication between divers or underwater unmanned vehicles (UUVs) and divers at a short distance. For the first time, real dolphin calls are used in the modem to realize biologically inspired Covert Underwater Acoustic Communication (CUAC). A variety of dolphin whistles and clicks stored in an SD card inside the modem helps to realize different biomimetic CUAC algorithms based on the specified covert scenario. In this paper, the information is conveyed during the time interval between dolphin clicks. TMS320C6748 and TLV320AIC3106 are the core processors used in our unique modem for fast digital processing and interconnection with other terminals or sensors. Simulation results show that the bit error rate (BER) of the CUAC algorithm is less than 10−5 when the signal to noise ratio is over ‒5 dB. The modem was tested in an underwater pool, and a data rate of 27.1 bits per second at a distance of 10 m was achieved. PMID:29068363
Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers
NASA Astrophysics Data System (ADS)
Edwards, Mark; Heward, Jeffrey; Clark, C. W.
2009-05-01
At Georgia Southern University we have constructed an 8+1--node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra--cold atoms in general with a particular emphasis on studying bose--bose and bose--fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time--dependent, one--dimensional Gross--Pitaevskii (TDGP) equations. These equations govern the behavior of dual-- species bosonic mixtures. We chose the split--operator/FFT to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 cpu known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64--bit PowerPC Processor Element known as the PPE and eight ``Synergistic Processor Element'' identified as the SPE's. We report on this algorithm and compare its performance to a non--parallel solver as applied to modeling evaporative cooling in dual--species bosonic mixtures.
Image processing using Gallium Arsenide (GaAs) technology
NASA Technical Reports Server (NTRS)
Miller, Warner H.
1989-01-01
The need to increase the information return from space-borne imaging systems has increased in the past decade. The use of multi-spectral data has resulted in the need for finer spatial resolution and greater spectral coverage. Onboard signal processing will be necessary in order to utilize the available Tracking and Data Relay Satellite System (TDRSS) communication channel at high efficiency. A generally recognized approach to the increased efficiency of channel usage is through data compression techniques. The compression technique implemented is a differential pulse code modulation (DPCM) scheme with a non-uniform quantizer. The need to advance the state-of-the-art of onboard processing was recognized and a GaAs integrated circuit technology was chosen. An Adaptive Programmable Processor (APP) chip set was developed which is based on an 8-bit slice general processor. The reason for choosing the compression technique for the Multi-spectral Linear Array (MLA) instrument is described. Also a description is given of the GaAs integrated circuit chip set which will demonstrate that data compression can be performed onboard in real time at data rate in the order of 500 Mb/s.
Xu, Qun; Wang, Xianchao; Xu, Chao
2017-06-01
Multiplication with traditional electronic computers is faced with a low calculating accuracy and a long computation time delay. To overcome these problems, the modified signed digit (MSD) multiplication routine is established based on the MSD system and the carry-free adder. Also, its parallel algorithm and optimization techniques are studied in detail. With the help of a ternary optical computer's characteristics, the structured data processor is designed especially for the multiplication routine. Several ternary optical operators are constructed to perform M transformations and summations in parallel, which has accelerated the iterative process of multiplication. In particular, the routine allocates data bits of the ternary optical processor based on digits of multiplication input, so the accuracy of the calculation results can always satisfy the users. Finally, the routine is verified by simulation experiments, and the results are in full compliance with the expectations. Compared with an electronic computer, the MSD multiplication routine is not only good at dealing with large-value data and high-precision arithmetic, but also maintains lower power consumption and fewer calculating delays.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Adaptive Instrument Module: Space Instrument Controller "Brain" through Programmable Logic Devices
NASA Technical Reports Server (NTRS)
Darrin, Ann Garrison; Conde, Richard; Chern, Bobbie; Luers, Phil; Jurczyk, Steve; Mills, Carl; Day, John H. (Technical Monitor)
2001-01-01
The Adaptive Instrument Module (AIM) will be the first true demonstration of reconfigurable computing with field-programmable gate arrays (FPGAs) in space, enabling the 'brain' of the system to evolve or adapt to changing requirements. In partnership with NASA Goddard Space Flight Center and the Australian Cooperative Research Centre for Satellite Systems (CRC-SS), APL has built the flight version to be flown on the Australian university-class satellite FEDSAT. The AIM provides satellites the flexibility to adapt to changing mission requirements by reconfiguring standardized processing hardware rather than incurring the large costs associated with new builds. This ability to reconfigure the processing in response to changing mission needs leads to true evolveable computing, wherein the instrument 'brain' can learn from new science data in order to perform state-of-the-art data processing. The development of the AIM is significant in its enormous potential to reduce total life-cycle costs for future space exploration missions. The advent of RAM-based FPGAs whose configuration can be changed at any time has enabled the development of the AIM for processing tasks that could not be performed in software. The use of the AIM enables reconfiguration of the FPGA circuitry while the spacecraft is in flight, with many accompanying advantages. The AIM demonstrates the practicalities of using reconfigurable computing hardware devices by conducting a series of designed experiments. These include the demonstration of implementing data compression, data filtering, and communication message processing and inter-experiment data computation. The second generation is the Adaptive Processing Template (ADAPT) which is further described in this paper. The next step forward is to make the hardware itself adaptable and the ADAPT pursues this challenge by developing a reconfigurable module that will be capable of functioning efficiently in various applications. ADAPT will take advantage of radiation tolerant RAM-based field programmable gate array (FPGA) technology to develop a reconfigurable processor that combines the flexibility of a general purpose processor running software with the performance of application specific processing hardware for a variety of high performance computing applications.
NASA Astrophysics Data System (ADS)
Hoefflinger, Bernd
Memories have been the major yardstick for the continuing validity of Moore's law. In single-transistor-per-Bit dynamic random-access memories (DRAM), the number of bits per chip pretty much gives us the number of transistors. For decades, DRAM's have offered the largest storage capacity per chip. However, DRAM does not scale any longer, both in density and voltage, severely limiting its power efficiency to 10 fJ/b. A differential DRAM would gain four-times in density and eight-times in energy. Static CMOS RAM (SRAM) with its six transistors/cell is gaining in reputation because it scales well in cell size and operating voltage so that its fundamental advantage of speed, non-destructive read-out and low-power standby could lead to just 2.5 electrons/bit in standby and to a dynamic power efficiency of 2aJ/b. With a projected 2020 density of 16 Gb/cm², the SRAM would be as dense as normal DRAM and vastly better in power efficiency, which would mean a major change in the architecture and market scenario for DRAM versus SRAM. Non-volatile Flash memory have seen two quantum jumps in density well beyond the roadmap: Multi-Bit storage per transistor and high-density TSV (through-silicon via) technology. The number of electrons required per Bit on the storage gate has been reduced since their first realization in 1996 by more than an order of magnitude to 400 electrons/Bit in 2010 for a complexity of 32Gbit per chip at the 32 nm node. Chip stacking of eight chips with TSV has produced a 32GByte solid-state drive (SSD). A stack of 32 chips with 2 b/cell at the 16 nm node will reach a density of 2.5 Terabit/cm². Non-volatile memory with a density of 10 × 10 nm²/Bit is the target for widespread development. Phase-change memory (PCM) and resistive memory (RRAM) lead in cell density, and they will reach 20 Gb/cm² in 2D and higher with 3D chip stacking. This is still almost an order-of-magnitude less than Flash. However, their read-out speed is ~10-times faster, with as yet little data on their energy/b. As a read-out memory with unparalleled retention and lifetime, the ROM with electron-beam direct-write-lithography (Chap. 8) should be considered for its projected 2D density of 250 Gb/cm², a very small read energy of 0.1 μW/Gb/s. The lithography write-speed 10 ms/Terabit makes this ROM a serious contentender for the optimum in non-volatile, tamper-proof storage.
Parallel implementation of an adaptive and parameter-free N-body integrator
NASA Astrophysics Data System (ADS)
Pruett, C. David; Ingham, William H.; Herman, Ralph D.
2011-05-01
Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
Research on SEU hardening of heterogeneous Dual-Core SoC
NASA Astrophysics Data System (ADS)
Huang, Kun; Hu, Keliu; Deng, Jun; Zhang, Tao
2017-08-01
The implementation of Single-Event Upsets (SEU) hardening has various schemes. However, some of them require a lot of human, material and financial resources. This paper proposes an easy scheme on SEU hardening for Heterogeneous Dual-core SoC (HD SoC) which contains three techniques. First, the automatic Triple Modular Redundancy (TMR) technique is adopted to harden the register heaps of the processor and the instruction-fetching module. Second, Hamming codes are used to harden the random access memory (RAM). Last, a software signature technique is applied to check the programs which are running on CPU. The scheme need not to consume additional resources, and has little influence on the performance of CPU. These technologies are very mature, easy to implement and needs low cost. According to the simulation result, the scheme can satisfy the basic demand of SEU-hardening.
Solving the corner-turning problem for large interferometers
NASA Astrophysics Data System (ADS)
Lutomirski, Andrew; Tegmark, Max; Sanchez, Nevada J.; Stein, Leo C.; Urry, W. Lynn; Zaldarriaga, Matias
2011-01-01
The so-called corner-turning problem is a major bottleneck for radio telescopes with large numbers of antennas. The problem is essentially that of rapidly transposing a matrix that is too large to store on one single device; in radio interferometry, it occurs because data from each antenna need to be routed to an array of processors each of which will handle a limited portion of the data (say, a frequency range) but requires input from each antenna. We present a low-cost solution allowing the correlator to transpose its data in real time, without contending for bandwidth, via a butterfly network requiring neither additional RAM memory nor expensive general-purpose switching hardware. We discuss possible implementations of this using FPGA, CMOS, analog logic and optical technology, and conclude that the corner-turner cost can be small even for upcoming massive radio arrays.
Realtime photoacoustic microscopy in vivo with a 30-MHz ultrasound array transducer
Zemp, Roger J.; Song, Liang; Bitton, Rachel; Shung, K. Kirk; Wang, Lihong V.
2009-01-01
We present a novel high-frequency photoacoustic microscopy system capable of imaging the microvasculature of living subjects in realtime to depths of a few mm. The system consists of a high-repetition-rate Q-switched pump laser, a tunable dye laser, a 30-MHz linear ultrasound array transducer, a multichannel high-frequency data acquisition system, and a shared-RAM multi-core-processor computer. Data acquisition, beamforming, scan conversion, and display are implemented in realtime at 50 frames per second. Clearly resolvable images of 6-µm-diameter carbon fibers are experimentally demonstrated at 80 µm separation distances. Realtime imaging performance is demonstrated on phantoms and in vivo with absorbing structures identified to depths of 2.5–3 mm. This work represents the first high-frequency realtime photoacoustic imaging system to our knowledge. PMID:18545502
Analysis of ramming settlement based on dissipative principle
NASA Astrophysics Data System (ADS)
Fu, Hao; Yu, Kaining; Chen, Changli; Li, Changrong; Wang, Xiuli
2018-03-01
The deformation of soil is a kind of dissipative structure under the action of dynamic compaction. The macroscopic performance of soil to steady state evolution is the change of ramming settlement in the process of dynamic compaction. based on the existing solution of dynamic compaction boundary problem, calculated ramming effectiveness (W) and ramming efficiency coefficient( η ). For the same soil, ramming efficiency coefficient is related to ramming factor λ = M/ρr3. By using the dissipative principle to analyze the law between ramming settlements and ramming times under different ramming energy and soil density, come to the conclusion that: Firstly, with the increase of ramming numbers, ramming settlement tends to a stable value, ramming effectiveness coefficient tends to a stable value. Secondly, under the condition of the same single ramming energy, the soil density of before ramming has effect on ramming effectiveness of previous ramming, almost no effect on ramming effectiveness of subsequent ramming. Thirdly, under the condition of the same soil density, different ramming energy correspond to different steady-state, the cumulative ramming settlement and steady-state increase with ramming energy.
SHABERTH - ANALYSIS OF A SHAFT BEARING SYSTEM (CRAY VERSION)
NASA Technical Reports Server (NTRS)
Coe, H. H.
1994-01-01
The SHABERTH computer program was developed to predict operating characteristics of bearings in a multibearing load support system. Lubricated and non-lubricated bearings can be modeled. SHABERTH calculates the loads, torques, temperatures, and fatigue life for ball and/or roller bearings on a single shaft. The program also allows for an analysis of the system reaction to the termination of lubricant supply to the bearings and other lubricated mechanical elements. SHABERTH has proven to be a valuable tool in the design and analysis of shaft bearing systems. The SHABERTH program is structured with four nested calculation schemes. The thermal scheme performs steady state and transient temperature calculations which predict system temperatures for a given operating state. The bearing dimensional equilibrium scheme uses the bearing temperatures, predicted by the temperature mapping subprograms, and the rolling element raceway load distribution, predicted by the bearing subprogram, to calculate bearing diametral clearance for a given operating state. The shaft-bearing system load equilibrium scheme calculates bearing inner ring positions relative to the respective outer rings such that the external loading applied to the shaft is brought into equilibrium by the rolling element loads which develop at each bearing inner ring for a given operating state. The bearing rolling element and cage load equilibrium scheme calculates the rolling element and cage equilibrium positions and rotational speeds based on the relative inner-outer ring positions, inertia effects, and friction conditions. The ball bearing subprograms in the current SHABERTH program have several model enhancements over similar programs. These enhancements include an elastohydrodynamic (EHD) film thickness model that accounts for thermal heating in the contact area and lubricant film starvation; a new model for traction combined with an asperity load sharing model; a model for the hydrodynamic rolling and shear forces in the inlet zone of lubricated contacts, which accounts for the degree of lubricant film starvation; modeling normal and friction forces between a ball and a cage pocket, which account for the transition between the hydrodynamic and elastohydrodynamic regimes of lubrication; and a model of the effect on fatigue life of the ratio of the EHD plateau film thickness to the composite surface roughness. SHABERTH is intended to be as general as possible. The models in SHABERTH allow for the complete mathematical simulation of real physical systems. Systems are limited to a maximum of five bearings supporting the shaft, a maximum of thirty rolling elements per bearing, and a maximum of one hundred temperature nodes. The SHABERTH program structure is modular and has been designed to permit refinement and replacement of various component models as the need and opportunities develop. A preprocessor is included in the IBM PC version of SHABERTH to provide a user friendly means of developing SHABERTH models and executing the resulting code. The preprocessor allows the user to create and modify data files with minimal effort and a reduced chance for errors. Data is utilized as it is entered; the preprocessor then decides what additional data is required to complete the model. Only this required information is requested. The preprocessor can accommodate data input for any SHABERTH compatible shaft bearing system model. The system may include ball bearings, roller bearings, and/or tapered roller bearings. SHABERTH is written in FORTRAN 77, and two machine versions are available from COSMIC. The CRAY version (LEW-14860) has a RAM requirement of 176K of 64 bit words. The IBM PC version (MFS-28818) is written for IBM PC series and compatible computers running MS-DOS, and includes a sample MS-DOS executable. For execution, the PC version requires at least 1Mb of RAM and an 80386 or 486 processor machine with an 80x87 math co-processor. The standard distribution medium for the IBM PC version is a set of two 5.25 inch 360K MS-DOS format diskettes. The contents of the diske
The Photon Shell Game and the Quantum von Neumann Architecture with Superconducting Circuits
NASA Astrophysics Data System (ADS)
Mariantoni, Matteo
2012-02-01
Superconducting quantum circuits have made significant advances over the past decade, allowing more complex and integrated circuits that perform with good fidelity. We have recently implemented a machine comprising seven quantum channels, with three superconducting resonators, two phase qubits, and two zeroing registers. I will explain the design and operation of this machine, first showing how a single microwave photon | 1 > can be prepared in one resonator and coherently transferred between the three resonators. I will also show how more exotic states such as double photon states | 2 > and superposition states | 0 >+ | 1 > can be shuffled among the resonators as well [1]. I will then demonstrate how this machine can be used as the quantum-mechanical analog of the von Neumann computer architecture, which for a classical computer comprises a central processing unit and a memory holding both instructions and data. The quantum version comprises a quantum central processing unit (quCPU) that exchanges data with a quantum random-access memory (quRAM) integrated on one chip, with instructions stored on a classical computer. I will also present a proof-of-concept demonstration of a code that involves all seven quantum elements: (1), Preparing an entangled state in the quCPU, (2), writing it to the quRAM, (3), preparing a second state in the quCPU, (4), zeroing it, and, (5), reading out the first state stored in the quRAM [2]. Finally, I will demonstrate that the quantum von Neumann machine provides one unit cell of a two-dimensional qubit-resonator array that can be used for surface code quantum computing. This will allow the realization of a scalable, fault-tolerant quantum processor with the most forgiving error rates to date. [4pt] [1] M. Mariantoni et al., Nature Physics 7, 287-293 (2011.)[0pt] [2] M. Mariantoni et al., Science 334, 61-65 (2011).
Multi-processor including data flow accelerator module
Davidson, George S.; Pierce, Paul E.
1990-01-01
An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
SAWdoubler: A program for counting self-avoiding walks
NASA Astrophysics Data System (ADS)
Schram, Raoul D.; Barkema, Gerard T.; Bisseling, Rob H.
2013-03-01
This article presents SAWdoubler, a package for counting the total number ZN of self-avoiding walks (SAWs) on a regular lattice by the length-doubling method, of which the basic concept has been published previously by us. We discuss an algorithm for the creation of all SAWs of length N, efficient storage of these SAWs in a tree data structure, and an algorithm for the computation of correction terms to the count Z2N for SAWs of double length, removing all combinations of two intersecting single-length SAWs. We present an efficient numbering of the lattice sites that enables exploitation of symmetry and leads to a smaller tree data structure; this numbering is by increasing Euclidean distance from the origin of the lattice. Furthermore, we show how the computation can be parallelised by distributing the iterations of the main loop of the algorithm over the cores of a multicore architecture. Experimental results on the 3D cubic lattice demonstrate that Z28 can be computed on a dual-core PC in only 1 h and 40 min, with a speedup of 1.56 compared to the single-core computation and with a gain by using symmetry of a factor of 26. We present results for memory use and show how the computation is made to fit in 4 GB RAM. It is easy to extend the SAWdoubler software to other lattices; it is publicly available under the GNU LGPL license. Catalogue identifier: AEOB_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOB_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public Licence No. of lines in distributed program, including test data, etc.: 2101 No. of bytes in distributed program, including test data, etc.: 19816 Distribution format: tar.gz Programming language: C. Computer: Any computer with a UNIX-like operating system and a C compiler. For large problems, use is made of specific 128-bit integer arithmetic provided by the gcc compiler. Operating system: Any UNIX-like system; developed under Linux and Mac OS 10. Has the code been vectorised or parallelised?: Yes. A parallel version of the code is available in the “Extras” directory of the distribution file. RAM: Problem dependent (2 GB for counting SAWs of length 28 on the 3D cubic lattice) Classification: 16.11. Nature of problem: Computing the number of self-avoiding walks of a given length on a given lattice. Solution method: Length-doubling. Restrictions: The length of the walk must be even. Lattice is 3D simple cubic. Additional comments: The lattice can be replaced by other lattices, such as BCC, FCC, or a 2D square lattice. Running time: Problem dependent (2.5 h using one processor core for length 28 on the 3D cubic lattice).
The Cortex Transform as an image preprocessor for sparse distributed memory: An initial study
NASA Technical Reports Server (NTRS)
Olshausen, Bruno; Watson, Andrew
1990-01-01
An experiment is described which was designed to evaluate the use of the Cortex Transform as an image processor for Sparse Distributed Memory (SDM). In the experiment, a set of images were injected with Gaussian noise, preprocessed with the Cortex Transform, and then encoded into bit patterns. The various spatial frequency bands of the Cortex Transform were encoded separately so that they could be evaluated based on their ability to properly cluster patterns belonging to the same class. The results of this study indicate that by simply encoding the low pass band of the Cortex Transform, a very suitable input representation for the SDM can be achieved.
High-Speed Micro Signal Processor.
1982-06-01
4) 29 S1424-35 ADOR/QUANTITY FIELD - 2048 TO +409510, NOTE: POSITIVE VALUES 2047 4AP TO NEGKTIVE VALUES SK436 HALT 0 E ENABLE HALT 1 D DISABLE SM37 38...0 to 31 numeric. QUANT - 12 bit Quantity field to be used as the ALU ’Am operand. Value - (- 2048 to +2047)10 numeric. If (ASEL - Q16E) then, one...15AI12 1 9 rS EAO 1 41 1A!)3 I 10 TSEBO I 42 DTII1 I 11 TSEBI I 43 DTTIO T 12 VDI 1 44 )1109 1 13 DTAOO 0 45 DTI08 I 14 DTBOO 0 46 DTI07 I 15 DTBO1 0
Optical reversible programmable Boolean logic unit.
Chattopadhyay, Tanay
2012-07-20
Computing with reversibility is the only way to avoid dissipation of energy associated with bit erase. So, a reversible microprocessor is required for future computing. In this paper, a design of a simple all-optical reversible programmable processor is proposed using a polarizing beam splitter, liquid crystal-phase spatial light modulators, a half-wave plate, and plane mirrors. This circuit can perform 16 logical operations according to three programming inputs. Also, inputs can be easily recovered from the outputs. It is named the "reversible programmable Boolean logic unit (RPBLU)." The logic unit is the basic building block of many complex computational operations. Hence the design is important in sense. Two orthogonally polarized lights are defined here as two logical states, respectively.
The development of a specialized processor for a space-based multispectral earth imager
NASA Astrophysics Data System (ADS)
Khedr, Mostafa E.
2008-10-01
This work was done in the Department of Computer Engineering, Lvov Polytechnic National University, Lvov, Ukraine, as a thesis entitled "Space Imager Computer System for Raw Video Data Processing" [1]. This work describes the synthesis and practical implementation of a specialized computer system for raw data control and processing onboard a satellite MultiSpectral earth imager. This computer system is intended for satellites with resolution in the range of one meter with 12-bit precession. The design is based mostly on general off-the-shelf components such as (FPGAs) plus custom designed software for interfacing with PC and test equipment. The designed system was successfully manufactured and now fully functioning in orbit.
Design of a modular digital computer system, CDRL no. D001, final design plan
NASA Technical Reports Server (NTRS)
Easton, R. A.
1975-01-01
The engineering breadboard implementation for the CDRL no. D001 modular digital computer system developed during design of the logic system was documented. This effort followed the architecture study completed and documented previously, and was intended to verify the concepts of a fault tolerant, automatically reconfigurable, modular version of the computer system conceived during the architecture study. The system has a microprogrammed 32 bit word length, general register architecture and an instruction set consisting of a subset of the IBM System 360 instruction set plus additional fault tolerance firmware. The following areas were covered: breadboard packaging, central control element, central processing element, memory, input/output processor, and maintenance/status panel and electronics.
A digitally implemented preambleless demodulator for maritime and mobile data communications
NASA Astrophysics Data System (ADS)
Chalmers, Harvey; Shenoy, Ajit; Verahrami, Farhad B.
The hardware design and software algorithms for a low-bit-rate, low-cost, all-digital preambleless demodulator are described. The demodulator operates under severe high-noise conditions, fast Doppler frequency shifts, large frequency offsets, and multipath fading. Sophisticated algorithms, including a fast Fourier transform (FFT)-based burst acquisition algorithm, a cycle-slip resistant carrier phase tracker, an innovative Doppler tracker, and a fast acquisition symbol synchronizer, were developed and extensively simulated for reliable burst reception. The compact digital signal processor (DSP)-based demodulator hardware uses a unique personal computer test interface for downloading test data files. The demodulator test results demonstrate a near-ideal performance within 0.2 dB of theory.
Reduze - Feynman integral reduction in C++
NASA Astrophysics Data System (ADS)
Studerus, C.
2010-07-01
Reduze is a computer program for reducing Feynman integrals to master integrals employing a Laporta algorithm. The program is written in C++ and uses classes provided by the GiNaC library to perform the simplifications of the algebraic prefactors in the system of equations. Reduze offers the possibility to run reductions in parallel. Program summaryProgram title:Reduze Catalogue identifier: AEGE_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGE_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions:: yes No. of lines in distributed program, including test data, etc.: 55 433 No. of bytes in distributed program, including test data, etc.: 554 866 Distribution format: tar.gz Programming language: C++ Computer: All Operating system: Unix/Linux Number of processors used: The number of processors is problem dependent. More than one possible but not arbitrary many. RAM: Depends on the complexity of the system. Classification: 4.4, 5 External routines: CLN ( http://www.ginac.de/CLN/), GiNaC ( http://www.ginac.de/) Nature of problem: Solving large systems of linear equations with Feynman integrals as unknowns and rational polynomials as prefactors. Solution method: Using a Gauss/Laporta algorithm to solve the system of equations. Restrictions: Limitations depend on the complexity of the system (number of equations, number of kinematic invariants). Running time: Depends on the complexity of the system.
An Early Quantum Computing Proposal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Stephen Russell; Alexander, Francis Joseph; Barros, Kipton Marcos
The D-Wave 2X is the third generation of quantum processing created by D-Wave. NASA (with Google and USRA) and Lockheed Martin (with USC), both own D-Wave systems. Los Alamos National Laboratory (LANL) purchased a D-Wave 2X in November 2015. The D-Wave 2X processor contains (nominally) 1152 quantum bits (or qubits) and is designed to specifically perform quantum annealing, which is a well-known method for finding a global minimum of an optimization problem. This methodology is based on direct execution of a quantum evolution in experimental quantum hardware. While this can be a powerful method for solving particular kinds of problems,more » it also means that the D-Wave 2X processor is not a general computing processor and cannot be programmed to perform a wide variety of tasks. It is a highly specialized processor, well beyond what NNSA currently thinks of as an “advanced architecture.”A D-Wave is best described as a quantum optimizer. That is, it uses quantum superposition to find the lowest energy state of a system by repeated doses of power and settling stages. The D-Wave produces multiple solutions to any suitably formulated problem, one of which is the lowest energy state solution (global minimum). Mapping problems onto the D-Wave requires defining an objective function to be minimized and then encoding that function in the Hamiltonian of the D-Wave system. The quantum annealing method is then used to find the lowest energy configuration of the Hamiltonian using the current D-Wave Two, two-level, quantum processor. This is not always an easy thing to do, and the D-Wave Two has significant limitations that restrict problem sizes that can be run and algorithmic choices that can be made. Furthermore, as more people are exploring this technology, it has become clear that it is very difficult to come up with general approaches to optimization that can both utilize the D-Wave and that can do better than highly developed algorithms on conventional computers for specific applications. These are all fundamental challenges that must be overcome for the D-Wave, or similar, quantum computing technology to be broadly applicable.« less
AM06: the Associative Memory chip for the Fast TracKer in the upgraded ATLAS detector
NASA Astrophysics Data System (ADS)
Annovi, A.; Beretta, M. M.; Calderini, G.; Crescioli, F.; Frontini, L.; Liberali, V.; Shojaii, S. R.; Stabile, A.
2017-04-01
This paper describes the AM06 chip, which is a highly parallel processor for pattern recognition in the ATLAS high energy physics experiment. The AM06 contains memory banks that store data organized in 18 bit words; a group of 8 words is called "pattern". Each AM06 chip can store up to 131 072 patterns. The AM06 is a large chip, designed in 65 nm CMOS, and it combines full-custom memory arrays, standard logic cells and serializer/deserializer IP blocks at 2 Gbit/s for input/output communication. The overall silicon area is 168 mm2 and the chip contains about 421 million transistors. The AM06 receives the detector data for each event accepted by Level-1 trigger, up to 100 kHz, and it performs a track reconstruction based on hit information from channels of the ATLAS silicon detectors. Thanks to the design of a new associative memory cell and to the layout optimization, the AM06 consumption is only about 1 fJ/bit per comparison. The AM06 has been fabricated and successfully tested with a dedicated test system.
A thin-film microprocessor with inkjet print-programmable memory
NASA Astrophysics Data System (ADS)
Myny, Kris; Smout, Steve; Rockelé, Maarten; Bhoolokam, Ajay; Ke, Tung Huei; Steudel, Soeren; Cobb, Brian; Gulati, Aashini; Rodriguez, Francisco Gonzalez; Obata, Koji; Marinkovic, Marko; Pham, Duy-Vu; Hoppe, Arne; Gelinck, Gerwin H.; Genoe, Jan; Dehaene, Wim; Heremans, Paul
2014-12-01
The Internet of Things is driving extensive efforts to develop intelligent everyday objects. This requires seamless integration of relatively simple electronics, for example through `stick-on' electronics labels. We believe the future evolution of this technology will be governed by Wright's Law, which was first proposed in 1936 and states that the cost of a product decreases with cumulative production. This implies that a generic electronic device that can be tailored for application-specific requirements during downstream integration would be a cornerstone in the development of the Internet of Things. We present an 8-bit thin-film microprocessor with a write-once, read-many (WORM) instruction generator that can be programmed after manufacture via inkjet printing. The processor combines organic p-type and soluble oxide n-type thin-film transistors in a new flavor of the familiar complementary transistor technology with the potential to be manufactured on a very thin polyimide film, enabling low-cost flexible electronics. It operates at 6.5 V and reaches clock frequencies up to 2.1 kHz. An instruction set of 16 code lines, each line providing a 9 bit instruction, is defined by means of inkjet printing of conductive silver inks.
Augmented burst-error correction for UNICON laser memory. [digital memory
NASA Technical Reports Server (NTRS)
Lim, R. S.
1974-01-01
A single-burst-error correction system is described for data stored in the UNICON laser memory. In the proposed system, a long fire code with code length n greater than 16,768 bits was used as an outer code to augment an existing inner shorter fire code for burst error corrections. The inner fire code is a (80,64) code shortened from the (630,614) code, and it is used to correct a single-burst-error on a per-word basis with burst length b less than or equal to 6. The outer code, with b less than or equal to 12, would be used to correct a single-burst-error on a per-page basis, where a page consists of 512 32-bit words. In the proposed system, the encoding and error detection processes are implemented by hardware. A minicomputer, currently used as a UNICON memory management processor, is used on a time-demanding basis for error correction. Based upon existing error statistics, this combination of an inner code and an outer code would enable the UNICON system to obtain a very low error rate in spite of flaws affecting the recorded data.
Tian, He; Chen, Hong-Yu; Ren, Tian-Ling; Li, Cheng; Xue, Qing-Tang; Mohammad, Mohammad Ali; Wu, Can; Yang, Yi; Wong, H-S Philip
2014-06-11
Laser scribing is an attractive reduced graphene oxide (rGO) growth and patterning technology because the process is low-cost, time-efficient, transfer-free, and flexible. Various laser-scribed rGO (LSG) components such as capacitors, gas sensors, and strain sensors have been demonstrated. However, obstacles remain toward practical application of the technology where all the components of a system are fabricated using laser scribing. Memory components, if developed, will substantially broaden the application space of low-cost, flexible electronic systems. For the first time, a low-cost approach to fabricate resistive random access memory (ReRAM) using laser-scribed rGO as the bottom electrode is experimentally demonstrated. The one-step laser scribing technology allows transfer-free rGO synthesis directly on flexible substrates or non-flat substrates. Using this time-efficient laser-scribing technology, the patterning of a memory-array area up to 100 cm(2) can be completed in 25 min. Without requiring the photoresist coating for lithography, the surface of patterned rGO remains as clean as its pristine state. Ag/HfOx/LSG ReRAM using laser-scribing technology is fabricated in this work. Comprehensive electrical characteristics are presented including forming-free behavior, stable switching, reasonable reliability performance and potential for 2-bit storage per memory cell. The results suggest that laser-scribing technology can potentially produce more cost-effective and time-effective rGO-based circuits and systems for practical applications.
Energy challenges in optical access and aggregation networks.
Kilper, Daniel C; Rastegarfar, Houman
2016-03-06
Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as the density and speed increases, the total system energy, thermal density and energy per bit are moving into regimes that become impractical to support-for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).
A configurable and low-power mixed signal SoC for portable ECG monitoring applications.
Kim, Hyejung; Kim, Sunyoung; Van Helleputte, Nick; Artes, Antonio; Konijnenburg, Mario; Huisken, Jos; Van Hoof, Chris; Yazicioglu, Refet Firat
2014-04-01
This paper describes a mixed-signal ECG System-on-Chip (SoC) that is capable of implementing configurable functionality with low-power consumption for portable ECG monitoring applications. A low-voltage and high performance analog front-end extracts 3-channel ECG signals and single channel electrode-tissue-impedance (ETI) measurement with high signal quality. This can be used to evaluate the quality of the ECG measurement and to filter motion artifacts. A custom digital signal processor consisting of 4-way SIMD processor provides the configurability and advanced functionality like motion artifact removal and R peak detection. A built-in 12-bit analog-to-digital converter (ADC) is capable of adaptive sampling achieving a compression ratio of up to 7, and loop buffer integration reduces the power consumption for on-chip memory access. The SoC is implemented in 0.18 μm CMOS process and consumes 32 μ W from a 1.2 V while heart beat detection application is running, and integrated in a wireless ECG monitoring system with Bluetooth protocol. Thanks to the ECG SoC, the overall system power consumption can be reduced significantly.
Earth sensing: from ice to the Internet of Things
NASA Astrophysics Data System (ADS)
Martinez, K.
2017-12-01
The evolution of technology has led to improvements in our ability to use sensors for earth science research. Radio communications have improved in terms of range and power use. Miniaturisation means we now use 32 bit processors with embedded memory, storage and interfaces. Sensor technology makes it simpler to integrate devices such as accelerometers, compasses, gas and biosensors. Programming languages have developed so that it has become easier to create software for these systems. This combined with the power of the processors has made research into advanced algorithms and communications feasible. The term environmental sensor networks describes these advanced systems which are designed specifically to take sensor measurements in the natural environment. Through a decade of research into sensor networks, deployed mainly in glaciers, many areas of this still emerging technology have been explored. From deploying the first subglacial sensor probes with custom electronics and protocols we learnt tuning to harsh environments and energy management. More recently installing sensor systems in the mountains of Scotland has shown that standards have allowed complete internet and web integration. This talk will discuss the technologies used in a range of recent deployments in Scotland and Iceland focussed on creating new data streams for cryospheric and climate change research.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
Digital Intermediate Frequency Receiver Module For Use In Airborne Sar Applications
Tise, Bertice L.; Dubbert, Dale F.
2005-03-08
A digital IF receiver (DRX) module directly compatible with advanced radar systems such as synthetic aperture radar (SAR) systems. The DRX can combine a 1 G-Sample/sec 8-bit ADC with high-speed digital signal processor, such as high gate-count FPGA technology or ASICs to realize a wideband IF receiver. DSP operations implemented in the DRX can include quadrature demodulation and multi-rate, variable-bandwidth IF filtering. Pulse-to-pulse (Doppler domain) filtering can also be implemented in the form of a presummer (accumulator) and an azimuth prefilter. An out of band noise source can be employed to provide a dither signal to the ADC, and later be removed by digital signal processing. Both the range and Doppler domain filtering operations can be implemented using a unique pane architecture which allows on-the-fly selection of the filter decimation factor, and hence, the filter bandwidth. The DRX module can include a standard VME-64 interface for control, status, and programming. An interface can provide phase history data to the real-time image formation processors. A third front-panel data port (FPDP) interface can send wide bandwidth, raw phase histories to a real-time phase history recorder for ground processing.
Wong, Wing Chung; Kim, Dewey; Carter, Hannah; Diekhans, Mark; Ryan, Michael C; Karchin, Rachel
2011-08-01
Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. MySQL database, source code and binaries freely available for academic/government use at http://wiki.chasmsoftware.org, Source in Python and C++. Requires 32 or 64-bit Linux system (tested on Fedora Core 8,10,11 and Ubuntu 10), 2.5*≤ Python <3.0*, MySQL server >5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for MySQL database dump when uncompressed), 2 GB of RAM.
Retarding potential analyzer for the Pioneer-Venus Orbiter Mission
NASA Technical Reports Server (NTRS)
Knudsen, W. C.; Bakke, J.; Spenner, K.; Novak, V.
1979-01-01
The retarding potential analyzer on the Pioneer-Venus Orbiter Mission has been designed to measure most of the thermal plasma parameters within and near the Venusian ionosphere. Parameters include total ion concentration, concentrations of the more abundant ions, ion temperatures, ion drift velocity, electron temperature, and low-energy (0-50 eV) electron distribution function. To accomplish these measurements on a spinning vehicle with a small telemetry bit rate, several functions, including decision functions not previously used in RPA's, have been developed and incorporated into this instrument. The more significant functions include automatic electrometer ranging with background current compensation; digital, quadratic retarding potential step generation for the ion and low-energy electron scans; a current sampling interval of 2 ms throughout all scans; digital logic inflection point detection and data selection; and automatic ram direction detection. Extensive numerical simulation and plasma chamber tests have been conducted to verify adequacy of the design for the Pioneer Mission.
ETARA - EVENT TIME AVAILABILITY, RELIABILITY ANALYSIS
NASA Technical Reports Server (NTRS)
Viterna, L. A.
1994-01-01
The ETARA system was written to evaluate the performance of the Space Station Freedom Electrical Power System, but the methodology and software can be modified to simulate any system that can be represented by a block diagram. ETARA is an interactive, menu-driven reliability, availability, and maintainability (RAM) simulation program. Given a Reliability Block Diagram representation of a system, the program simulates the behavior of the system over a specified period of time using Monte Carlo methods to generate block failure and repair times as a function of exponential and/or Weibull distributions. ETARA can calculate availability parameters such as equivalent availability, state availability (percentage of time at a particular output state capability), continuous state duration and number of state occurrences. The program can simulate initial spares allotment and spares replenishment for a resupply cycle. The number of block failures are tabulated both individually and by block type. ETARA also records total downtime, repair time, and time waiting for spares. Maintenance man-hours per year and system reliability, with or without repair, at or above a particular output capability can also be calculated. The key to using ETARA is the development of a reliability or availability block diagram. The block diagram is a logical graphical illustration depicting the block configuration necessary for a function to be successfully accomplished. Each block can represent a component, a subsystem, or a system. The function attributed to each block is considered for modeling purposes to be either available or unavailable; there are no degraded modes of block performance. A block does not have to represent physically connected hardware in the actual system to be connected in the block diagram. The block needs only to have a role in contributing to an available system function. ETARA can model the RAM characteristics of systems represented by multilayered, nesting block diagrams. There are no restrictions on the number of total blocks or on the number of blocks in a series, parallel, or M-of-N parallel subsystem. In addition, the same block can appear in more than one subsystem if such an arrangement is necessary for an accurate model. ETARA 3.3 is written in APL2 for IBM PC series computers or compatibles running MS-DOS and the APL2 interpreter. Hardware requirements for the APL2 system include 640K of RAM, 2Mb of extended memory, and an 80386 or 80486 processor with an 80x87 math co-processor. The standard distribution medium for this package is a set of two 5.25 inch 360K MS-DOS format diskettes. A sample executable is included. The executable contains licensed material from the APL2 for the IBM PC product which is program property of IBM; Copyright IBM Corporation 1988 - All rights reserved. It is distributed with IBM's permission. The contents of the diskettes are compressed using the PKWARE archiving tools. The utility to unarchive the files, PKUNZIP.EXE, is included. ETARA was developed in 1990 and last updated in 1991.
LabPatch, an acquisition and analysis program for patch-clamp electrophysiology.
Robinson, T; Thomsen, L; Huizinga, J D
2000-05-01
An acquisition and analysis program, "LabPatch," has been developed for use in patch-clamp research. LabPatch controls any patch-clamp amplifier, acquires and records data, runs voltage protocols, plots and analyzes data, and connects to spreadsheet and database programs. Controls within LabPatch are grouped by function on one screen, much like an oscilloscope front panel. The software is mouse driven, so that the user need only point and click. Finally, the ability to copy data to other programs running in Windows 95/98, and the ability to keep track of experiments using a database, make LabPatch extremely versatile. The system requirements include Windows 95/98, at least a 100-MHz processor and 16 MB RAM, a data acquisition card, digital-to-analog converter, and a patch-clamp amplifier. LabPatch is available free of charge at http://www.fhs.mcmaster.ca/huizinga/.
System for loading executable code into volatile memory in a downhole tool
Hall, David R.; Bartholomew, David B.; Johnson, Monte L.
2007-09-25
A system for loading an executable code into volatile memory in a downhole tool string component comprises a surface control unit comprising executable code. An integrated downhole network comprises data transmission elements in communication with the surface control unit and the volatile memory. The executable code, stored in the surface control unit, is not permanently stored in the downhole tool string component. In a preferred embodiment of the present invention, the downhole tool string component comprises boot memory. In another embodiment, the executable code is an operating system executable code. Preferably, the volatile memory comprises random access memory (RAM). A method for loading executable code to volatile memory in a downhole tool string component comprises sending the code from the surface control unit to a processor in the downhole tool string component over the network. A central processing unit writes the executable code in the volatile memory.
Design of a ``Digital Atlas Vme Electronics'' (DAVE) module
NASA Astrophysics Data System (ADS)
Goodrick, M.; Robinson, D.; Shaw, R.; Postranecky, M.; Warren, M.
2012-01-01
ATLAS-SCT has developed a new ATLAS trigger card, 'Digital Atlas Vme Electronics' (``DAVE''). The unit is designed to provide a versatile array of interface and logic resources, including a large FPGA. It interfaces to both VME bus and USB hosts. DAVE aims to provide exact ATLAS CTP (ATLAS Central Trigger Processor) functionality, with random trigger, simple and complex deadtime, ECR (Event Counter Reset), BCR (Bunch Counter Reset) etc. being generated to give exactly the same conditions in standalone running as experienced in combined runs. DAVE provides additional hardware and a large amount of free firmware resource to allow users to add or change functionality. The combination of the large number of individually programmable inputs and outputs in various formats, with very large external RAM and other components all connected to the FPGA, also makes DAVE a powerful and versatile FPGA utility card.
Research and Applications Modules (RAM), phase B study
NASA Technical Reports Server (NTRS)
1972-01-01
The research and applications modules (RAM) system is discussed. The RAM is a family of payload carrier modules that can be delivered to and retrieved from earth orbit by the space shuttle. The RAM's capability for implementing a wide range of manned and man-tended missions is described. The rams have evolved into three types; (1) pressurized RAMs, (2) unpressurized RAMs, and (3) pressurizable free-flying RAMs. A reference experiment plan for use as a baseline in the derivation and planning of the RAM project is reported. The plan describes the number and frequency of shuttle flights dedicated to RAM missions and the RAM payloads for the identified flights.
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture
Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo
2010-01-01
The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread well-known and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32 bits NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focus to pre-process the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283
Saeedi, Ehsan; Kong, Yinan
2017-01-01
In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance (1Area×Time=1AT) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature. PMID:28459831
Hossain, Md Selim; Saeedi, Ehsan; Kong, Yinan
2017-01-01
In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM), which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA) architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST). The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance ([Formula: see text]) and Area × Time × Energy (ATE) product of the proposed design are far better than the most significant studies found in the literature.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
A Versatile Multichannel Digital Signal Processing Module for Microcalorimeter Arrays
NASA Astrophysics Data System (ADS)
Tan, H.; Collins, J. W.; Walby, M.; Hennig, W.; Warburton, W. K.; Grudberg, P.
2012-06-01
Different techniques have been developed for reading out microcalorimeter sensor arrays: individual outputs for small arrays, and time-division or frequency-division or code-division multiplexing for large arrays. Typically, raw waveform data are first read out from the arrays using one of these techniques and then stored on computer hard drives for offline optimum filtering, leading not only to requirements for large storage space but also limitations on achievable count rate. Thus, a read-out module that is capable of processing microcalorimeter signals in real time will be highly desirable. We have developed multichannel digital signal processing electronics that are capable of on-board, real time processing of microcalorimeter sensor signals from multiplexed or individual pixel arrays. It is a 3U PXI module consisting of a standardized core processor board and a set of daughter boards. Each daughter board is designed to interface a specific type of microcalorimeter array to the core processor. The combination of the standardized core plus this set of easily designed and modified daughter boards results in a versatile data acquisition module that not only can easily expand to future detector systems, but is also low cost. In this paper, we first present the core processor/daughter board architecture, and then report the performance of an 8-channel daughter board, which digitizes individual pixel outputs at 1 MSPS with 16-bit precision. We will also introduce a time-division multiplexing type daughter board, which takes in time-division multiplexing signals through fiber-optic cables and then processes the digital signals to generate energy spectra in real time.
NASA Astrophysics Data System (ADS)
Daniluk, Andrzej
2011-06-01
A computational model is a computer program, which attempts to simulate an abstract model of a particular system. Computational models use enormous calculations and often require supercomputer speed. As personal computers are becoming more and more powerful, more laboratory experiments can be converted into computer models that can be interactively examined by scientists and students without the risk and cost of the actual experiments. The future of programming is concurrent programming. The threaded programming model provides application programmers with a useful abstraction of concurrent execution of multiple tasks. The objective of this release is to address the design of architecture for scientific application, which may execute as multiple threads execution, as well as implementations of the related shared data structures. New version program summaryProgram title: GrowthCP Catalogue identifier: ADVL_v4_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADVL_v4_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 32 269 No. of bytes in distributed program, including test data, etc.: 8 234 229 Distribution format: tar.gz Programming language: Free Object Pascal Computer: multi-core x64-based PC Operating system: Windows XP, Vista, 7 Has the code been vectorised or parallelized?: No RAM: More than 1 GB. The program requires a 32-bit or 64-bit processor to run the generated code. Memory is addressed using 32-bit (on 32-bit processors) or 64-bit (on 64-bit processors with 64-bit addressing) pointers. The amount of addressed memory is limited only by the available amount of virtual memory. Supplementary material: The figures mentioned in the "Summary of revisions" section can be obtained here. Classification: 4.3, 7.2, 6.2, 8, 14 External routines: Lazarus [1] Catalogue identifier of previous version: ADVL_v3_0 Journal reference of previous version: Comput. Phys. Comm. 181 (2010) 709 Does the new version supersede the previous version?: Yes Nature of problem: Reflection high-energy electron diffraction (RHEED) is an important in-situ analysis technique, which is capable of giving quantitative information about the growth process of thin layers and its control. It can be used to calibrate growth rate, analyze surface morphology, calibrate surface temperature, monitor the arrangement of the surface atoms, and provide information about growth kinetics. Such control allows the development of structures where the electrons can be confined in space, giving quantum wells or even quantum dots. In order to determine the atomic positions of atoms in the first few layers, the RHEED intensity must be measured as a function of the scattering angles and then compared with dynamic calculations. The objective of this release is to address the design of architecture for application that simulates the rocking curves RHEED intensities during hetero-epitaxial growth process of thin films. Solution method: The GrowthCP is a complex numerical model that uses multiple threads for simulation of epitaxial growth of thin layers. This model consists of two transactional parts. The first part is a mathematical model being based on the Runge-Kutta method with adaptive step-size control. The second part represents first-principles of the one-dimensional RHEED computational model. This model is based on solving a one-dimensional Schrödinger equation. Several problems can arise when applications contain a mixture of data access code, numerical code, and presentation code. Such applications are difficult to maintain, because interdependencies between all the components cause strong ripple effects whenever a change is made anywhere. Adding new data views often requires reimplementing a numerical code, which then requires maintenance in multiple places. In order to solve problems of this type, the computational and threading layers of the project have been implemented in the form of one design pattern as a part of Model-View-Controller architecture. Reasons for new version: Responding to the users' feedback the Growth09 project has been upgraded to a standard that allows the carrying out of sample computations of the RHEED intensities for a disordered surface for a wide range of single- and epitaxial hetero-structures. The design pattern on which the project is based has also been improved. It is shown that this model can be effectively used for multithreaded growth simulations of thin epitaxial layers and corresponding RHEED intensities for a wide range of single- and hetero-structures. Responding to the users' feedback the present release has been implemented using a well-documented free compiler [1] not requiring the special configuration and installation additional libraries. Summary of revisions: The logical structure of the Growth09 program has been modified according to the scheme showed in Fig. 1. The class diagram in Fig. 1 is a static view of the main platform-specific elements of the GrowthCP architecture. Fig. 2 provides a dynamic view by showing the creation and destruction simplistic sequence diagram for the process. The program requires the user to provide the appropriate parameters in the form of a knowledge base for the crystal structures under investigation. These parameters are loaded from the parameters. ini files at run-time. Instructions to prepare the .ini files can be found in the new distribution. The program enables carrying out different growth models and one-dimensional dynamical RHEED calculations for the fcc lattice with basis of three-atoms, fcc lattice with basis of two-atoms, fcc lattice with single atom basis, Zinc-Blende, Sodium Chloride, and Wurtzite crystalline structures and hetero-structures, but yet the Fourier component of the scattering potential in the TRHEEDCalculations.crystPotUgXXX() procedure can be modified and implemented according to users' specific application requirements. The Fourier component of the scattering potential of the whole crystalline hetero-structures can be determined as a sum of contributions coming from all thin slices of individual atomic layers. To carry out one-dimensional calculations of the scattering potentials, the program uses properly constructed self-consistent procedures. Each component of the system shown in Figs. 1 and 2 is fully extendable and can easily be adapted to new changeable requirements. Two essential logical elements of the system, i.e. TGrowthTransaction and TRHEEDCalculations classes, were designed and implemented in this way for them to pass the information to themselves without the need to use the data-exchange files given. In consequence each of them can be independently modified and/or extended. Implementing other types of differential equations and the different algorithm for solving them in the TGrowthTransaction class does not require another implementation of the TRHEEDCalculations class. Similarly, implementing other forms of scattering potential and different algorithm for RHEED calculation stays without the influence on the TGrowthTransaction class construction. Unusual features: The program is distributed in the form of main project GrowthCP.lpr, with associated files, and should be compiled using Lazarus IDE. The program should be compiled with English/USA regional and language options. Running time: The typical running time is machine and user-parameters dependent. References: http://sourceforge.net/projects/lazarus/files/.
The advanced receiver 2: Telemetry test results in CTA 21
NASA Technical Reports Server (NTRS)
Hinedi, S.; Bevan, R.; Marina, M.
1991-01-01
Telemetry tests with the Advanced Receiver II (ARX II) in Compatibility Test Area 21 are described. The ARX II was operated in parallel with a Block-III Receiver/baseband processor assembly combination (BLK-III/BPA) and a Block III Receiver/subcarrier demodulation assembly/symbol synchronization assembly combination (BLK-III/SDA/SSA). The telemetry simulator assembly provided the test signal for all three configurations, and the symbol signal to noise ratio as well as the symbol error rates were measured and compared. Furthermore, bit error rates were also measured by the system performance test computer for all three systems. Results indicate that the ARX-II telemetry performance is comparable and sometimes superior to the BLK-III/BPA and BLK-III/SDA/SSA combinations.
Two dimensional recursive digital filters for near real time image processing
NASA Technical Reports Server (NTRS)
Olson, D.; Sherrod, E.
1980-01-01
A program was designed toward the demonstration of the feasibility of using two dimensional recursive digital filters for subjective image processing applications that require rapid turn around. The concept of the use of a dedicated minicomputer for the processor for this application was demonstrated. The minicomputer used was the HP1000 series E with a RTE 2 disc operating system and 32K words of memory. A Grinnel 256 x 512 x 8 bit display system was used to display the images. Sample images were provided by NASA Goddard on a 800 BPI, 9 track tape. Four 512 x 512 images representing 4 spectral regions of the same scene were provided. These images were filtered with enhancement filters developed during this effort.
NASA Technical Reports Server (NTRS)
Hepner, T. E.; Meyers, J. F. (Inventor)
1985-01-01
A laser velocimeter covariance processor which calculates the auto covariance and cross covariance functions for a turbulent flow field based on Poisson sampled measurements in time from a laser velocimeter is described. The device will process a block of data that is up to 4096 data points in length and return a 512 point covariance function with 48-bit resolution along with a 512 point histogram of the interarrival times which is used to normalize the covariance function. The device is designed to interface and be controlled by a minicomputer from which the data is received and the results returned. A typical 4096 point computation takes approximately 1.5 seconds to receive the data, compute the covariance function, and return the results to the computer.
Investigation of Modularly Configured Attached Processors with Intelligent Memories
1994-09-30
sequence of computations becomes cl = c11 + C[i, i] := Cli, j] + A[i, k) * Blk, j]; al x bti, c12 - c12 + k X bkX2, C13 = C1 3 + alk X b 3 , ... , c =n...Cin + alk X bkn, C2 1 = Cat + a2k x bkl , The sequence of computations becomes cii = a I x bli, C22 = cU2 + a2k x b2, ... Cj a2 1x bij, ca3j = aa x...1 to 1/2 Least significant 1/2 address bits Fig. 1. Block diagram of a memory module PREGVJKE is v I pis P16 16 ItO16 = 4-81T OECD - = AMKSS A s-Byrf
Speech coding at 4800 bps for mobile satellite communications
NASA Technical Reports Server (NTRS)
Gersho, Allen; Chan, Wai-Yip; Davidson, Grant; Chen, Juin-Hwey; Yong, Mei
1988-01-01
A speech compression project has recently been completed to develop a speech coding algorithm suitable for operation in a mobile satellite environment aimed at providing telephone quality natural speech at 4.8 kbps. The work has resulted in two alternative techniques which achieve reasonably good communications quality at 4.8 kbps while tolerating vehicle noise and rather severe channel impairments. The algorithms are embodied in a compact self-contained prototype consisting of two AT and T 32-bit floating-point DSP32 digital signal processors (DSP). A Motorola 68HC11 microcomputer chip serves as the board controller and interface handler. On a wirewrapped card, the prototype's circuit footprint amounts to only 200 sq cm, and consumes about 9 watts of power.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, K.R.; Hansen, F.R.; Napolitano, L.M.
1992-01-01
DART (DSP Arrary for Reconfigurable Tasks) is a parallel architecture of two high-performance SDP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processes in DART is programmable in a high-level languate ( C'' or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processor. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability bymore » using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, K.R.; Hansen, F.R.; Napolitano, L.M.
1992-01-01
DART (DSP Arrary for Reconfigurable Tasks) is a parallel architecture of two high-performance SDP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processes in DART is programmable in a high-level languate (``C`` or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processor. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by usingmore » DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.« less
A Study of Communication Processor Systems
1979-12-01
STUDYSCOFONLA (A . TIS R I8TIO S" - EEG R AN T-i R $a t 17. DITR UTO STTEeN in h b’*l.elldi 1Ok 0 ~45ft i9~a R,1OI)7 -C S. APeMr’wl NOTES Uh a DC gt~ & C...MM Fig. 3.3 Pluribus syste m II 24 -4~ I tm ap B u s- ----I FP K K Cmi Cornrnjer MeodUlO Fig. 3.4 Carnegie "iellLfn CmI* 25 0I M r K K P M K KP M K...144 bits incurs a mean transmission delay (o ) of 0.29 milli- seconds. The maximum transmission delays TDF and TDC can be obtained as 15 milliseconds
Remote Asynchronous Message Service Gateway
NASA Technical Reports Server (NTRS)
Wang, Shin-Ywan; Burleigh, Scott C.
2011-01-01
The Remote Asynchronous Message Service (RAMS) gateway is a special-purpose AMS application node that enables exchange of AMS messages between nodes residing in different AMS "continua," notionally in different geographical locations. JPL s implementation of RAMS gateway functionality is integrated with the ION (Interplanetary Overlay Network) implementation of the DTN (Delay-Tolerant Networking) bundle protocol, and with JPL s implementation of AMS itself. RAMS protocol data units are encapsulated in ION bundles and are forwarded to the neighboring RAMS gateways identified in the source gateway s AMS management information base. Each RAMS gateway has interfaces in two communication environments: the AMS message space it serves, and the RAMS network - the grid or tree of mutually aware RAMS gateways - that enables AMS messages produced in one message space to be forwarded to other message spaces of the same venture. Each gateway opens persistent, private RAMS network communication channels to the RAMS gateways of other message spaces for the same venture, in other continua. The interconnected RAMS gateways use these communication channels to forward message petition assertions and cancellations among themselves. Each RAMS gateway subscribes locally to all subjects that are of interest in any of the linked message spaces. On receiving its copy of a message on any of these subjects, the RAMS gateway node uses the RAMS network to forward the message to every other RAMS gateway whose message space contains at least one node that has subscribed to messages on that subject. On receiving a message via the RAMS network from some other RAMS gateway, the RAMS gateway node forwards the message to all subscribers in its own message space.
Veleba, Mark; De Majumdar, Shyamasree; Hornsey, Michael; Woodford, Neil; Schneiders, Thamarai
2013-05-01
The intrinsically encoded ramA gene has been linked to tigecycline resistance through the up-regulation of efflux pump AcrAB in Enterobacter cloacae. The molecular basis for increased ramA expression in E. cloacae and Enterobacter aerogenes, as well as the role of AraC regulator rarA, has not yet been shown. To ascertain the intrinsic molecular mechanism(s) involved in tigecycline resistance in Enterobacter spp., we analysed the expression levels of ramA and rarA and corresponding efflux pump genes acrAB and oqxAB in Enterobacter spp. clinical isolates. The expression levels of ramA, rarA, oqxA and acrA were tested by quantitative real-time RT-PCR. The ramR open reading frames of the ramA-overexpressing strains were sequenced; strains harbouring mutations were transformed with wild-type ramR to study altered ramA expression and tigecycline susceptibility. Tigecycline resistance was mediated primarily by increased ramA expression in E. cloacae and E. aerogenes. Only the ramA-overexpressing E. cloacae isolates showed increased rarA and oqxA expression. Upon complementation with wild-type ramR, all Enterobacter spp. containing ramR mutations exhibited decreased ramA and acrA expression and increased tigecycline susceptibility. Exceptions were one E. cloacae strain and one E. aerogenes strain, where a decrease in ramA levels was not accompanied by lower acrA expression. Increased ramA expression due to ramR deregulation is the primary mediator of tigecycline resistance in clinical isolates of E. cloacae and E. aerogenes. However, some ramA-overexpressing isolates do not show changes in ramR, suggesting alternate pathways of ramA regulation; the rarA regulator and the oqxAB efflux pump may also play a role in tigecycline resistance in E. cloacae.
Perkins, A; Fitzgerald, J A
1994-01-01
The objective of this study was to test whether the sexual behavior of the ram affects the ram effect. Rams exhibiting either high (HP) or low (LP) levels of sexual performance (on the basis of serving capacity tests) were exposed to 89 anestrous ewes for 28 d. Thirty-two anestrous ewes were not exposed to rams. The objective of this study was to compare the efficacy of estrus induction by HP (n = 4) vs LP (n = 4) rams. Plasma progesterone concentration was used as an index of ovarian activity. Groups of ewes were exposed to either an HP or an LP ram in a .32-ha pasture. Courtship behaviors of rams were recorded for 6 h on the initial day of exposure and for 30-min periods on alternate days thereafter. A greater percentage of ewes exposed to HP rams ovulated (95%) compared with ewes exposed to LP rams (78%) (P < .02). On the 1st d of exposure, the HP rams exhibited more courtship behavior and spent more time near the ewes (P < .04). The HP rams spent more time within 1 m of ewes during the 28-d exposure. There were no differences in the amount of contact with rams (LP or HP) between rise in progesterone indicate of ovulation tended to occur earlier (P = .06) in ewes penned with HP rams. A greater percentage of ewes exposed to LP rams (P = .03) had early elevations of progesterone with no concurrent sexual behavior. These data imply that in addition to a pheromone the sexual behavior of the ram may be important in initiating ovarian cycle activity.
Monitoring of computing resource use of active software releases at ATLAS
NASA Astrophysics Data System (ADS)
Limosani, Antonio; ATLAS Collaboration
2017-10-01
The LHC is the world’s most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the TierO at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries and collected into pre-formatted auto-generated Web pages, which allow the ATLAS developer community to track the performance of their algorithms. This information is however preferentially filtered to domain leaders and developers through the use of JIRA and via reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.
Perelman, Yevgeny; Ginosar, Ran
2007-01-01
A mixed-signal front-end processor for multichannel neuronal recording is described. It receives 12 differential-input channels of implanted recording electrodes. A programmable cutoff High Pass Filter (HPF) blocks dc and low-frequency input drift at about 1 Hz. The signals are band-split at about 200 Hz to low-frequency Local Field Potential (LFP) and high-frequency spike data (SPK), which is band limited by a programmable-cutoff LPF, in a range of 8-13 kHz. Amplifier offsets are compensated by 5-bit calibration digital-to-analog converters (DACs). The SPK and LFP channels provide variable amplification rates of up to 5000 and 500, respectively. The analog signals are converted into 10-bit digital form, and streamed out over a serial digital bus at up to 8 Mbps. A threshold filter suppresses inactive portions of the signal and emits only spike segments of programmable length. A prototype has been fabricated on a 0.35-microm CMOS process and tested successfully, demonstrating a 3-microV noise level. Special interface system incorporating an embedded CPU core in a programmable logic device accompanied by real-time software has been developed to allow connectivity to a computer host.
A reconfigurable cryogenic platform for the classical control of quantum processors
NASA Astrophysics Data System (ADS)
Homulle, Harald; Visser, Stefan; Patra, Bishnu; Ferrari, Giorgio; Prati, Enrico; Sebastiano, Fabio; Charbon, Edoardo
2017-04-01
The implementation of a classical control infrastructure for large-scale quantum computers is challenging due to the need for integration and processing time, which is constrained by coherence time. We propose a cryogenic reconfigurable platform as the heart of the control infrastructure implementing the digital error-correction control loop. The platform is implemented on a field-programmable gate array (FPGA) that supports the functionality required by several qubit technologies and that can operate close to the physical qubits over a temperature range from 4 K to 300 K. This work focuses on the extensive characterization of the electronic platform over this temperature range. All major FPGA building blocks (such as look-up tables (LUTs), carry chains (CARRY4), mixed-mode clock manager (MMCM), phase-locked loop (PLL), block random access memory, and IDELAY2 (programmable delay element)) operate correctly and the logic speed is very stable. The logic speed of LUTs and CARRY4 changes less then 5%, whereas the jitter of MMCM and PLL clock managers is reduced by 20%. The stability is finally demonstrated by operating an integrated 1.2 GSa/s analog-to-digital converter (ADC) with a relatively stable performance over temperature. The ADCs effective number of bits drops from 6 to 4.5 bits when operating at 15 K.
A reconfigurable cryogenic platform for the classical control of quantum processors.
Homulle, Harald; Visser, Stefan; Patra, Bishnu; Ferrari, Giorgio; Prati, Enrico; Sebastiano, Fabio; Charbon, Edoardo
2017-04-01
The implementation of a classical control infrastructure for large-scale quantum computers is challenging due to the need for integration and processing time, which is constrained by coherence time. We propose a cryogenic reconfigurable platform as the heart of the control infrastructure implementing the digital error-correction control loop. The platform is implemented on a field-programmable gate array (FPGA) that supports the functionality required by several qubit technologies and that can operate close to the physical qubits over a temperature range from 4 K to 300 K. This work focuses on the extensive characterization of the electronic platform over this temperature range. All major FPGA building blocks (such as look-up tables (LUTs), carry chains (CARRY4), mixed-mode clock manager (MMCM), phase-locked loop (PLL), block random access memory, and IDELAY2 (programmable delay element)) operate correctly and the logic speed is very stable. The logic speed of LUTs and CARRY4 changes less then 5%, whereas the jitter of MMCM and PLL clock managers is reduced by 20%. The stability is finally demonstrated by operating an integrated 1.2 GSa/s analog-to-digital converter (ADC) with a relatively stable performance over temperature. The ADCs effective number of bits drops from 6 to 4.5 bits when operating at 15 K.
NASA Astrophysics Data System (ADS)
Bykova, L. E.; Razdymakhina, O. N.
2011-07-01
In this paper the results of investigations of near-Earth asteroids (NEA) dynamics in vicinity of the 1/2 resonance with Earth are presented. For each of these asteroids the evolution of probability motion domains is investigated during several thousand years. The investigations are conducted on cluster SKIF Cyberia with digit grid 128 bit. The performance of motion prediction of many real and virtual asteroids on multiple-processor computing system with long digit grid has been shown. The estimate of performance is carried out in comparison with solution on personal computer with digit grid 80 bit
A high-rate PCI-based telemetry processor system
NASA Astrophysics Data System (ADS)
Turri, R.
2002-07-01
The high performances reached by the Satellite on-board telemetry generation and transmission, as consequently, will impose the design of ground facilities with higher processing capabilities at low cost to allow a good diffusion of these ground station. The equipment normally used are based on complex, proprietary bus and computing architectures that prevent the systems from exploiting the continuous and rapid increasing in computing power available on market. The PCI bus systems now allow processing of high-rate data streams in a standard PC-system. At the same time the Windows NT operating system supports multitasking and symmetric multiprocessing, giving the capability to process high data rate signals. In addition, high-speed networking, 64 bit PCI-bus technologies and the increase in processor power and software, allow creating a system based on COTS products (which in future may be easily and inexpensively upgraded). In the frame of EUCLID RTP 9.8 project, a specific work element was dedicated to develop the architecture of a system able to acquire telemetry data of up to 600 Mbps. Laben S.p.A - a Finmeccanica Company -, entrusted of this work, has designed a PCI-based telemetry system making possible the communication between a satellite down-link and a wide area network at the required rate.
MoNET: media over net gateway processor for next-generation network
NASA Astrophysics Data System (ADS)
Elabd, Hammam; Sundar, Rangarajan; Dedes, John
2001-12-01
MoNETTM (Media over Net) SX000 product family is designed using a scalable voice, video and packet-processing platform to address applications with channel densities from few voice channels to four OC3 per card. This platform is developed for bridging public circuit-switched network to the next generation packet telephony and data network. The platform consists of a DSP farm, RISC processors and interface modules. DSP farm is required to execute voice compression, image compression and line echo cancellation algorithms for large number of voice, video, fax, and modem or data channels. RISC CPUs are used for performing various packetizations based on RTP, UDP/IP and ATM encapsulations. In addition, RISC CPUs also participate in the DSP farm load management and communication with the host and other MoP devices. The MoNETTM S1000 communications device is designed for voice processing and for bridging TDM to ATM and IP packet networks. The S1000 consists of the DSP farm based on Carmel DSP core and 32-bit RISC CPU, along with Ethernet, Utopia, PCI, and TDM interfaces. In this paper, we will describe the VoIP infrastructure, building blocks of the S500, S1000 and S3000 devices, algorithms executed on these device and associated channel densities, detailed DSP architecture, memory architecture, data flow and scheduling.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Ram locus is a key regulator to trigger multidrug resistance in Enterobacter aerogenes.
Molitor, Alexander; James, Chloë E; Fanning, Séamus; Pagès, Jean-Marie; Davin-Regli, Anne
2018-02-01
Several genetic regulators belonging to AraC family are involved in the emergence of MDR isolates of E. aerogenes due to alterations in membrane permeability. Compared with the genetic regulator Mar, RamA may be more relevant towards the emergence of antibiotic resistance. Focusing on the global regulators, Mar and Ram, we compared the amino acid sequences of the Ram repressor in 59 clinical isolates and laboratory strains of E. aerogenes. Sequence types were associated with their corresponding multi-drug resistance phenotypes and membrane protein expression profiles using MIC and immunoblot assays. Quantitative gene expression analysis of the different regulators and their targets (porins and efflux pump components) were performed. In the majority of the MDR isolates tested, ramR and a region upstream of ramA were mutated but marR or marA were unchanged. Expression and cloning experiments highlighted the involvement of the ram locus in the modification of membrane permeability. Overexpression of RamA lead to decreased porin production and increased expression of efflux pump components, whereas overexpression of RamR had the opposite effects. Mutations or deletions in ramR, leading to the overexpression of RamA predominated in clinical MDR E. aerogenes isolates and were associated with a higher-level of expression of efflux pump components. It was hypothesised that mutations in ramR, and the self-regulating region proximal to ramA, probably altered the binding properties of the RamR repressor; thereby producing the MDR phenotype. Consequently, mutability of RamR may play a key role in predisposing E. aerogenes towards the emergence of a MDR phenotype.
Optical signal processing for a smart vehicle lighting system using a-SiCH technology
NASA Astrophysics Data System (ADS)
Vieira, M. A.; Vieira, M.; Vieira, P.; Louro, P.
2017-05-01
We propose the use of Visible Light Communication (VLC) for vehicle safety applications, creating a smart vehicle lighting system that combines the functions of illumination and signaling, communications, and positioning. The feasibility of VLC is demonstrated by employing trichromatic Red-Green-Blue (RGB) LEDs as transmitters, since they offer the possibility of Wavelength Division Multiplexing (WDM), which can greatly increase the transmission data rate, when using SiC double p-i-n receivers to encode/decode the information. Trichromatic RGB Light Emitting Diodes (LED)s (RGB-LED) are used together for illumination proposes (headlamps) and individually, each chip, to transmit the driving range distance and data information. An on-off code is used to transmit the data. Free space is the transmission medium. The receivers consist of two stacked amorphous a-H:SiC cells. They combine the simultaneous demultiplexing operation with the photodetection and self-amplification. The proposed coding is based on SiC technology. Multiple Input Multi Output (MIMO) architecture is used. For data transmission, we propose the use of two headlights based on commercially available modulated white RGB-LEDs. For data receiving and decoding we use three a-SiC:H double pin/pin optical processors symmetrically distributed at the vehicle tail Moreover, we present a way to achieve vehicular communication using the parity bits. A representation with a 4 bit original string color message and the transmitted 7 bit string, the encoding and decoding accurate positional information processes and the design of SiC navigation system are discussed and tested. A visible multilateration method estimates the drive distance range by using the decoded information received from several non-collinear transmitters.
Mozo, René; Galeote, Ana Isabel; Alabart, José Luis; Fantova, Enrique; Folch, José
2015-11-26
Predicting the ability of rams to detect, mate and fertilise ewes in oestrus accurately is certainly difficult; however, tests based on clinical examinations have been performed to assess the overall potential capacity of rams to serve and impregnate ewes. Clinical examinations for breeding soundness evaluation were carried out in 897 Rasa Aragonesa (RA) rams from 35 flocks in North-Eastern (NE) Spain. Clinical examinations of head, trunk, limbs and genitals were performed in each ram. Blood samples were collected for a serological study of Brucella ovis. The sheep owners were surveyed regarding the characteristics of the flock, rams' health history and the management of rams. The clinical alterations found were classified according to severity (mild or severe). Rams were classified as suitable (without lesions or with only mild lesions) or unsuitable (with severe lesions) for breeding depending on the results of the clinical examinations. The results showed that 60.6 % of rams presented some type of alteration (mild: 43.3 %; severe: 17.3 %) in various body parts (genitalia: 31.6 %; head and trunk: 37.2 %; limbs: 15.5 %), and that 16.7 % of rams were classified as unsuitable breeders. The most common genital alterations were ulcerative posthitis (18.7 %) followed by testicular lesions (5.3 %). The highest prevalence of unsuitable breeders was found in the category of adult and aged rams (13.8 % and 37.4 %, respectively) and in the category of emaciated rams (33.3 %). All rams examined were seronegative to Brucella ovis. The mean percentage of rams in flocks was 2.8 % (min: 1.6 %; max: 4.6 %); nevertheless, this percentage dropped to 2.5 % (min: 1.4 %; max: 3.7 %) and 2.1 % (min: 0.3 %; max: 3.5 %) when only suitable or effective (suitable mature) rams were considered. Thus, it is concluded that there are fewer effective rams in farms than farmers realise. Frequent clinical examination of males is recommended in order to identify potentially infertile rams.
Research on NC motion controller based on SOPC technology
NASA Astrophysics Data System (ADS)
Jiang, Tingbiao; Meng, Biao
2006-11-01
With the rapid development of the digitization and informationization, the application of numerical control technology in the manufacturing industry becomes more and more important. However, the conventional numerical control system usually has some shortcomings such as the poor in system openness, character of real-time, cutability and reconfiguration. In order to solve these problems, this paper investigates the development prospect and advantage of the application in numerical control area with system-on-a-Programmable-Chip (SOPC) technology, and puts forward to a research program approach to the NC controller based on SOPC technology. Utilizing the characteristic of SOPC technology, we integrate high density logic device FPGA, memory SRAM, and embedded processor ARM into a single programmable logic device. We also combine the 32-bit RISC processor with high computing capability of the complicated algorithm with the FPGA device with strong motivable reconfiguration logic control ability. With these steps, we can greatly resolve the defect described in above existing numerical control systems. For the concrete implementation method, we use FPGA chip embedded with ARM hard nuclear processor to construct the control core of the motion controller. We also design the peripheral circuit of the controller according to the requirements of actual control functions, transplant real-time operating system into ARM, design the driver of the peripheral assisted chip, develop the application program to control and configuration of FPGA, design IP core of logic algorithm for various NC motion control to configured it into FPGA. The whole control system uses the concept of modular and structured design to develop hardware and software system. Thus the NC motion controller with the advantage of easily tailoring, highly opening, reconfigurable, and expandable can be implemented.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.
2016-12-01
The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled Hg transport module to investigate the mercury pollution. In this study, we present our work of transplanting the GNAQPMS model on Intel Xeon Phi processor, Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting Many Integrated Core Architecture (MIC) architecture. Compared with the first generation Knight Corner (KNC), KNL has more new hardware features, that it can be used as unique processor as well as coprocessor with other CPU. According to the Vtune tool, the high overhead modules in GNAQPMS model have been addressed, including CBMZ gas chemistry, advection and convection module, and wet deposition module. These high overhead modules were accelerated by optimizing code and using new techniques of KNL. The following optimized measures was done: 1) Changing the pure MPI parallel mode to hybrid parallel mode with MPI and OpenMP; 2.Vectorizing the code to using the 512-bit wide vector computation unit. 3. Reducing unnecessary memory access and calculation. 4. Reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ. 5. Changing the way of global communication from files writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased both on CPU and KNL platform, the single-node test showed that optimized version has 2.6x speedup on two sockets CPU platform and 3.3x speedup on one socket KNL platform compared with the baseline version code, which means the KNL has 1.29x speedup when compared with 2 sockets CPU platform.
Design and qualification of the SEU/TD Radiation Monitor chip
NASA Technical Reports Server (NTRS)
Buehler, Martin G.; Blaes, Brent R.; Soli, George A.; Zamani, Nasser; Hicks, Kenneth A.
1992-01-01
This report describes the design, fabrication, and testing of the Single-Event Upset/Total Dose (SEU/TD) Radiation Monitor chip. The Radiation Monitor is scheduled to fly on the Mid-Course Space Experiment Satellite (MSX). The Radiation Monitor chip consists of a custom-designed 4-bit SRAM for heavy ion detection and three MOSFET's for monitoring total dose. In addition the Radiation Monitor chip was tested along with three diagnostic chips: the processor monitor and the reliability and fault chips. These chips revealed the quality of the CMOS fabrication process. The SEU/TD Radiation Monitor chip had an initial functional yield of 94.6 percent. Forty-three (43) SEU SRAM's and 14 Total Dose MOSFET's passed the hermeticity and final electrical tests and were delivered to LL.
NASA Technical Reports Server (NTRS)
Glover, Richard D.
1987-01-01
A pipelined, multiprocessor, general-purpose ground support equipment for digital flight systems has been developed and placed in service at the NASA Ames Research Center's Dryden Flight Research Facility. The design is an outgrowth of the earlier aircraft interrogation and display system (AIDS) used in support of several research projects to provide engineering-units display of internal control system parameters during development and qualification testing activities. The new system, incorporating multiple 16-bit processors, is called extended AIDS (XAIDS) and is now supporting the X-29A forward-swept-wing aircraft project. This report describes the design and mechanization of XAIDS and shows the steps whereby a typical user may take advantage of its high throughput and flexible features.
a Linux PC Cluster for Lattice QCD with Exact Chiral Symmetry
NASA Astrophysics Data System (ADS)
Chiu, Ting-Wai; Hsieh, Tung-Han; Huang, Chao-Hsi; Huang, Tsung-Ren
A computational system for lattice QCD with overlap Dirac quarks is described. The platform is a home-made Linux PC cluster, built with off-the-shelf components. At present the system constitutes of 64 nodes, with each node consisting of one Pentium 4 processor (1.6/2.0/2.5 GHz), one Gbyte of PC800/1066 RDRAM, one 40/80/120 Gbyte hard disk, and a network card. The computationally intensive parts of our program are written in SSE2 codes. The speed of our system is estimated to be 70 Gflops, and its price/performance ratio is better than $1.0/Mflops for 64-bit (double precision) computations in quenched QCD. We discuss how to optimize its hardware and software for computing propagators of overlap Dirac quarks.
Development of land based radar polarimeter processor system
NASA Technical Reports Server (NTRS)
Kronke, C. W.; Blanchard, A. J.
1983-01-01
The processing subsystem of a land based radar polarimeter was designed and constructed. This subsystem is labeled the remote data acquisition and distribution system (RDADS). The radar polarimeter, an experimental remote sensor, incorporates the RDADS to control all operations of the sensor. The RDADS uses industrial standard components including an 8-bit microprocessor based single board computer, analog input/output boards, a dynamic random access memory board, and power supplis. A high-speed digital electronics board was specially designed and constructed to control range-gating for the radar. A complete system of software programs was developed to operate the RDADS. The software uses a powerful real time, multi-tasking, executive package as an operating system. The hardware and software used in the RDADS are detailed. Future system improvements are recommended.
NASA Astrophysics Data System (ADS)
Sanna, N.; Morelli, G.
2004-09-01
In this paper we present the new version of the SCELib program (CPC Catalogue identifier ADMG) a full numerical implementation of the Single Center Expansion (SCE) method. The physics involved is that of producing the SCE description of molecular electronic densities, of molecular electrostatic potentials and of molecular perturbed potentials due to a point negative or positive charge. This new revision of the program has been optimized to run in serial as well as in parallel execution mode, to support a larger set of molecular symmetries and to permit the restart of long-lasting calculations. To measure the performance of this new release, a comparative study has been carried out on the most powerful computing architectures in serial and parallel runs. The results of the calculations reported in this paper refer to real cases medium to large molecular systems and they are reported in full details to benchmark at best the parallel architectures the new SCELib code will run on. Program summaryTitle of program: SCELib2 Catalogue identifier: ADGU Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADGU Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Reference to previous versions: Comput. Phys. Commun. 128 (2) (2000) 139 (CPC catalogue identifier: ADMG) Does the new version supersede the original program?: Yes Computer for which the program is designed and others on which it has been tested: HP ES45 and rx2600, SUN ES4500, IBM SP and any single CPU workstation based on Alpha, SPARC, POWER, Itanium2 and X86 processors Installations: CASPUR, local Operating systems under which the program has been tested: HP Tru64 V5.X, SUNOS V5.8, IBM AIX V5.X, Linux RedHat V8.0 Programming language used: C Memory required to execute with typical data: 10 Mwords. Up to 2000 Mwords depending on the molecular system and runtime parameters No. of bits in a word: 64 No. of processors used: 1 to 32 Has the code been vectorized or parallelized?: Yes No. of bytes in distributed program, including test data, etc.: 3 798 507 No. of lines in distributed program, including test data, etc.: 187 226 Distribution format: tar.gz Nature of physical problem: In this set of codes an efficient procedure is implemented to describe the wavefunction and related molecular properties of a polyatomic molecular system within the Single Center of Expansion (SCE) approximation. The resulting SCE wavefunction, electron density, electrostatic and exchange/correlation potentials can then be used via a proper Application Programming Interface (API) to describe the target molecular system which can be employed in electron-molecule scattering calculations. The molecular properties expanded over a single center turn out to also be of more general application and some possible uses in quantum chemistry, biomodelling and drug design are also outlined. Method of solution: The polycentre Hartee-Fock solution for a molecule of arbitrary geometry, based on linear combination of Gaussian-Type Orbital (GTO), is expanded over a single center, typically the Center Of Mass (C.O.M.), by means of a Gauss-Legendre/Chebyschev quadrature over the θ, φ angular coordinates. The resulting SCE numerical wavefunction is then used to calculate the one-particle electron density, the electrostatic potential and two different models for the correlation/polarization potentials induced by the impinging electron, which have the correct asymptotic behaviour for the leading dipole molecular polarizabilities. Restrictions on the complexity of the problem: Depending on the molecular system under study and on the operating conditions the program may or may not fit into available RAM memory. In this case a feature of the program is to memory map a disk file in order to efficiently access the memory data through a disk device. Typical running time: The execution time strongly depends on the molecular target description and on the hardware/OS chosen, it is directly proportional to the ( r, θ, φ) grid size and to the number of angular basis functions used. Thus, from the program printout of the main arrays memory occupancy, the user can approximately derive the expected computer time needed for a given calculation executed in serial mode. For parallel executions the overall efficiency must be further taken into account, and this depends on the no. of processors used as well as on the parallel architecture chosen, so a simple general law is at present not determinable. Unusual features of the program: The code has been engineered to use dynamical, runtime determined, global parameters with the aim to have all the data fitted in the RAM memory. Some unusual circumstances, e.g., when using large values of those parameters, may cause the program to run with unexpected performance reductions due to runtime bottlenecks like those caused by memory swap operations which strongly depend on the hardware used. In such cases, a parallel execution of the code is generally sufficient to fix the problem since the data size is partitioned over the available processors. When a suitable parallel system is not available for execution, a mechanism of memory mapped file can be used; with this option on, all the available memory will be used as a buffer for a disk file which contains the whole data set, thus having a better throughput with respect to the traditional swapping/paging of the Unix OS.
Jordan, K M; Inskeep, E K; Knights, M
2009-12-01
Three experiments were conducted on anestrous ewes of Suffolk, Dorset, and Katahdin breeding to examine the potential value of GnRH to improve ovulation and pregnancy in response to introduction of rams. In Experiment 1, treatment with GnRH 2d after treatment with progesterone (P(4); 25mg i.m.) at introduction of rams was compared to treatment with P(4) alone at the time of introduction of rams. Treatment with GnRH did not increase percentages of ewes with a corpus luteum (CL) 14d after introduction of rams, pregnant 32d after treatment with PGF(2)alpha 14d after introduction of rams, or percent of treated ewes lambing to all services. In Experiment 2, treatments with GnRH on day 2, 7, or both after introduction of rams were compared. Treatments did not differ in mean estrous response, percentages of ewes with a detectable CL or number of CL present on day 11, or mean pregnancy and lambing rates. Therefore, neither one nor two injections of GnRH at these times appeared to be effective to induce anestrous ewes to breed. In Experiment 3, treatments compared included GnRH 4d before introduction of rams, GnRH 4d before and 1d after introduction of rams, ram introduction alone, and treatment with P(4) (25mg i.m.) at the time of introduction of rams. Percentages of ewes with concentrations of P(4) greater than 1ng/mL (indicating formation of CL had occurred) 7d after ram introduction tended to be greater (P<0.07) in ewes treated with GnRH or P(4) than in control ewes treated with ram introduction alone. However, there was no difference in P(4) concentrations between groups by day 11 or 12 after introduction of rams. Estrous response rates and percentages of ewes pregnant 95d after PGF(2)alpha was administered (on day 12 after introduction of rams) tended to be greater (P=0.08 and 0.06, respectively) in ewes treated with GnRH or P(4) than in ewes exposed to rams only. There was no difference in response variables between ewes treated with GnRH 4d before introduction of rams and ewes treated with GnRH 4d before and 1d after introduction of rams. In conclusion, treatment with GnRH 4d before ram introduction showed promise as an alternative to treatment with P(4) to improve the ovulatory response and reproductive performance of ewes introduced to rams during seasonal anestrus.
CHASM and SNVBox: toolkit for detecting biologically important single nucleotide mutations in cancer
Carter, Hannah; Diekhans, Mark; Ryan, Michael C.; Karchin, Rachel
2011-01-01
Summary: Thousands of cancer exomes are currently being sequenced, yielding millions of non-synonymous single nucleotide variants (SNVs) of possible relevance to disease etiology. Here, we provide a software toolkit to prioritize SNVs based on their predicted contribution to tumorigenesis. It includes a database of precomputed, predictive features covering all positions in the annotated human exome and can be used either stand-alone or as part of a larger variant discovery pipeline. Availability and Implementation: MySQL database, source code and binaries freely available for academic/government use at http://wiki.chasmsoftware.org, Source in Python and C++. Requires 32 or 64-bit Linux system (tested on Fedora Core 8,10,11 and Ubuntu 10), 2.5*≤ Python <3.0*, MySQL server >5.0, 60 GB available hard disk space (50 MB for software and data files, 40 GB for MySQL database dump when uncompressed), 2 GB of RAM. Contact: karchin@jhu.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21685053
Faster Bit-Parallel Algorithms for Unordered Pseudo-tree Matching and Tree Homeomorphism
NASA Astrophysics Data System (ADS)
Kaneta, Yusaku; Arimura, Hiroki
In this paper, we consider the unordered pseudo-tree matching problem, which is a problem of, given two unordered labeled trees P and T, finding all occurrences of P in T via such many-one embeddings that preserve node labels and parent-child relationship. This problem is closely related to tree pattern matching problem for XPath queries with child axis only. If m > w , we present an efficient algorithm that solves the problem in O(nm log(w)/w) time using O(hm/w + mlog(w)/w) space and O(m log(w)) preprocessing on a unit-cost arithmetic RAM model with addition, where m is the number of nodes in P, n is the number of nodes in T, h is the height of T, and w is the word length. We also discuss a modification of our algorithm for the unordered tree homeomorphism problem, which corresponds to a tree pattern matching problem for XPath queries with descendant axis only.
Development of battering ram vibrator system
NASA Astrophysics Data System (ADS)
Sun, F.; Chen, Z.; Lin, J.; Tong, X.
2012-12-01
This paper researched the battering ram vibrator system, by electric machinery we can control oil system of battering ram, we realized exact control of battering ram, after analyzed pseudorandom coding, code "0" and "1" correspond to rest and shake of battering ram, then we can get pseudorandom coding which is the same with battering ram vibrator. After testing , by the reference trace and single shot record, when we using pseudorandom coding mode, the ratio of seismic wavelet to correlation interfere is about 68 dB, while the general mode , the ratio of seismic wavelet to correlation interfere only is 27.9dB, by battering ram vibrator system, we can debase the correlation interfere which come from the single shaking frequency of battering ram, this system advanced the signal-to-noise ratio of seismic data, which can give direction of the application of battering ram vibrator in metal mine exploration and high resolving seismic exploration.
2014-01-01
Background The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n3) and of O(n5) order, respectively, and so, the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together to a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing systems MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. Conclusions In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice- junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one. PMID:25077818
D'Angelo, Gianni; Rampone, Salvatore
2014-01-01
The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n(3)) and of O(n(5)) order, respectively, and so, the algorithm is unaffordable for huge data sets. We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of the communications between different memories (RAM, Cache, Mass, Virtual) and to achieve efficient I/O performance, we design a mass storage structure able to access its data with a high degree of temporal and spatial locality. Then we develop a parallel implementation of the algorithm. We model it as a SPMD system together to a Message-Passing Programming Paradigm. Here, we adopt the high-level message-passing systems MPI (Message Passing Interface) in the version for the Java programming language, MPJ. The parallel processing is organized into four stages: partitioning, communication, agglomeration and mapping. The decomposition of the U-BRAIN algorithm determines the necessity of a communication protocol design among the processors involved. Efficient synchronization design is also discussed. In the context of a collaboration between public and private institutions, the parallel model of U-BRAIN has been implemented and tested on the INTEL XEON E7xxx and E5xxx family of the CRESCO structure of Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA), developed in the framework of the European Grid Infrastructure (EGI), a series of efforts to provide access to high-throughput computing resources across Europe using grid computing techniques. The implementation is able to minimize both the memory space and the execution time. The test data used in this study are IPDATA (Irvine Primate splice- junction DATA set), a subset of HS3D (Homo Sapiens Splice Sites Dataset) and a subset of COSMIC (the Catalogue of Somatic Mutations in Cancer). The execution time and the speed-up on IPDATA reach the best values within about 90 processors. Then the parallelization advantage is balanced by the greater cost of non-local communications between the processors. A similar behaviour is evident on HS3D, but at a greater number of processors, so evidencing the direct relationship between data size and parallelization gain. This behaviour is confirmed on COSMIC. Overall, the results obtained show that the parallel version is up to 30 times faster than the serial one.
Low-cost real-time infrared scene generation for image projection and signal injection
NASA Astrophysics Data System (ADS)
Buford, James A., Jr.; King, David E.; Bowden, Mark H.
1998-07-01
As cost becomes an increasingly important factor in the development and testing of Infrared sensors and flight computer/processors, the need for accurate hardware-in-the- loop (HWIL) simulations is critical. In the past, expensive and complex dedicated scene generation hardware was needed to attain the fidelity necessary for accurate testing. Recent technological advances and innovative applications of established technologies are beginning to allow development of cost-effective replacements for dedicated scene generators. These new scene generators are mainly constructed from commercial-off-the-shelf (COTS) hardware and software components. At the U.S. Army Aviation and Missile Command (AMCOM) Missile Research, Development, and Engineering Center (MRDEC), researchers have developed such a dynamic IR scene generator (IRSG) built around COTS hardware and software. The IRSG is used to provide dynamic inputs to an IR scene projector for in-band seeker testing and for direct signal injection into the seeker or processor electronics. AMCOM MRDEC has developed a second generation IRSG, namely IRSG2, using the latest Silicon Graphics Incorporated (SGI) Onyx2 with Infinite Reality graphics. As reported in previous papers, the SGI Onyx Reality Engine 2 is the platform of the original IRSG that is now referred to as IRSG1. IRSG1 has been in operation and used daily for the past three years on several IR projection and signal injection HWIL programs. Using this second generation IRSG, frame rates have increased from 120 Hz to 400 Hz and intensity resolution from 12 bits to 16 bits. The key features of the IRSGs are real time missile frame rates and frame sizes, dynamic missile-to-target(s) viewpoint updated each frame in real-time by a six-degree-of- freedom (6DOF) system under test (SUT) simulation, multiple dynamic objects (e.g. targets, terrain/background, countermeasures, and atmospheric effects), latency compensation, point-to-extended source anti-aliased targets, and sensor modeling effects. This paper provides a comparison between the IRSG1 and IRSG2 systems and focuses on the IRSG software, real time features, and database development tools.
Dickinson, Paul A; Kesisoglou, Filippos; Flanagan, Talia; Martinez, Marilyn N; Mistry, Hitesh B; Crison, John R; Polli, James E; Cruañes, Maria T; Serajuddin, Abu T M; Müllertz, Anette; Cook, Jack A; Selen, Arzu
2016-11-01
The aim of Biopharmaceutics Risk Assessment Roadmap (BioRAM) and the BioRAM Scoring Grid is to facilitate optimization of clinical performance of drug products. BioRAM strategy relies on therapy-driven drug delivery and follows an integrated systems approach for formulating and addressing critical questions and decision-making (J Pharm Sci. 2014,103(11): 3777-97). In BioRAM, risk is defined as not achieving the intended in vivo drug product performance, and success is assessed by time to decision-making and action. Emphasis on time to decision-making and time to action highlights the value of well-formulated critical questions and well-designed and conducted integrated studies. This commentary describes and illustrates application of the BioRAM Scoring Grid, a companion to the BioRAM strategy, which guides implementation of such an integrated strategy encompassing 12 critical areas and 6 assessment stages. Application of the BioRAM Scoring Grid is illustrated using published literature. Organizational considerations for implementing BioRAM strategy, including the interactions, function, and skillsets of the BioRAM group members, are also reviewed. As a creative and innovative systems approach, we believe that BioRAM is going to have a broad-reaching impact, influencing drug development and leading to unique collaborations influencing how we learn, and leverage and share knowledge. Published by Elsevier Inc.
Effect of breed and age on sexual behaviour of rams.
Simitzis, Panagiotis E; Deligeorgis, Stelios G; Bizelis, Joseph A
2006-05-01
The objective of this study was to highlight the problems that arise during the reproduction between thin-tailed rams and fat-tailed ewes. At the same time, particular emphasis laid on the influence of sheep breed, sheep age, time after ram introduction and day of the ewe estrus cycle on ram and ewe sexual behaviour. Rams were subjected to sexual performance tests by being individually exposed to 12 ewes for 3 h daily, 19 consecutive days. The 16 rams of the experiment were separated according to their age (9 and 21 months old) and breed (Chios and Karagouniki), and the 96 ewes of Chios fat-tailed breed, were divided by age (9 and 21 months old). The main characteristics of courtship behaviour, like sniffing, nudging, flehmen response and following were recorded and studied in detail. Mature Chios rams, which were the only one with previous experience of Chios ewes, exhibited higher rates of sexual interest per ewe than the other rams (P < 0.05). On the other hand, rams sniffed and nudged more young than mature ewes (P < 0.05), probably due to the fact that young ewes did not express intense symptoms of estrus. Young rams exhibited substandard sexual interest towards mature ewes, when they first came in contact with them (P < 0.05). In general, Karagouniki thin-tailed rams exhibited reduced rates of mating behaviour when they courted with Chios fat-tailed ewes in comparison with Chios rams (P < 0.05). Moreover, as the time after ram introduction passed, the frequency and duration of sexual behaviour components decreased (P < 0.001). Finally, the effect of the day of the experiment was only significant in the case of sniffing, which increased during the first 2 days and then declined and stabilized (P < 0.01). As it was demonstrated, ram age and ram breed played a fundamental role in the exhibition of sexual interest elements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
C. Cuevas, B. Raydo, H. Dong, A. Gupta, F.J. Barbosa, J. Wilson, W.M. Taylor, E. Jastrzembski, D. Abbott
We will demonstrate a hardware and firmware solution for a complete fully pipelined multi-crate trigger system that takes advantage of the elegant high speed VXS serial extensions for VME. This trigger system includes three sections starting with the front end crate trigger processor (CTP), a global Sub-System Processor (SSP) and a Trigger Supervisor that manages the timing, synchronization and front end event readout. Within a front end crate, trigger information is gathered from each 16 Channel, 12 bit Flash ADC module at 4 nS intervals via the VXS backplane, to a Crate Trigger Processor (CTP). Each Crate Trigger Processor receivesmore » these 500 MB/S VXS links from the 16 FADC-250 modules, aligns skewed data inherent of Aurora protocol, and performs real time crate level trigger algorithms. The algorithm results are encoded using a Reed-Solomon technique and transmission of this Level 1 trigger data is sent to the SSP using a multi-fiber link. The multi-fiber link achieves an aggregate trigger data transfer rate to the global trigger at 8 Gb/s. The SSP receives and decodes Reed-Solomon error correcting transmission from each crate, aligns the data, and performs the global level trigger algorithms. The entire trigger system is synchronous and operates at 250 MHz with the Trigger Supervisor managing not only the front end event readout, but also the distribution of the critical timing clocks, synchronization signals, and the global trigger signals to each front end readout crate. These signals are distributed to the front end crates on a separate fiber link and each crate is synchronized using a unique encoding scheme to guarantee that each front end crate is synchronous with a fixed latency, independent of the distance between each crate. The overall trigger signal latency is <3 uS, and the proposed 12GeV experiments at Jefferson Lab require up to 200KHz Level 1 trigger rate.« less
Hawken, P A R; Evans, A C O; Beard, A P
2008-07-01
The ram effect is widely used in Mediterranean breeds of sheep but its use in temperate genotypes is restricted by breed seasonality. However, ewes from these highly seasonal genotypes are sensitive to stimulation by rams close to the onset of the natural breeding season. In this study we developed a pre-mating protocol of repeated, short-term exposure to rams (fence-line contact or vasectomised rams) beginning during late anoestrus and continuing into the breeding season. We hypothesised that this pre-mating protocol would synchronise the distribution of mating of North of England Mule ewes during the breeding season above that observed in ewes isolated from rams prior to mating. Ram-exposed ewes were given contact with rams (Experiment 1: fence-line; FR, n=94 and Experiment 2: vasectomised rams; VR; n=103) for 24h on Days 0 (10 September), 17 and 34 of the experiment. Control ewes (Experiment 1; FC, n=98 and Experiment 2; VC; n=106) remained isolated from rams prior to mating. In Experiment 2, a subset of VR (n=35) and VC ewes (n=35) were blood sampled twice weekly to monitor their pre-mating progesterone profiles. At mating, harnessed entire rams were introduced, 17 or 16 days after the last ram exposure (Experiments 1 and 2) and raddle marks were recorded daily. The median time from ram introduction to mating was reduced in ewes given both fence-line and vasectomised ram contact (P<0.001), leading to a more compact distribution of mating and lambing (At least P<0.01). In the blood sampled VR ewes, there was a progressive decline in the number of days from ram exposure to the onset of dioestrus (at least P<0.05). This observation indicates that the cycles in VR ewes became increasingly synchronised over the pre-mating period, a pattern not evident in VC ewes. In conclusion, repeated, short-term exposure of ewes to rams during the transition into the breeding season is an effective method of synchronising the distribution of mating during the breeding season.
Selective visual region of interest to enhance medical video conferencing
NASA Astrophysics Data System (ADS)
Bonneau, Walt, Jr.; Read, Christopher J.; Shirali, Girish
1998-06-01
The continued economic pressure that is being placed upon the healthcare industry creates both challenge and opportunity to develop cost effective healthcare tools. Tools that provide improvements in the quality of medical care at the same time improve the distribution of efficient care will create product demand. Video Conferencing systems are one of the latest product technologies that are evolving their way into healthcare applications. The systems that provide quality Bi- directional video and imaging at the lowest system and communication cost are creating many possible options for the healthcare industry. A method to use only 128k bits/sec. of ISDN bandwidth while providing quality video images in selected regions will be applied to echocardiograms using a low cost video conferencing system operating within a basic rate ISDN line bandwidth. Within a given display area (frame) it has been observed that only selected informational areas of the frame of are of value when viewing for detail and precision within an image. Much in the same manner that a photograph is cropped. If a method to accomplish Region Of Interest (ROI) was applied to video conferencing using H.320 with H.263 (compression) and H.281 (camera control) international standards, medical image quality could be achieved in a cost-effective manner. For example, the cardiologist could be provided with a selectable three to eight end-point viewable ROI polygon that defines the ROI in the image. This is achieved by the video system calculating the selected regional end-points and creating an alpha mask to signify the importance of the ROI to the compression processor. This region is then applied to the compression algorithm in a manner that the majority of the video conferencing processor cycles are focused on the ROI of the image. An occasional update of the non-ROI area is processed to maintain total image coherence. The user could control the non-ROI area updates. Providing encoder side ROI specification is of value. However, the power of this capability is improved if remote access and selection of the ROI is also provided. Using the H.281 camera standard and proposing an additional option to the standard to allow for remote ROI selection would make this possible. When ROI is applied the ability to reach the equivalent of 384K bits/sec ISDN rates may be achieved or exceeded depending upon the size of the selected ROI using 128K bits/sec. This opens additional opportunity to establish international calling and reduced call rates by up to sixty- six percent making reoccurring communication costs attractive. Rates of twenty to thirty quality ROI updates could be achieved. It is however important to understand that this technique is still under development.
Pi-Sat: A Low Cost Small Satellite and Distributed Spacecraft Mission System Test Platform
NASA Technical Reports Server (NTRS)
Cudmore, Alan
2015-01-01
Current technology and budget trends indicate a shift in satellite architectures from large, expensive single satellite missions, to small, low cost distributed spacecraft missions. At the center of this shift is the SmallSatCubesat architecture. The primary goal of the Pi-Sat project is to create a low cost, and easy to use Distributed Spacecraft Mission (DSM) test bed to facilitate the research and development of next-generation DSM technologies and concepts. This test bed also serves as a realistic software development platform for Small Satellite and Cubesat architectures. The Pi-Sat is based on the popular $35 Raspberry Pi single board computer featuring a 700Mhz ARM processor, 512MB of RAM, a flash memory card, and a wealth of IO options. The Raspberry Pi runs the Linux operating system and can easily run Code 582s Core Flight System flight software architecture. The low cost and high availability of the Raspberry Pi make it an ideal platform for a Distributed Spacecraft Mission and Cubesat software development. The Pi-Sat models currently include a Pi-Sat 1U Cube, a Pi-Sat Wireless Node, and a Pi-Sat Cubesat processor card.The Pi-Sat project takes advantage of many popular trends in the Maker community including low cost electronics, 3d printing, and rapid prototyping in order to provide a realistic platform for flight software testing, training, and technology development. The Pi-Sat has also provided fantastic hands on training opportunities for NASA summer interns and Pathways students.
Motion and Emotional Behavior Design for Pet Robot Dog
NASA Astrophysics Data System (ADS)
Cheng, Chi-Tai; Yang, Yu-Ting; Miao, Shih-Heng; Wong, Ching-Chang
A pet robot dog with two ears, one mouth, one facial expression plane, and one vision system is designed and implemented so that it can do some emotional behaviors. Three processors (Inter® Pentium® M 1.0 GHz, an 8-bit processer 8051, and embedded soft-core processer NIOS) are used to control the robot. One camera, one power detector, four touch sensors, and one temperature detector are used to obtain the information of the environment. The designed robot with 20 DOF (degrees of freedom) is able to accomplish the walking motion. A behavior system is built on the implemented pet robot so that it is able to choose a suitable behavior for different environmental situation. From the practical test, we can see that the implemented pet robot dog can do some emotional interaction with the human.
Infrared readout electronics; Proceedings of the Meeting, Orlando, FL, Apr. 21, 22, 1992
NASA Technical Reports Server (NTRS)
Fossum, Eric R. (Editor)
1992-01-01
The present volume on IR readout electronics discusses cryogenic readout using silicon devices, cryogenic readout using III-V and LTS devices, multiplexers for higher temperatures, and focal-plane signal processing electronics. Attention is given to the optimization of cryogenic CMOS processes for sub-10-K applications, cryogenic measurements of aerojet GaAs n-JFETs, inP-based heterostructure device technology for ultracold readout applications, and a three-terminal semiconductor-superconductor transimpedance amplifier. Topics addressed include unfulfilled needs in IR astronomy focal-plane readout electronics, IR readout integrated circuit technology for tactical missile systems, and radiation-hardened 10-bit A/D for FPA signal processing. Also discussed are the implementation of a noise reduction circuit for spaceflight IR spectrometers, a real-time processor for staring receivers, and a fiber-optic link design for INMOS transputers.
A system-level view of optimizing high-channel-count wireless biosignal telemetry.
Chandler, Rodney J; Gibson, Sarah; Karkare, Vaibhav; Farshchi, Shahin; Marković, Dejan; Judy, Jack W
2009-01-01
In this paper we perform a system-level analysis of a wireless biosignal telemetry system. We perform an analysis of each major system component (e.g., analog front end, analog-to-digital converter, digital signal processor, and wireless link), in which we consider physical, algorithmic, and design limitations. Since there are a wide range applications for wireless biosignal telemetry systems, each with their own unique set of requirements for key parameters (e.g., channel count, power dissipation, noise level, number of bits, etc.), our analysis is equally broad. The net result is a set of plots, in which the power dissipation for each component and as the system as a whole, are plotted as a function of the number of channels for different architectural strategies. These results are also compared to existing implementations of complete wireless biosignal telemetry systems.
SAMPA Chip: the New 32 Channels ASIC for the ALICE TPC and MCH Upgrades
NASA Astrophysics Data System (ADS)
Adolfsson, J.; Ayala Pabon, A.; Bregant, M.; Britton, C.; Brulin, G.; Carvalho, D.; Chambert, V.; Chinellato, D.; Espagnon, B.; Hernandez Herrera, H. D.; Ljubicic, T.; Mahmood, S. M.; Mjörnmark, U.; Moraes, D.; Munhoz, M. G.; Noël, G.; Oskarsson, A.; Osterman, L.; Pilyar, A.; Read, K.; Ruette, A.; Russo, P.; Sanches, B. C. S.; Severo, L.; Silvermyr, D.; Suire, C.; Tambave, G. J.; Tun-Lanoë, K. M. M.; van Noije, W.; Velure, A.; Vereschagin, S.; Wanlin, E.; Weber, T. O.; Zaporozhets, S.
2017-04-01
This paper presents the test results of the second prototype of SAMPA, the ASIC designed for the upgrade of read-out front end electronics of the ALICE Time Projection Chamber (TPC) and Muon Chamber (MCH). SAMPA is made in a 130 nm CMOS technology with 1.25 V nominal voltage supply and provides 32 channels, with selectable input polarity, and three possible combinations of shaping time and sensitivity. Each channel consists of a Charge Sensitive Amplifier, a semi-Gaussian shaper and a 10-bit ADC; a Digital Signal Processor provides digital filtering and compression capability. In the second prototype run both full chip and single test blocks were fabricated, allowing block characterization and full system behaviour studies. Experimental results are here presented showing agreement with requirements for both the blocks and the full chip.
The interbreeding of sheep and goats.
Kelk, D A; Gartley, C J; Buckrell, B C; King, W A
1997-01-01
To determine the outcome of interbreeding sheep and goats, ewes and does were bred to rams and bucks, and their embryos recovered. Pregnancy was monitored in 2 does bred to a ram. Fertilization rates in ram X does, buck X does, ram X ewes, and buck X ewes were 72%, 96%, 90%, and 0%, respectively. Ram X doe fetuses died at 5 to 10 weeks. PMID:9105723
Epididymitis Caused by Brucella ovis in a Southern Ontario Sheep Flock
Buckrell, Brian C.; McEwen, Scott A.; Johnson, Walter H.; Savage, Neale C.
1985-01-01
Epididymitis was diagnosed in three rams in a commercial sheep flock in southern Ontario. The affected rams had palpably enlarged epididymides and two rams had semen which contained inflammatory cells and was of poor quality. Serum compliment fixation titers for Brucella ovis were 1:20, 1:80 and 1:90. Five other rams in the flock were clinically normal and without titers. Two of the affected rams had lesions similar to those produced by experimental infection with B. ovis. The infection in the rams had no apparent affect on ewe performance. The source of the infection remains unknown, but the rams were purchased from a flock which had imported ewes from the western U.S.A. ImagesFigure 1.Figure 2. PMID:17422577
Sexual behaviour of rams: male orientation and its endocrine correlates.
Resko, J A; Perkins, A; Roselli, C E; Stellflug, J N; Stormshak, F K
1999-01-01
The components of heterosexual behaviour in rams are reviewed as a basis for understanding partner preference behaviour. A small percentage of rams will not mate with oestrous females and if given a choice will display courtship behaviour towards another ram in preference to a female. Some of the endocrine profiles of these male-oriented rams differ from those of heterosexual controls. These differences include reduced serum concentrations of testosterone, oestradiol and oestrone, reduced capacity to produce testosterone in vitro, and reduced capacity to aromatize androgens in the preoptic-anterior hypothalamus of the brain. Our observation that aromatase activity is significantly lower in the preoptic-anterior hypothalamic area of male-oriented rams than in female-oriented rams may indicate an important neurochemical link to sexual behaviour that should be investigated. The defect in steroid hormone production by the adult testes of the male-oriented ram may represent a defect that can be traced to the fetal testes. If this contention is correct, partner preference behaviour of rams may also be traceable to fetal development and represent a phenomenon of sexual differentiation.
Frame Decoder for Consultative Committee for Space Data Systems (CCSDS)
NASA Technical Reports Server (NTRS)
Reyes, Miguel A. De Jesus
2014-01-01
GNU Radio is a free and open source development toolkit that provides signal processing to implement software radios. It can be used with low-cost external RF hardware to create software defined radios, or without hardware in a simulation-like environment. GNU Radio applications are primarily written in Python and C++. The Universal Software Radio Peripheral (USRP) is a computer-hosted software radio designed by Ettus Research. The USRP connects to a host computer via high-speed Gigabit Ethernet. Using the open source Universal Hardware Driver (UHD), we can run GNU Radio applications using the USRP. An SDR is a "radio in which some or all physical layer functions are software defined"(IEEE Definition). A radio is any kind of device that wirelessly transmits or receives radio frequency (RF) signals in the radio frequency. An SDR is a radio communication system where components that have been typically implemented in hardware are implemented in software. GNU Radio has a generic packet decoder block that is not optimized for CCSDS frames. Using this generic packet decoder will add bytes to the CCSDS frames and will not permit for bit error correction using Reed-Solomon. The CCSDS frames consist of 256 bytes, including a 32-bit sync marker (0x1ACFFC1D). This frames are generated by the Space Data Processor and GNU Radio will perform the modulation and framing operations, including frame synchronization.
Bayer image parallel decoding based on GPU
NASA Astrophysics Data System (ADS)
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.
Mongoose: Creation of a Rad-Hard MIPS R3000
NASA Technical Reports Server (NTRS)
Lincoln, Dan; Smith, Brian
1993-01-01
This paper describes the development of a 32 Bit, full MIPS R3000 code-compatible Rad-Hard CPU, code named Mongoose. Mongoose progressed from contract award, through the design cycle, to operational silicon in 12 months to meet a space mission for NASA. The goal was the creation of a fully static device capable of operation to the maximum Mil-883 derated speed, worst-case post-rad exposure with full operational integrity. This included consideration of features for functional enhancements relating to mission compatibility and removal of commercial practices not supported by Rad-Hard technology. 'Mongoose' developed from an evolution of LSI Logic's MIPS-I embedded processor, LR33000, code named Cobra, to its Rad-Hard 'equivalent', Mongoose. The term 'equivalent' is used to infer that the core of the processor is functionally identical, allowing the same use and optimizations of the MIPS-I Instruction Set software tool suite for compilation, software program trace, etc. This activity was started in September of 1991 under a contract from NASA-Goddard Space Flight Center (GSFC)-Flight Data Systems. The approach affected a teaming of NASA-GSFC for program development, LSI Logic for system and ASIC design coupled with the Rad-Hard process technology, and Harris (GASD) for Rad-Hard microprocessor design expertise. The program culminated with the generation of Rad-Hard Mongoose prototypes one year later.
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing
NASA Astrophysics Data System (ADS)
Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen
2018-04-01
The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very unlikely to achieve required computing processing power by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Arrays and Graphic Processor cores constrained by power and performance. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). However, these DNN models are typically trained using GPUs with gigabytes of external memories and massively used 32-bit floating point operations. As a result, DNNs do not run efficiently on hardware appropriate for low power or mobile applications. To address this limitation, we proposed for compressing DNN models for ATR suited to deployment on resource constrained hardware. This proposed compression framework utilizes promising DNN compression techniques including pruning and weight quantization while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groth, Katrina M.; Zumwalt, Hannah Ruth; Clark, Andrew Jordan
2016-03-01
Hydrogen Risk Assessment Models (HyRAM) is a prototype software toolkit that integrates data and methods relevant to assessing the safety of hydrogen fueling and storage infrastructure. The HyRAM toolkit integrates deterministic and probabilistic models for quantifying accident scenarios, predicting physical effects, and characterizing the impact of hydrogen hazards, including thermal effects from jet fires and thermal pressure effects from deflagration. HyRAM version 1.0 incorporates generic probabilities for equipment failures for nine types of components, and probabilistic models for the impact of heat flux on humans and structures, with computationally and experimentally validated models of various aspects of gaseous hydrogen releasemore » and flame physics. This document provides an example of how to use HyRAM to conduct analysis of a fueling facility. This document will guide users through the software and how to enter and edit certain inputs that are specific to the user-defined facility. Description of the methodology and models contained in HyRAM is provided in [1]. This User’s Guide is intended to capture the main features of HyRAM version 1.0 (any HyRAM version numbered as 1.0.X.XXX). This user guide was created with HyRAM 1.0.1.798. Due to ongoing software development activities, newer versions of HyRAM may have differences from this guide.« less
FPS-RAM: Fast Prefix Search RAM-Based Hardware for Forwarding Engine
NASA Astrophysics Data System (ADS)
Zaitsu, Kazuya; Yamamoto, Koji; Kuroda, Yasuto; Inoue, Kazunari; Ata, Shingo; Oka, Ikuo
Ternary content addressable memory (TCAM) is becoming very popular for designing high-throughput forwarding engines on routers. However, TCAM has potential problems in terms of hardware and power costs, which limits its ability to deploy large amounts of capacity in IP routers. In this paper, we propose new hardware architecture for fast forwarding engines, called fast prefix search RAM-based hardware (FPS-RAM). We designed FPS-RAM hardware with the intent of maintaining the same search performance and physical user interface as TCAM because our objective is to replace the TCAM in the market. Our RAM-based hardware architecture is completely different from that of TCAM and has dramatically reduced the costs and power consumption to 62% and 52%, respectively. We implemented FPS-RAM on an FPGA to examine its lookup operation.
Khnissi, S; Lassoued, N; Rekik, M; Ben Salem, H
2016-02-01
This study aimed to investigate the effect of water deprivation (WD) on reproductive traits of rams. Ten mature rams were used and allocated to two groups balanced for body weight. Control (C) rams had free access to drinking water, while water-restricted rams (WD) were deprived from water for 3 consecutive days and early on the morning of day 4, they had ad libitum access to water for 24 h, similar to C animals. The experiment lasted 32 days, that is eight 4-day cycles of water deprivation and subsequent watering. Feed and water intake were significantly affected by water deprivation; in comparison with C rams, WD rams reduced their feed intake by 18%. During the watering day of the deprivation cycle, WD rams consumed more water than C rams on the same day (11.8 (SD = 3.37) and 8.4 (SD = 1.92) l respectively; p < 0.05). Glucose, total protein and creatinine were increased as a result of water deprivation. However, testosterone levels were lowered as a result of water deprivation and average values were 10.9 and 6.2 (SEM 1.23) ng/ml for C and WD rams respectively (p < 0.05). Semen traits were less affected by treatment; WD rams consistently had superior sperm concentrations than C animals; and statistical significances were reached in cycles 5 and 8 of water deprivation. Several mating behaviour traits were modified as a result of water deprivation. When compared to controls, WD rams had a more prolonged time to first mount attempt (p < 0.001), their frequency of mount attempts decreased [6.8 vs. 5.2 (SEM 0.1); p < 0.001] and their flehmen reaction intensity was negatively affected (p < 0.05). Water deprivation may have practical implications reducing the libido and therefore the serving capacity of rams under field conditions. Journal of Animal Physiology and Animal Nutrition © 2015 Blackwell Verlag GmbH.
Abou-Shanab, Reda A I; Eraky, Mohamed; Haddad, Ahmed M; Abdel-Gaffar, Abdel-Rahman B; Salem, Ahmed M
2016-11-01
A total of twenty bacterial cultures were isolated from hydrocarbon contaminated soil. Of the 20 isolates, RAM03, RAM06, RAM13, and RAM17 were specifically chosen based on their relatively higher growth on salt medium amended with 4 % crude oil, emulsion index, surface tension, and degradation percentage. These bacterial cultures had 16S rRNA gene sequences that were most similar to Ochrobactrum cytisi (RAM03), Ochrobactrum anthropi (RAM06 and RAM17), and Sinorhizobium meliloti (RAM13) with 96 %, 100 % and 99 %, and 99 % similarity. The tested strains revealed a promising potential for bioremediation of petroleum oil contamination as they could degrade >93 % and 54 % of total petroleum hydrocarbons (TPHs) in a liquid medium and soil amended with 4 % crude oil, respectively, after 30 day incubation. These bacteria could effectively remove both aliphatic and aromatic petroleum hydrocarbons. In conclusion, these strains could be considered as good prospects for their application in bioremediation of hydrocarbon contaminated environment.
Buntinas, L; Gunter, K K; Sparagna, G C; Gunter, T E
2001-04-02
A mechanism of Ca(2+) uptake, capable of sequestering significant amounts of Ca(2+) from cytosolic Ca(2+) pulses, has previously been identified in liver mitochondria. This mechanism, the Rapid Mode of Ca(2+) uptake (RaM), was shown to sequester Ca(2+) very rapidly at the beginning of each pulse in a sequence [Sparagna et al. (1995) J. Biol. Chem. 270, 27510-27515]. The existence and properties of RaM in heart mitochondria, however, are unknown and are the basis for this study. We show that RaM functions in heart mitochondria with some of the characteristics of RaM in liver, but its activation and inhibition are quite different. It is feasible that these differences represent different physiological adaptations in these two tissues. In both tissues, RaM is highly conductive at the beginning of a Ca(2+) pulse, but is inhibited by the rising [Ca(2+)] of the pulse itself. In heart mitochondria, the time required at low [Ca(2+)] to reestablish high Ca(2+) conductivity via RaM i.e. the 'resetting time' of RaM is much longer than in liver. RaM in liver mitochondria is strongly activated by spermine, activated by ATP or GTP and unaffected by ADP and AMP. In heart, RaM is activated much less strongly by spermine and unaffected by ATP or GTP. RaM in heart is strongly inhibited by AMP and has a biphasic response to ADP; it is activated at low concentrations and inhibited at high concentrations. Finally, an hypothesis consistent with the data and characteristics of liver and heart is presented to explain how RaM may function to control the rate of oxidative phosphorylation in each tissue. Under this hypothesis, RaM functions to create a brief, high free Ca(2+) concentration inside mitochondria which may activate intramitochondrial metabolic reactions with relatively small amounts of Ca(2+) uptake. This hypothesis is consistent with the view that intramitochondrial [Ca(2+)] may be used to control the rate of ADP phosphorylation in such a way as to minimize the probability of activating the Ca(2+)-induced mitochondrial membrane permeability transition (MPT).
mRNA Cap Methyltransferase, RNMT-RAM, Promotes RNA Pol II-Dependent Transcription.
Varshney, Dhaval; Lombardi, Olivia; Schweikert, Gabriele; Dunn, Sianadh; Suska, Olga; Cowling, Victoria H
2018-05-01
mRNA cap addition occurs early during RNA Pol II-dependent transcription, facilitating pre-mRNA processing and translation. We report that the mammalian mRNA cap methyltransferase, RNMT-RAM, promotes RNA Pol II transcription independent of mRNA capping and translation. In cells, sublethal suppression of RNMT-RAM reduces RNA Pol II occupancy, net mRNA synthesis, and pre-mRNA levels. Conversely, expression of RNMT-RAM increases transcription independent of cap methyltransferase activity. In isolated nuclei, recombinant RNMT-RAM stimulates transcriptional output; this requires the RAM RNA binding domain. RNMT-RAM interacts with nascent transcripts along their entire length and with transcription-associated factors including the RNA Pol II subunits SPT4, SPT6, and PAFc. Suppression of RNMT-RAM inhibits transcriptional markers including histone H2BK120 ubiquitination, H3K4 and H3K36 methylation, RNA Pol II CTD S5 and S2 phosphorylation, and PAFc recruitment. These findings suggest that multiple interactions among RNMT-RAM, RNA Pol II factors, and RNA along the transcription unit stimulate transcription. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Strong Motion Seismograph Based On MEMS Accelerometer
NASA Astrophysics Data System (ADS)
Teng, Y.; Hu, X.
2013-12-01
The MEMS strong motion seismograph we developed used the modularization method to design its software and hardware.It can fit various needs in different application situation.The hardware of the instrument is composed of a MEMS accelerometer,a control processor system,a data-storage system,a wired real-time data transmission system by IP network,a wireless data transmission module by 3G broadband,a GPS calibration module and power supply system with a large-volumn lithium battery in it. Among it,the seismograph's sensor adopted a three-axis with 14-bit high resolution and digital output MEMS accelerometer.Its noise level just reach about 99μg/√Hz and ×2g to ×8g dynamically selectable full-scale.Its output data rates from 1.56Hz to 800Hz. Its maximum current consumption is merely 165μA,and the device is so small that it is available in a 3mm×3mm×1mm QFN package. Furthermore,there is access to both low pass filtered data as well as high pass filtered data,which minimizes the data analysis required for earthquake signal detection. So,the data post-processing can be simplified. Controlling process system adopts a 32-bit low power consumption embedded ARM9 processor-S3C2440 and is based on the Linux operation system.The processor's operating clock at 400MHz.The controlling system's main memory is a 64MB SDRAM with a 256MB flash-memory.Besides,an external high-capacity SD card data memory can be easily added.So the system can meet the requirements for data acquisition,data processing,data transmission,data storage,and so on. Both wired and wireless network can satisfy remote real-time monitoring, data transmission,system maintenance,status monitoring or updating software.Linux was embedded and multi-layer designed conception was used.The code, including sensor hardware driver,the data acquisition,earthquake setting out and so on,was written on medium layer.The hardware driver consist of IIC-Bus interface driver, IO driver and asynchronous notification driver. The application program layer mainly concludes: earthquake parameter module, local database managing module, data transmission module, remote monitoring, FTP service and so on. The application layer adopted multi-thread process. The whole strong motion seismograph was encapsulated in a small aluminum box, which size is 80mm×120mm×55mm. The inner battery can work continuesly more than 24 hours. The MEMS accelerograph uses modular design for its software part and hardware part. It has remote software update function and can meet the following needs: a) Auto picking up the earthquake event; saving the data on wave-event files and hours files; It may be used for monitoring strong earthquake, explosion, bridge and house health. b) Auto calculate the earthquake parameters, and transferring those parameters by 3G wireless broadband network. This kind of seismograph has characteristics of low cost, easy installation. They can be concentrated in the urban region or areas need to specially care. We can set up a ground motion parameters quick report sensor network while large earthquake break out. Then high-resolution-fine shake-map can be easily produced for the need of emergency rescue. c) By loading P-wave detection program modules, it can be used for earthquake early warning for large earthquakes; d) Can easily construct a high-density layout seismic monitoring network owning remote control and modern intelligent earthquake sensor.
López-Pérez, A; Pérez-Clariget, R
2012-01-15
In this study, we compared pregnancy rates obtained using ram semen stored at 5 °C for 24 h, with ram or bull seminal plasma (SP) added to TRIS-egg yolk extender. During the breeding period, 670 adult Corriedale ewes were cervically inseminated with semen (2 × 10(8) sperm in a volume of 0.2 mL) from eight adult Corriedale rams. Ejaculates, obtained using an artificial vagina, were split into three aliquots and diluted with the following: TRIS-egg yolk based extender (T), T + 30% ram SP (R), or T + 30% bull SP (B). Samples were refrigerated and stored at 5 °C for 24 h until used for AI. Pregnancy was assessed by ultrasonography 35 to 40 d after AI. Pregnancy rate was not affected by ram (P = 0.77) or breeding period (P = 0.43), and there were no interactions between extender and ram (P = 0.94), or extender and breeding period (P = 0.24). However, there was an effect of extender (P = 0.0009) on pregnancy rates; ram SP, but not bull SP, increased pregnancy rates compared with extender without SP (49.7, 38.1, and 31.1%, for R, B, and T respectively). In conclusion, ram SP added to TRIS-egg yolk extender had a beneficial effect on the pregnancy rate of ram sperm stored at 5 °C for 24 h and used for cervical insemination of ewes. Copyright © 2012 Elsevier Inc. All rights reserved.
Research and Applications Modules (RAM). Phase B study: Executive summary
NASA Technical Reports Server (NTRS)
1972-01-01
The design, development, and characteristics of the Research and Applications Module (RAM) system is discussed. The RAM system is a family of payload carriers that can be delivered to and retrieved from low earth orbit by the space shuttle. The RAM payload carriers are used to support diverse technological and scientific investigations. The NASA study objectives, the relationship of the RAM payload carriers to other systems in the orbital space program, and recommendations for additional effort are presented.
Remote Attitude Measurement Sensor (RAMS)
NASA Technical Reports Server (NTRS)
Davis, H. W.
1989-01-01
Remote attitude measurement sensor (RAMS) offers a low-cost, low-risk, proven design concept that is based on mature, demonstrated space sensor technology. The electronic design concepts and interpolation algorithms were tested and proven in space hardware like th Retroreflector Field Tracker and various star trackers. The RAMS concept is versatile and has broad applicability to both ground testing and spacecraft needs. It is ideal for use as a precision laboratory sensor for structural dynamics testing. It requires very little set-up or preparation time and the output data is immediately usable without integration or extensive analysis efforts. For on-orbit use, RAMS rivals any other type of dynamic structural sensor (accelerometer, lidar, photogrammetric techniques, etc.) for overall performance, reliability, suitability, and cost. Widespread acceptance and extensive usage of RAMS will occur only after some interested agency, such as OAST, adopts the RAMS concept and provides the funding support necessary for further development and implementation of RAMS for a specific program.
Nottle, M B; Kleemann, D O; Grosser, T I; Seamark, R F
1997-07-01
A nutritional strategy for increasing ovulation rate in Merino ewes mated in late spring-early summer was evaluated on two commercial farms. The strategy used the 'ram effect' to induce oestrus in seasonally anoestrus ewes and supplementary feeding of lupin grain six days prior to oestrus to increase ovulation rate. Ewes that had been isolated from rams for 6 weeks were exposed to vasectomised rams for 2 weeks and then mated to fertile rams for 6 weeks. Feeding 500 g lupins/head/day for 14 days commencing 12 days after the introduction of vasectomised rams, increased the number of ovulations from 126 to 146 per 100 ewes exposed to rams (P < 0.05). This increase was reflected in an improvement in fecundity (lambs born per ewe lambing; P < 0.05) but not fertility (ewes lambing per ewe mated to rams). Net reproductive performance (the product of fertility, fecundity and lamb survival) was increased by 11 lambs weaned per 100 ewes exposed to rams due to lupin supplementation at mating.
NASA Technical Reports Server (NTRS)
Case, Jonathan
2001-01-01
The Applied Meteorology Unit (AMU) evaluated the Regional Atmospheric Modeling System (RAMS) contained within the Eastern Range Dispersion Assessment System (ERDAS). ERDAS provides emergency response guidance for Cape Canaveral Air Force Station and Kennedy Space Center operations in the event of an accidental hazardous material release or aborted vehicle launch. The RAMS prognostic data are available to ERDAS for display and are used to initialize the 45th Space Wing/Range Safety dispersion model. Thus, the accuracy of the dispersion predictions is dependent upon the accuracy of RAMS forecasts. The RAMS evaluation consisted of an objective and subjective component for the 1999 and 2000 Florida warm seasons, and the 1999-2000 cool season. In the objective evaluation, the AMU generated model error statistics at surface and upper-level observational sites, compared RAMS errors to a coarser RAMS grid configuration, and benchmarked RAMS against the nationally-used Eta model. In the subjective evaluation, the AMU compared forecast cold fronts, low-level temperature inversions, and precipitation to observations during the 1999-2000 cool season, verified the development of the RAMS forecast east coast sea breeze during both warm seasons, and examined the RAMS daily thunderstorm initiation and precipitation patterns during the 2000 warm season. This report summarizes the objective and subjective verification for all three seasons.
Roselli, Charles E.; Meaker, Mary; Stormshak, Fred; Estill, Charles T.
2016-01-01
Testosterone (T) exposure during midgestation differentiates neural circuits controlling sex-specific behaviors and patterns of gonadotropin secretion in male sheep. T acts through androgen receptors (AR) and/or after aromatization to estradiol and binding to estrogen receptors. The current study assessed the role of AR activation in male sexual differentiation. We compared rams that were exposed to the AR antagonist flutamide (Flu) throughout the critical period (i.e. day 30 – 90 of gestation) to control rams and ewes that received no prenatal treatments. The external genitalia of all Flu rams were phenotypically female. Testes were positioned subcutaneously in the inguinal region of the abdomen, exhibited seasonally impaired androgen secretion and were azospermic. Flu rams displayed male-typical precopulatory and mounting behaviors, but could not intromit or ejaculate because they lacked a penis. Flu rams exhibited greater mounting behavior than control rams, and like controls, showed sexual partner preferences for estrous ewes. Neither control nor Flu rams responded to estradiol treatments with displays of female-typical receptive behavior or LH surge responses; whereas all control ewes responded as expected. The ovine sexually dimorphic nucleus in Flu rams was intermediate in volume between control rams and ewes and significantly different from both. . These results indicate that prenatal antiandrogen exposure is not able to block male sexual differentiation in sheep and suggest that compensatory mechanisms intervene to maintain sufficient androgen stimulation during development. PMID:27005749
Kramer, A C; Mirto, A J; Austin, K J; Roselli, C E; Alexander, B M
2017-12-01
Dopamine synthesis in the ventral tegmental area (VTA) is necessary for the reinforcement of sexual behavior. The objective of this study determined if sexual stimuli initiates reward, and whether reward is attenuated in sexually inactive rams. Sexually active rams were exposed to urine from estrous (n=4) or ovariectomized (n=3) ewes with inactive rams (n=3) exposed to urine from estrous ewes. Following exposure, rams were exsanguinated and brains perfused. Alternating sections of the VTA were stained for Fos related antigens (FRA), tyrosine hydroxylase, and dopamine beta-hydroxylase activity. Forebrain tissue, mid-sagittal ventral to the anterior corpus callosum, was stained for dopamine D 2 receptors. Concentrations of cortisol was determined prior to and following exposure. Exposure to ovariectomized-ewe urine in sexually active rams did not influence (P=0.6) FRA expression, but fewer (P<0.05) neurons were positive for tyrosine hydroxylase in the VTA. Sexually inactive rams had fewer (P<0.05) FRA and tyrosine hydroxylase positive neurons in the VTA than sexually active rams following exposure to estrous ewe urine. VTA neurons staining positive for dopamine beta-hydroxylase did not differ by sexual activity (P=0.44) or urine exposure (P=0.07). Exposure to stimulus did not influence (P=0.46) numbers of forebrain neurons staining positive for dopamine D2 receptors in sexually active rams, but fewer (P=0.04) neurons stain positive in inactive rams. Serum concentrations of cortisol did not differ (P≥0.52) among rams prior to or following stimulus. In conclusion sexual inactivity is unlikely due to stress, but may be partially a result of decreased tyrosine hydroxylase and/or the response to dopamine. Copyright © 2017 Elsevier B.V. All rights reserved.
Rich, Mélanie K; Courty, Pierre-Emmanuel; Roux, Christophe; Reinhardt, Didier
2017-08-08
Development of arbuscular mycorrhiza (AM) requires a fundamental reprogramming of root cells for symbiosis. This involves the induction of hundreds of genes in the host. A recently identified GRAS-type transcription factor in Petunia hybrida, ATA/RAM1, is required for the induction of host genes during AM, and for morphogenesis of the fungal endosymbiont. To better understand the role of RAM1 in symbiosis, we set out to identify all genes that depend on activation by RAM1 in mycorrhizal roots. We have carried out a transcript profiling experiment by RNAseq of mycorrhizal plants vs. non-mycorrhizal controls in wild type and ram1 mutants. The results show that the expression of early genes required for AM, such as the strigolactone biosynthetic genes and the common symbiosis signalling genes, is independent of RAM1. In contrast, genes that are involved at later stages of symbiosis, for example for nutrient exchange in cortex cells, require RAM1 for induction. RAM1 itself is highly induced in mycorrhizal roots together with many other transcription factors, in particular GRAS proteins. Since RAM1 has previously been shown to be directly activated by the common symbiosis signalling pathway through CYCLOPS, we conclude that it acts as an early transcriptional switch that induces many AM-related genes, among them genes that are essential for the development of arbuscules, such as STR, STR2, RAM2, and PT4, besides hundreds of additional RAM1-dependent genes the role of which in symbiosis remains to be explored. Taken together, these results indicate that the defect in the morphogenesis of the fungal arbuscules in ram1 mutants may be an indirect consequence of functional defects in the host, which interfere with nutrient exchange and possibly other functions on which the fungus depends.
The entropic cost of quantum generalized measurements
NASA Astrophysics Data System (ADS)
Mancino, Luca; Sbroscia, Marco; Roccia, Emanuele; Gianani, Ilaria; Somma, Fabrizia; Mataloni, Paolo; Paternostro, Mauro; Barbieri, Marco
2018-03-01
Landauer's principle introduces a symmetry between computational and physical processes: erasure of information, a logically irreversible operation, must be underlain by an irreversible transformation dissipating energy. Monitoring micro- and nano-systems needs to enter into the energetic balance of their control; hence, finding the ultimate limits is instrumental to the development of future thermal machines operating at the quantum level. We report on the experimental investigation of a lower bound to the irreversible entropy associated to generalized quantum measurements on a quantum bit. We adopted a quantum photonics gate to implement a device interpolating from the weakly disturbing to the fully invasive and maximally informative regime. Our experiment prompted us to introduce a bound taking into account both the classical result of the measurement and the outcoming quantum state; unlike previous investigation, our entropic bound is based uniquely on measurable quantities. Our results highlight what insights the information-theoretic approach provides on building blocks of quantum information processors.
Maurand, R.; Jehl, X.; Kotekar-Patil, D.; Corna, A.; Bohuslavskyi, H.; Laviéville, R.; Hutin, L.; Barraud, S.; Vinet, M.; Sanquer, M.; De Franceschi, S.
2016-01-01
Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal–oxide–semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform. PMID:27882926
NASA Technical Reports Server (NTRS)
Batcher, K. E.; Eddey, E. E.; Faiss, R. O.; Gilmore, P. A.
1981-01-01
The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU) which processes arrays of data; an array control unit which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit which controls the flow of data; and a unique staging memory (SM) which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PE). Two-by-four surarrays of PE's are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory which buffers and permutes data flowing with the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real time processing capability can be realized via a multiple ARU, multiple SM configuration.
Investigation of an advanced fault tolerant integrated avionics system
NASA Technical Reports Server (NTRS)
Dunn, W. R.; Cottrell, D.; Flanders, J.; Javornik, A.; Rusovick, M.
1986-01-01
Presented is an advanced, fault-tolerant multiprocessor avionics architecture as could be employed in an advanced rotorcraft such as LHX. The processor structure is designed to interface with existing digital avionics systems and concepts including the Army Digital Avionics System (ADAS) cockpit/display system, navaid and communications suites, integrated sensing suite, and the Advanced Digital Optical Control System (ADOCS). The report defines mission, maintenance and safety-of-flight reliability goals as might be expected for an operational LHX aircraft. Based on use of a modular, compact (16-bit) microprocessor card family, results of a preliminary study examining simplex, dual and standby-sparing architectures is presented. Given the stated constraints, it is shown that the dual architecture is best suited to meet reliability goals with minimum hardware and software overhead. The report presents hardware and software design considerations for realizing the architecture including redundancy management requirements and techniques as well as verification and validation needs and methods.
Fast Whole-Engine Stirling Analysis
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako
2005-01-01
An experimentally validated approach is described for fast axisymmetric Stirling engine simulations. These simulations include the entire displacer interior and demonstrate it is possible to model a complete engine cycle in less than an hour. The focus of this effort was to demonstrate it is possible to produce useful Stirling engine performance results in a time-frame short enough to impact design decisions. The combination of utilizing the latest 64-bit Opteron computer processors, fiber-optical Myrinet communications, dynamic meshing, and across zone partitioning has enabled solution times at least 240 times faster than previous attempts at simulating the axisymmetric Stirling engine. A comparison of the multidimensional results, calibrated one-dimensional results, and known experimental results is shown. This preliminary comparison demonstrates that axisymmetric simulations can be very accurate, but more work remains to improve the simulations through such means as modifying the thermal equilibrium regenerator models, adding fluid-structure interactions, including radiation effects, and incorporating mechanodynamics.
The Level 0 Pixel Trigger system for the ALICE experiment
NASA Astrophysics Data System (ADS)
Aglieri Rinella, G.; Kluge, A.; Krivda, M.; ALICE Silicon Pixel Detector project
2007-01-01
The ALICE Silicon Pixel Detector contains 1200 readout chips. Fast-OR signals indicate the presence of at least one hit in the 8192 pixel matrix of each chip. The 1200 bits are transmitted every 100 ns on 120 data readout optical links using the G-Link protocol. The Pixel Trigger System extracts and processes them to deliver an input signal to the Level 0 trigger processor targeting a latency of 800 ns. The system is compact, modular and based on FPGA devices. The architecture allows the user to define and implement various trigger algorithms. The system uses advanced 12-channel parallel optical fiber modules operating at 1310 nm as optical receivers and 12 deserializer chips closely packed in small area receiver boards. Alternative solutions with multi-channel G-Link deserializers implemented directly in programmable hardware devices were investigated. The design of the system and the progress of the ALICE Pixel Trigger project are described in this paper.
NASA Astrophysics Data System (ADS)
Maurand, R.; Jehl, X.; Kotekar-Patil, D.; Corna, A.; Bohuslavskyi, H.; Laviéville, R.; Hutin, L.; Barraud, S.; Vinet, M.; Sanquer, M.; de Franceschi, S.
2016-11-01
Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal-oxide-semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform.
Direct digital conversion detector technology
NASA Astrophysics Data System (ADS)
Mandl, William J.; Fedors, Richard
1995-06-01
Future imaging sensors for the aerospace and commercial video markets will depend on low cost, high speed analog-to-digital (A/D) conversion to efficiently process optical detector signals. Current A/D methods place a heavy burden on system resources, increase noise, and limit the throughput. This paper describes a unique method for incorporating A/D conversion right on the focal plane array. This concept is based on Sigma-Delta sampling, and makes optimum use of the active detector real estate. Combined with modern digital signal processors, such devices will significantly increase data rates off the focal plane. Early conversion to digital format will also decrease the signal susceptibility to noise, lowering the communications bit error rate. Computer modeling of this concept is described, along with results from several simulation runs. A potential application for direct digital conversion is also reviewed. Future uses for this technology could range from scientific instruments to remote sensors, telecommunications gear, medical diagnostic tools, and consumer products.
A system for routing arbitrary directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1987-01-01
There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal.
NASA Astrophysics Data System (ADS)
Michalak, D. J.; Bruno, A.; Caudillo, R.; Elsherbini, A. A.; Falcon, J. A.; Nam, Y. S.; Poletto, S.; Roberts, J.; Thomas, N. K.; Yoscovits, Z. R.; Dicarlo, L.; Clarke, J. S.
Experimental quantum computing is rapidly approaching the integration of sufficient numbers of quantum bits for interesting applications, but many challenges still remain. These challenges include: realization of an extensible design for large array scale up, sufficient material process control, and discovery of integration schemes compatible with industrial 300 mm fabrication. We present recent developments in extensible circuits with vertical delivery. Toward the goal of developing a high-volume manufacturing process, we will present recent results on a new Josephson junction process that is compatible with current tooling. We will then present the improvements in NbTiN material uniformity that typical 300 mm fabrication tooling can provide. While initial results on few-qubit systems are encouraging, advanced processing control is expected to deliver the improvements in qubit uniformity, coherence time, and control required for larger systems. Research funded by Intel Corporation.
Maurand, R; Jehl, X; Kotekar-Patil, D; Corna, A; Bohuslavskyi, H; Laviéville, R; Hutin, L; Barraud, S; Vinet, M; Sanquer, M; De Franceschi, S
2016-11-24
Silicon, the main constituent of microprocessor chips, is emerging as a promising material for the realization of future quantum processors. Leveraging its well-established complementary metal-oxide-semiconductor (CMOS) technology would be a clear asset to the development of scalable quantum computing architectures and to their co-integration with classical control hardware. Here we report a silicon quantum bit (qubit) device made with an industry-standard fabrication process. The device consists of a two-gate, p-type transistor with an undoped channel. At low temperature, the first gate defines a quantum dot encoding a hole spin qubit, the second one a quantum dot used for the qubit read-out. All electrical, two-axis control of the spin qubit is achieved by applying a phase-tunable microwave modulation to the first gate. The demonstrated qubit functionality in a basic transistor-like device constitutes a promising step towards the elaboration of scalable spin qubit geometries in a readily exploitable CMOS platform.
Fast Whole-Engine Stirling Analysis
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako
2007-01-01
An experimentally validated approach is described for fast axisymmetric Stirling engine simulations. These simulations include the entire displacer interior and demonstrate it is possible to model a complete engine cycle in less than an hour. The focus of this effort was to demonstrate it is possible to produce useful Stirling engine performance results in a time-frame short enough to impact design decisions. The combination of utilizing the latest 64-bit Opteron computer processors, fiber-optical Myrinet communications, dynamic meshing, and across zone partitioning has enabled solution times at least 240 times faster than previous attempts at simulating the axisymmetric Stirling engine. A comparison of the multidimensional results, calibrated one-dimensional results, and known experimental results is shown. This preliminary comparison demonstrates that axisymmetric simulations can be very accurate, but more work remains to improve the simulations through such means as modifying the thermal equilibrium regenerator models, adding fluid-structure interactions, including radiation effects, and incorporating mechanodynamics.
Millisecond timing on PCs and Macs.
MacInnes, W J; Taylor, T L
2001-05-01
A real-time, object-oriented solution for displaying stimuli on Windows 95/98, MacOS and Linux platforms is presented. The program, written in C++, utilizes a special-purpose window class (GLWindow), OpenGL, and 32-bit graphics acceleration; it avoids display timing uncertainty by substituting the new window class for the default window code for each system. We report the outcome of tests for real-time capability across PC and Mac platforms running a variety of operating systems. The test program, which can be used as a shell for programming real-time experiments and testing specific processors, is available at http://www.cs.dal.ca/~macinnwj. We propose to provide researchers with a sense of the usefulness of our program, highlight the ability of many multitasking environments to achieve real time, as well as caution users about systems that may not achieve real time, even under optimal conditions.
Real-time image reconstruction and display system for MRI using a high-speed personal computer.
Haishi, T; Kose, K
1998-09-01
A real-time NMR image reconstruction and display system was developed using a high-speed personal computer and optimized for the 32-bit multitasking Microsoft Windows 95 operating system. The system was operated at various CPU clock frequencies by changing the motherboard clock frequency and the processor/bus frequency ratio. When the Pentium CPU was used at the 200 MHz clock frequency, the reconstruction time for one 128 x 128 pixel image was 48 ms and that for the image display on the enlarged 256 x 256 pixel window was about 8 ms. NMR imaging experiments were performed with three fast imaging sequences (FLASH, multishot EPI, and one-shot EPI) to demonstrate the ability of the real-time system. It was concluded that in most cases, high-speed PC would be the best choice for the image reconstruction and display system for real-time MRI. Copyright 1998 Academic Press.
ERIC Educational Resources Information Center
Peace Corps, Washington, DC. Office of Programming and Training Coordination.
This manual presents a comprehensive training design, suggested procedures, and materials for conducting a workshop in the design, construction, operation, maintenance, and repair of hydrams, and in the planning and implementation of hydram projects. Hydrams (hydraulic rams, hydraulic ram pumps, automatic hydraulic ram pumps, rams) are devices…
ALMA Correlator Real-Time Data Processor
NASA Astrophysics Data System (ADS)
Pisano, J.; Amestica, R.; Perez, J.
2005-10-01
The design of a real-time Linux application utilizing Real-Time Application Interface (RTAI) to process real-time data from the radio astronomy correlator for the Atacama Large Millimeter Array (ALMA) is described. The correlator is a custom-built digital signal processor which computes the cross-correlation function of two digitized signal streams. ALMA will have 64 antennas with 2080 signal streams each with a sample rate of 4 giga-samples per second. The correlator's aggregate data output will be 1 gigabyte per second. The software is defined by hard deadlines with high input and processing data rates, while requiring interfaces to non real-time external computers. The designed computer system - the Correlator Data Processor or CDP, consists of a cluster of 17 SMP computers, 16 of which are compute nodes plus a master controller node all running real-time Linux kernels. Each compute node uses an RTAI kernel module to interface to a 32-bit parallel interface which accepts raw data at 64 megabytes per second in 1 megabyte chunks every 16 milliseconds. These data are transferred to tasks running on multiple CPUs in hard real-time using RTAI's LXRT facility to perform quantization corrections, data windowing, FFTs, and phase corrections for a processing rate of approximately 1 GFLOPS. Highly accurate timing signals are distributed to all seventeen computer nodes in order to synchronize them to other time-dependent devices in the observatory array. RTAI kernel tasks interface to the timing signals providing sub-millisecond timing resolution. The CDP interfaces, via the master node, to other computer systems on an external intra-net for command and control, data storage, and further data (image) processing. The master node accesses these external systems utilizing ALMA Common Software (ACS), a CORBA-based client-server software infrastructure providing logging, monitoring, data delivery, and intra-computer function invocation. The software is being developed in tandem with the correlator hardware which presents software engineering challenges as the hardware evolves. The current status of this project and future goals are also presented.
NASA Technical Reports Server (NTRS)
Simon, C. G.; Buonaquisti, A. J.; Batchelor, D. A.; Hunter, J. L.; Griffis, D. P.; Misra, V.; Ricks, D. R.; Wortman, J. J.; Brownlee, D. E.; Best, S. R.
1995-01-01
Two dimensional elemental ion maps have been recorded for hundreds of microparticle impact sites and contamination features on LDEF surfaces. Since the majority of the analyzed surfaces were metal-oxide-silicon (MOS) impact detectors from the Interplanetary Dust Experiment, a series of 'standard' and 'blank' analyses of these surfaces are included. Hypervelocity impacts of forsterite olivine microparticles on activated flight sensors served as standards while stylus and pulsed laser simulated 'impacts' served as analytical blanks. Results showed that despite serious contamination issues, impactor residues can be identified in greater than 1/3 of the impact sites. While aluminum oxide particles could not be detected on aluminum surfaces, they were detected on germanium surfaces from row 12. Remnants of manmade debris impactors consisting of paint chips and bits of metal were identified on surfaces from LDEF Rows 3 (west or trailing side), 6 (south), 9 (ram or leading side), 12 (north) and the space end. Higher than expected ratios of manmade microparticle impacts to total microparticle impacts were found on the space end and the trailing side. These results were consistent with time-tagged and time-segregated microparticle impact data from the IDE and other LDEF experiments. A myriad of contamination interferences were identified and their effects on impactor debris identification mitigated during the course of this study. These interferences include pre-, post and inflight deposited surface contaminants as well as indigenous heterogeneous material contaminants. Non-flight contaminations traced to human origins, including spittle and skin oils, contributed significant levels of alkali-rich carbonaceous interferences. A ubiquitous layer of in-flight deposited silicaceous contamination varied in thickness with location on LDEF, even on a micro scale. In-flight deposited (low velocity) contaminants include urine droplets and bits of metal film from eroded thermal blankets.
Tethered Forth system for FPGA applications
NASA Astrophysics Data System (ADS)
Goździkowski, Paweł; Zabołotny, Wojciech M.
2013-10-01
This paper presents the tethered Forth system dedicated for testing and debugging of FPGA based electronic systems. Use of the Forth language allows to interactively develop and run complex testing or debugging routines. The solution is based on a small, 16-bit soft core CPU, used to implement the Forth Virtual Machine. Thanks to the use of the tethered Forth model it is possible to minimize usage of the internal RAM memory in the FPGA. The function of the intelligent terminal, which is an essential part of the tethered Forth system, may be fulfilled by the standard PC computer or by the smartphone. System is implemented in Python (the software for intelligent terminal), and in VHDL (the IP core for FPGA), so it can be easily ported to different hardware platforms. The connection between the terminal and FPGA may be established and disconnected many times without disturbing the state of the FPGA based system. The presented system has been verified in the hardware, and may be used as a tool for debugging, testing and even implementing of control algorithms for FPGA based systems.
The Solid State Image Sensor's Contribution To The Development Of Silicon Technology
NASA Astrophysics Data System (ADS)
Weckler, Gene P.
1985-12-01
Until recently, a solid-state image sensor with full television resolution was a dream. However, the dream of a solid state image sensor has been a driving force in the development of silicon technology for more than twenty-five years. There are probably many in the main stream of semiconductor technology who would argue with this; however, the solid state image sensor was conceived years before the invention of the semi conductor RAM or the microprocessor (i.e., even before the invention of the integrated circuit). No other potential application envisioned at that time required such complexity. How could anyone have ever hoped in 1960 to make a semi conductor chip containing half-a-million picture elements, capable of resolving eight to twelve bits of infornation, and each capable of readout rates in the tens of mega-pixels per second? As early as 1960 arrays of p-n junctions were being investigated as the optical targets in vidicon tubes, replacing the photoconductive targets. It took silicon technology several years to catch up with these dreamers.
iHand: an interactive bare-hand-based augmented reality interface on commercial mobile phones
NASA Astrophysics Data System (ADS)
Choi, Junyeong; Park, Jungsik; Park, Hanhoon; Park, Jong-Il
2013-02-01
The performance of mobile phones has rapidly improved, and they are emerging as a powerful platform. In many vision-based applications, human hands play a key role in natural interaction. However, relatively little attention has been paid to the interaction between human hands and the mobile phone. Thus, we propose a vision- and hand gesture-based interface in which the user holds a mobile phone in one hand but sees the other hand's palm through a built-in camera. The virtual contents are faithfully rendered on the user's palm through palm pose estimation, and reaction with hand and finger movements is achieved that is recognized by hand shape recognition. Since the proposed interface is based on hand gestures familiar to humans and does not require any additional sensors or markers, the user can freely interact with virtual contents anytime and anywhere without any training. We demonstrate that the proposed interface works at over 15 fps on a commercial mobile phone with a 1.2-GHz dual core processor and 1 GB RAM.
The Spherical Tokamak MEDUSA for Mexico
NASA Astrophysics Data System (ADS)
Ribeiro, C.; Salvador, M.; Gonzalez, J.; Munoz, O.; Tapia, A.; Arredondo, V.; Chavez, R.; Nieto, A.; Gonzalez, J.; Garza, A.; Estrada, I.; Jasso, E.; Acosta, C.; Briones, C.; Cavazos, G.; Martinez, J.; Morones, J.; Almaguer, J.; Fonck, R.
2011-10-01
The former spherical tokamak MEDUSA (Madison EDUcation Small Aspect.ratio tokamak, R < 0.14m, a < 0.10m, BT < 0.5T, Ip < 40kA, 3ms pulse) is currently being recomissioned at the Universidad Autónoma de Nuevo León, Mexico, as part of an agreement between the Faculties of Mech.-Elect. Eng. and Phy. Sci.-Maths. The main objective for having MEDUSA is to train students in plasma physics & technical related issues, aiming a full design of a medium size device (e.g. Tokamak-T). Details of technical modifications and a preliminary scientific programme will be presented. MEDUSA-MX will also benefit any developments in the existing Mexican Fusion Network. Strong liaison within national and international plasma physics communities is expected. New activities on plasma & engineering modeling are expected to be developed in parallel by using the existing facilities such as a multi-platform computer (Silicon Graphics Altix XE250, 128G RAM, 3.7TB HD, 2.7GHz, quad-core processor), ancillary graph system (NVIDIA Quadro FE 2000/1GB GDDR-5 PCI X16 128, 3.2GHz), and COMSOL Multiphysics-Solid Works programs.
NASA Astrophysics Data System (ADS)
Caballero Bendixsen, Luis; Bott-Suzuki, Simon; Cordaro, Samuel; Krishnan, Mahadevan; Chapman, Stephen; Coleman, Phil; Chittenden, Jeremy
2015-11-01
Results will be shown on coordinated experiments and MHD simulations on magnetically driven implosions, with an emphasis on current diffusion and heat transport. Experiments are run at a Mather-type dense plasma focus (DPF-3, Vc: 20 kV, Ip: 480 kA, E: 5.8 kJ). Typical experiments are run at 300 kA and 0.33 Hz repetition rate with different gas loads (Ar, Ne, and He) at pressures of ~ 1-3 Torr, usually gathering 1000 shots per day. Simulations are run at a 96-core HP blade server cluster using 3GHz processors with 4GB RAM per node.Preliminary results show axial and radial phase plasma sheath velocity of ~ 1x105 m/s. These are in agreement with the snow-plough model of DPFs. Peak magnetic field of ~ 1 Tesla in the radial compression phase are measured. Electron densities on the order of 1018 cm-3 anticipated. Comparison between 2D and 3D models with empirical results show a good agreement in the axial and radial phase.
Evaluation of candidate working fluid formulations for the electrothermal-chemical wind tunnel
NASA Technical Reports Server (NTRS)
Akyurtlu, Jale F.; Akyurtlu, Ates
1993-01-01
A new hypersonic test facility which can simulate conditions typical of atmospheric flight at Mach numbers up to 20 is currently under study at the NASA/LaRC Hypersonic Propulsion Branch. In the proposed research, it was suggested that a combustion augmented electrothermal wind tunnel concept may be applied to the planned hypersonic testing facility. The purpose of the current investigation is to evaluate some candidate working fluid formulations which may be used in the chemical-electrothermal wind. The efforts in the initial phase of this research were concentrated on acquiring the code used by GASL to model the electrothermal wind tunnel and testing it using the conditions of GASL simulation. The early version of the general chemical kinetics code (GCKP84) was obtained from NASA and the latest updated version of the code (LSENS) was obtained from the author Dr. Bittker. Both codes are installed on a personal computer with a 486 25 MHz processor and 16 Mbyte RAM. Since the available memory was not sufficient to debug LSENS, for the current work GCKP84 was used.
The ovine sexually dimorphic nucleus, aromatase, and sexual partner preferences in sheep.
Roselli, C E; Stormshak, F
2010-02-28
We are using the domestic ram as an experimental model to examine the role of aromatase in the development of sexual partner preferences. This interest has arisen because of the observation that as many as 8% of domestic rams are sexually attracted to other rams (male-oriented) in contrast to the majority of rams that are attracted to estrous ewes (female-oriented). Our findings demonstrate that aromatase expression is enriched in a cluster of neurons in the medial preoptic nucleus called the ovine sexually dimorphic nucleus (oSDN). The size of the oSDN is associated with a ram's sexual partner preference, such that the nucleus is 2-3 times larger in rams that are attracted to females (female-oriented) than in rams that are attracted to other rams (male-oriented). Moreover, the volume of the oSDN in male-oriented rams is similar to the volume in ewes. These volume differences are not influenced by adult concentrations of serum testosterone. Instead, we found that the oSDN is already present in late gestation lamb fetuses (approximately day 135 of gestation) when it is approximately 2-fold greater in males than in females. Exposure of genetic female fetuses to exogenous testosterone during the critical period for sexual differentiation masculinizes oSDN volume and aromatase expression when examined subsequently on day 135. The demonstration that the oSDN is organized prenatally by testosterone exposure suggests that the brain of the male-oriented ram may be under-androgenized during development. Copyright 2009 Elsevier Ltd. All rights reserved.
Bialek-Davenet, Suzanne; Marcon, Estelle; Leflon-Guibout, Véronique; Lavigne, Jean-Philippe; Bert, Frédéric; Moreau, Richard; Nicolas-Chanoine, Marie-Hélène
2011-01-01
The relationship between efflux system overexpression and cross-resistance to cefoxitin, quinolones, and chloramphenicol has recently been reported in Klebsiella pneumoniae. In 3 previously published clinical isolates and 17 in vitro mutants selected with cefoxitin or fluoroquinolones, mutations in the potential regulator genes of the AcrAB efflux pump (acrR, ramR, ramA, marR, marA, soxR, soxS, and rob) were searched, and their impacts on efflux-related antibiotic cross-resistance were assessed. All mutants but 1, and 2 clinical isolates, overexpressed acrB. No mutation was detected in the regulator genes studied among the clinical isolates and 8 of the mutants. For the 9 remaining mutants, a mutation was found in the ramR gene in 8 of them and in the soxR gene in the last one, resulting in overexpression of ramA and soxS, respectively. Transformation of the ramR mutants and the soxR mutant with the wild-type ramR and soxR genes, respectively, abolished overexpression of acrB and ramA in the ramR mutants and of soxS in the soxR mutant, as well as antibiotic cross-resistance. Resistance due to efflux system overexpression was demonstrated for 4 new antibiotics: cefuroxime, cefotaxime, ceftazidime, and ertapenem. This study shows that the ramR and soxR genes control the expression of efflux systems in K. pneumoniae and suggests the existence of efflux pumps other than AcrAB and of other loci involved in the regulation of AcrAB expression. PMID:21464248
Kumar, Davendra; Joshi, Anil; Naqvi, S M K
2010-04-01
A study was conducted to evaluate the semen production and sperm motion characteristics of ram lambs by computer-aided semen analysis technique. Eight Malpura rams were raised under intensive management system and were trained for semen collection at a weekly interval from the age of 6 months. Rams were scheduled for semen collection at a weekly interval up to 1 year of age to assess their potential for semen production and objective evaluation of semen quality. The average age of ram lambs at the time of first ejaculation was 219 days ranging from 186 to 245 days. The age of ram lambs significantly (p < 0.05) influenced sperm concentration, sperm velocities, and beat frequency of spermatozoa, which were higher in 9-12-month-old compared to 6-9-month-old ram lambs. However, the effect of age was not significant on semen volume, percent motility, percent rapid, medium or slow motile spermatozoa, percent linearity, percent straightness, amplitude of lateral head displacement, percent elongation, and area of sperm head. The body weight of ram lambs was significantly (p < 0.01) and positively correlated (r = 0.46) with age. The results indicate that Malpura ram lambs of 9-12 months of age raised under the intensive management system in a semiarid tropical environment can produce good quality of semen.
Intrinsic Hydrophobicity of Rammed Earth
NASA Astrophysics Data System (ADS)
Holub, M.; Stone, C.; Balintova, M.; Grul, R.
2015-11-01
Rammed earth is well known for its vapour diffusion properties, its ability to regulate humidity within the built environment. Rammed earth is also an aesthetically iconic material such as marble or granite and therefore is preferably left exposed. However exposed rammed earth is often coated with silane/siloxane water repellents or the structure is modified architecturally (large roof overhangs) to accommodate for the hydrophilic nature of the material. This paper sets out to find out optimal hydrophobicity for rammed earth based on natural composite fibres and surface coating without adversely affecting the vapour diffusivity of the material. The material is not required to be waterproof, but should resist at least driving rain. In order to evaluate different approaches to increase hydrophobicity of rammed earth surface, peat fibres and four types of repellents were used.