SPROC: A multiple-processor DSP IC
NASA Technical Reports Server (NTRS)
Davis, R.
1991-01-01
A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.
Special-purpose computing for dense stellar systems
NASA Astrophysics Data System (ADS)
Makino, Junichiro
2007-08-01
I'll describe the current status of the GRAPE-DR project. The GRAPE-DR is the next-generation hardware for N-body simulation. Unlike the previous GRAPE hardwares, it is programmable SIMD machine with a large number of simple processors integrated into a single chip. The GRAPE-DR chip consists of 512 simple processors and operates at the clock speed of 500 MHz, delivering the theoretical peak speed of 512/226 Gflops (single/double precision). As of August 2006, the first prototype board with the sample chip successfully passed the test we prepared. The full GRAPE-DR system will consist of 4096 chips, reaching the theoretical peak speed of 2 Pflops.
Architectures for single-chip image computing
NASA Astrophysics Data System (ADS)
Gove, Robert J.
1992-04-01
This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang Meizhen; Shi Longzhao; Wang Yuxing
2006-08-15
An inherently nonlinear relation between the output current of the tetralateral position sensitive detector (PSD) and the position of the incident light spot has been found theoretically. Based on single-chip microcomputer and the theoretical relation between output current and position, a new signal processor capable of correcting nonlinearity and reducing position measurement deviation of tetralateral PSD was developed. A tetralateral PSD (S1200, 13x13 mm{sup 2}, Hamamatsu Photonics K.K.) was measured with the new signal processor, a linear relation between the output position of the PSD, and the incident position of the light spot was obtained. In the 60% range ofmore » a 13x13 mm{sup 2} active area, the position nonlinearity (rms) was 0.15% and the position measurement deviation (rms) was {+-}20 {mu}m. Compared with traditional analog signal processor, the new signal processor is of better compatibility, lower cost, higher precision, and easier to be interfaced.« less
NASA Astrophysics Data System (ADS)
Huang, Mei-Zhen; Shi, Long-Zhao; Wang, Yu-Xing; Ni, Yi; Li, Zhen-Qing; Ding, Hai-Feng
2006-08-01
An inherently nonlinear relation between the output current of the tetralateral position sensitive detector (PSD) and the position of the incident light spot has been found theoretically. Based on single-chip microcomputer and the theoretical relation between output current and position, a new signal processor capable of correcting nonlinearity and reducing position measurement deviation of tetralateral PSD was developed. A tetralateral PSD (S1200, 13×13mm2, Hamamatsu Photonics K.K.) was measured with the new signal processor, a linear relation between the output position of the PSD, and the incident position of the light spot was obtained. In the 60% range of a 13×13mm2 active area, the position nonlinearity (rms) was 0.15% and the position measurement deviation (rms) was ±20μm. Compared with traditional analog signal processor, the new signal processor is of better compatibility, lower cost, higher precision, and easier to be interfaced.
Application of a VLSI vector quantization processor to real-time speech coding
NASA Technical Reports Server (NTRS)
Davidson, G.; Gersho, A.
1986-01-01
Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
The design of an adaptive predictive coder using a single-chip digital signal processor
NASA Astrophysics Data System (ADS)
Randolph, M. A.
1985-01-01
A speech coding processor architecture design study has been performed in which Texas Instruments TMS32010 has been selected from among three commercially available digital signal processing integrated circuits and evaluated in an implementation study of real-time Adaptive Predictive Coding (APC). The TMS32010 has been compared with AR&T Bell Laboratories DSP I and Nippon Electric Co. PD7720 and was found to be most suitable for a single chip implementation of APC. A preliminary design system based on TMS32010 has been performed, and several of the hardware and software design issues are discussed. Particular attention was paid to the design of an external memory controller which permits rapid sequential access of external RAM. As a result, it has been determined that a compact hardware implementation of the APC algorithm is feasible based of the TSM32010. Originator-supplied keywords include: vocoders, speech compression, adaptive predictive coding, digital signal processing microcomputers, speech processor architectures, and special purpose processor.
Rectangular Array Of Digital Processors For Planning Paths
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.; Fossum, Eric R.; Nixon, Robert H.
1993-01-01
Prototype 24 x 25 rectangular array of asynchronous parallel digital processors rapidly finds best path across two-dimensional field, which could be patch of terrain traversed by robotic or military vehicle. Implemented as single-chip very-large-scale integrated circuit. Excepting processors on edges, each processor communicates with four nearest neighbors along paths representing travel to north, south, east, and west. Each processor contains delay generator in form of 8-bit ripple counter, preset to 1 of 256 possible values. Operation begins with choice of processor representing starting point. Transmits signals to nearest neighbor processors, which retransmits to other neighboring processors, and process repeats until signals propagated across entire field.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Patrick
2014-01-31
The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
Color sensor and neural processor on one chip
NASA Astrophysics Data System (ADS)
Fiesler, Emile; Campbell, Shannon R.; Kempem, Lother; Duong, Tuan A.
1998-10-01
Low-cost, compact, and robust color sensor that can operate in real-time under various environmental conditions can benefit many applications, including quality control, chemical sensing, food production, medical diagnostics, energy conservation, monitoring of hazardous waste, and recycling. Unfortunately, existing color sensor are either bulky and expensive or do not provide the required speed and accuracy. In this publication we describe the design of an accurate real-time color classification sensor, together with preprocessing and a subsequent neural network processor integrated on a single complementary metal oxide semiconductor (CMOS) integrated circuit. This one-chip sensor and information processor will be low in cost, robust, and mass-producible using standard commercial CMOS processes. The performance of the chip and the feasibility of its manufacturing is proven through computer simulations based on CMOS hardware parameters. Comparisons with competing methodologies show a significantly higher performance for our device.
NASA Astrophysics Data System (ADS)
Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi
2005-04-01
A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm2, showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm× 5 mm.
Jiang, Chao; Zhang, Hongyan; Wang, Jia; Wang, Yaru; He, Heng; Liu, Rui; Zhou, Fangyuan; Deng, Jialiang; Li, Pengcheng; Luo, Qingming
2011-11-01
Laser speckle imaging (LSI) is a noninvasive and full-field optical imaging technique which produces two-dimensional blood flow maps of tissues from the raw laser speckle images captured by a CCD camera without scanning. We present a hardware-friendly algorithm for the real-time processing of laser speckle imaging. The algorithm is developed and optimized specifically for LSI processing in the field programmable gate array (FPGA). Based on this algorithm, we designed a dedicated hardware processor for real-time LSI in FPGA. The pipeline processing scheme and parallel computing architecture are introduced into the design of this LSI hardware processor. When the LSI hardware processor is implemented in the FPGA running at the maximum frequency of 130 MHz, up to 85 raw images with the resolution of 640×480 pixels can be processed per second. Meanwhile, we also present a system on chip (SOC) solution for LSI processing by integrating the CCD controller, memory controller, LSI hardware processor, and LCD display controller into a single FPGA chip. This SOC solution also can be used to produce an application specific integrated circuit for LSI processing.
Towards a Generic and Adaptive System-On-Chip Controller for Space Exploration Instrumentation
NASA Technical Reports Server (NTRS)
Iturbe, Xabier; Keymeulen, Didier; Yiu, Patrick; Berisford, Dan; Hand, Kevin; Carlson, Robert; Ozer, Emre
2015-01-01
This paper introduces one of the first efforts conducted at NASA’s Jet Propulsion Laboratory (JPL) to develop a generic System-on-Chip (SoC) platform to control science instruments that are proposed for future NASA missions. The SoC platform is named APEX-SoC, where APEX stands for Advanced Processor for space Exploration, and is based on a hybrid Xilinx Zynq that combines an FPGA and an ARM Cortex-A9 dual-core processor on a single chip. The Zynq implements a generic and customizable on-chip infrastructure that can be reused with a variety of instruments, and it has been coupled with a set of off-chip components that are necessary to deal with the different instruments. We have taken JPL’s Compositional InfraRed Imaging Spectrometer (CIRIS), which is proposed for NASA icy moons missions, as a use-case scenario to demonstrate that the entire data processing, control and interface of an instrument can be implemented on a single device using the on-chip infrastructure described in this paper. We show that the performance results achieved in this preliminary version of the instrumentation controller are sufficient to fulfill the science requirements demanded to the CIRIS instrument in future NASA missions, such as Europa.
Testing and operating a multiprocessor chip with processor redundancy
Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J
2014-10-21
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
Baseband processor development for the Advanced Communications Satellite Program
NASA Technical Reports Server (NTRS)
Moat, D.; Sabourin, D.; Stilwell, J.; Mccallister, R.; Borota, M.
1982-01-01
An onboard-baseband-processor concept for a satellite-switched time-division-multiple-access (SS-TDMA) communication system was developed for NASA Lewis Research Center. The baseband processor routes and controls traffic on an individual message basis while providing significant advantages in improved link margins and system flexibility. Key technology developments required to prove the flight readiness of the baseband-processor design are being verified in a baseband-processor proof-of-concept model. These technology developments include serial MSK modems, Clos-type baseband routing switch, a single-chip CMOS maximum-likelihood convolutional decoder, and custom LSL implementation of high-speed, low-power ECL building blocks.
Implementation of kernels on the Maestro processor
NASA Astrophysics Data System (ADS)
Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.
Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
A Single-Chip CMOS Pulse Oximeter with On-Chip Lock-In Detection.
He, Diwei; Morgan, Stephen P; Trachanis, Dimitrios; van Hese, Jan; Drogoudis, Dimitris; Fummi, Franco; Stefanni, Francesco; Guarnieri, Valerio; Hayes-Gill, Barrie R
2015-07-14
Pulse oximetry is a noninvasive and continuous method for monitoring the blood oxygen saturation level. This paper presents the design and testing of a single-chip pulse oximeter fabricated in a 0.35 µm CMOS process. The chip includes photodiode, transimpedance amplifier, analogue band-pass filters, analogue-to-digital converters, digital signal processor and LED timing control. The experimentally measured AC and DC characteristics of individual circuits including the DC output voltage of the transimpedance amplifier, transimpedance gain of the transimpedance amplifier, and the central frequency and bandwidth of the analogue band-pass filters, show a good match (within 1%) with the circuit simulations. With modulated light source and integrated lock-in detection the sensor effectively suppresses the interference from ambient light and 1/f noise. In a breath hold and release experiment the single chip sensor demonstrates consistent and comparable performance to commercial pulse oximetry devices with a mean of 1.2% difference. The single-chip sensor enables a compact and robust design solution that offers a route towards wearable devices for health monitoring.
A Single-Chip CMOS Pulse Oximeter with On-Chip Lock-In Detection
He, Diwei; Morgan, Stephen P.; Trachanis, Dimitrios; van Hese, Jan; Drogoudis, Dimitris; Fummi, Franco; Stefanni, Francesco; Guarnieri, Valerio; Hayes-Gill, Barrie R.
2015-01-01
Pulse oximetry is a noninvasive and continuous method for monitoring the blood oxygen saturation level. This paper presents the design and testing of a single-chip pulse oximeter fabricated in a 0.35 µm CMOS process. The chip includes photodiode, transimpedance amplifier, analogue band-pass filters, analogue-to-digital converters, digital signal processor and LED timing control. The experimentally measured AC and DC characteristics of individual circuits including the DC output voltage of the transimpedance amplifier, transimpedance gain of the transimpedance amplifier, and the central frequency and bandwidth of the analogue band-pass filters, show a good match (within 1%) with the circuit simulations. With modulated light source and integrated lock-in detection the sensor effectively suppresses the interference from ambient light and 1/f noise. In a breath hold and release experiment the single chip sensor demonstrates consistent and comparable performance to commercial pulse oximetry devices with a mean of 1.2% difference. The single-chip sensor enables a compact and robust design solution that offers a route towards wearable devices for health monitoring. PMID:26184225
Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP
NASA Astrophysics Data System (ADS)
Brooks, Geoffrey W.
1996-03-01
Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
Picoradio: Communication/Computation Piconodes for Sensor Networks
2003-01-02
diagram of PicoNode III, or Quark node. It is made from two custom chips, Strange RF and Charm digital processor , and is complemented by a set of...the chipset comprising of Strange (analog OOK transceiver) and Charm (digital processor ) chips. 44 Figure 33: System block diagram of the Quark node...19 2.B PICONODE II - TWO-CHIP PICONODE IMPLEMENTATION ......................................... 21 2.B.1 Baseband processor (BBP
Formal design specification of a Processor Interface Unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1992-01-01
This report describes work to formally specify the requirements and design of a processor interface unit (PIU), a single-chip subsystem providing memory-interface bus-interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. The need for high-quality design assurance in such applications is an undisputed fact, given the disastrous consequences that even a single design flaw can produce. Thus, the further development and application of formal methods to fault-tolerant systems is of critical importance as these systems see increasing use in modern society.
Quantum interference in heterogeneous superconducting-photonic circuits on a silicon chip.
Schuck, C; Guo, X; Fan, L; Ma, X; Poot, M; Tang, H X
2016-01-21
Quantum information processing holds great promise for communicating and computing data efficiently. However, scaling current photonic implementation approaches to larger system size remains an outstanding challenge for realizing disruptive quantum technology. Two main ingredients of quantum information processors are quantum interference and single-photon detectors. Here we develop a hybrid superconducting-photonic circuit system to show how these elements can be combined in a scalable fashion on a silicon chip. We demonstrate the suitability of this approach for integrated quantum optics by interfering and detecting photon pairs directly on the chip with waveguide-coupled single-photon detectors. Using a directional coupler implemented with silicon nitride nanophotonic waveguides, we observe 97% interference visibility when measuring photon statistics with two monolithically integrated superconducting single-photon detectors. The photonic circuit and detector fabrication processes are compatible with standard semiconductor thin-film technology, making it possible to implement more complex and larger scale quantum photonic circuits on silicon chips.
Design and qualification of the SEU/TD Radiation Monitor chip
NASA Technical Reports Server (NTRS)
Buehler, Martin G.; Blaes, Brent R.; Soli, George A.; Zamani, Nasser; Hicks, Kenneth A.
1992-01-01
This report describes the design, fabrication, and testing of the Single-Event Upset/Total Dose (SEU/TD) Radiation Monitor chip. The Radiation Monitor is scheduled to fly on the Mid-Course Space Experiment Satellite (MSX). The Radiation Monitor chip consists of a custom-designed 4-bit SRAM for heavy ion detection and three MOSFET's for monitoring total dose. In addition the Radiation Monitor chip was tested along with three diagnostic chips: the processor monitor and the reliability and fault chips. These chips revealed the quality of the CMOS fabrication process. The SEU/TD Radiation Monitor chip had an initial functional yield of 94.6 percent. Forty-three (43) SEU SRAM's and 14 Total Dose MOSFET's passed the hermeticity and final electrical tests and were delivered to LL.
An on-chip coupled resonator optical waveguide single-photon buffer
Takesue, Hiroki; Matsuda, Nobuyuki; Kuramochi, Eiichi; Munro, William J.; Notomi, Masaya
2013-01-01
Integrated quantum optical circuits are now seen as one of the most promising approaches with which to realize single-photon quantum information processing. Many of the core elements for such circuits have been realized, including sources, gates and detectors. However, a significant missing function necessary for photonic quantum information processing on-chip is a buffer, where single photons are stored for a short period of time to facilitate circuit synchronization. Here we report an on-chip single-photon buffer based on coupled resonator optical waveguides (CROW) consisting of 400 high-Q photonic crystal line-defect nanocavities. By using the CROW, a pulsed single photon is successfully buffered for 150 ps with 50-ps tunability while maintaining its non-classical properties. Furthermore, we show that our buffer preserves entanglement by storing and retrieving one photon from a time-bin entangled state. This is a significant step towards an all-optical integrated quantum information processor. PMID:24217422
Multicore: Fallout from a Computing Evolution
Yelick, Kathy [Director, NERSC
2017-12-09
July 22, 2008 Berkeley Lab lecture: Parallel computing used to be reserved for big science and engineering projects, but in two years that's all changed. Even laptops and hand-helds use parallel processors. Unfortunately, the software hasn't kept pace. Kathy Yelick, Director of the National Energy Research Scientific Computing Center at Berkeley Lab, describes the resulting chaos and the computing community's efforts to develop exciting applications that take advantage of tens or hundreds of processors on a single chip.
Methods for Trustworthy Design of On-Chip Bus Interconnect for General-Purpose Processors
2012-03-01
Technology Andrew Huang, was able to test the security properties of HyperTransport bus protocol on an Xbox [20]. In his research, he was able to...TRUSTWORTHY DESIGN OF ON -CHIP BUS INTERCONNECT FOR GENERAL-PURPOSE PROCESSORS by Jay F. Elson March 2012 Thesis Advisor: Ted Huffmire Second...AND DATES COVERED Master’s Thesis 4. TITLE AND SUBTITLE Methods for Trustworthy Design of On -Chip Bus Interconnect for General-Purpose Processors 5
Superconducting Qubit with Integrated Single Flux Quantum Controller Part I: Theory and Fabrication
NASA Astrophysics Data System (ADS)
Beck, Matthew; Leonard, Edward, Jr.; Thorbeck, Ted; Zhu, Shaojiang; Howington, Caleb; Nelson, Jj; Plourde, Britton; McDermott, Robert
As the size of quantum processors grow, so do the classical control requirements. The single flux quantum (SFQ) Josephson digital logic family offers an attractive route to proximal classical control of multi-qubit processors. Here we describe coherent control of qubits via trains of SFQ pulses. We discuss the fabrication of an SFQ-based pulse generator and a superconducting transmon qubit on a single chip. Sources of excess microwave loss stemming from the complex multilayer fabrication of the SFQ circuit are discussed. We show how to mitigate this loss through judicious choice of process workflow and appropriate use of sacrificial protection layers. Present address: IBM T.J. Watson Research Center.
Bottom-up construction of artificial molecules for superconducting quantum processors
NASA Astrophysics Data System (ADS)
Poletto, Stefano; Rigetti, Chad; Gambetta, Jay M.; Merkel, Seth; Chow, Jerry M.; Corcoles, Antonio D.; Smolin, John A.; Rozen, Jim R.; Keefe, George A.; Rothwell, Mary B.; Ketchen, Mark B.; Steffen, Matthias
2012-02-01
Recent experiments on transmon qubits capacitively coupled to superconducting 3-dimensional cavities have shown coherence times much longer than transmons coupled to more traditional planar resonators. For the implementation of a quantum processor this approach has clear advantages over traditional techniques but it poses the challenge of scalability. We are currently implementing multi-qubits experiments based on a bottom-up scaling approach. First, transmon qubits are fabricated on individual chips and are independently characterized. Second, an artificial molecule is assembled by selecting a particular set of previously characterized single-transmon chips. We present recent data on a two-qubit artificial molecule constructed in this way. The two qubits are chosen to generate a strong Z-Z interaction by matching the 0-1 transition energy of one qubit with the 1-2 transition of the other. Single qubit manipulations and state tomography cannot be done with ``traditional'' single tone microwave pulses but instead specifically shaped pulses have to be simultaneously applied on both qubits. Coherence times, coupling strength, and optimal pulses for decoupling the two qubits and perform state tomography are presented
On board processor development for NASA's spaceborne imaging radar with system-on-chip technology
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi
2004-01-01
This paper reports a preliminary study result of an on-board spaceborne SAR processor. It consists of a processing requirement analysis, functional specifications, and implementation with system-on-chip technology. Finally, a minimum version of this on-board processor designed for performance evaluation and for partial demonstration is illustrated.
Multicore: Fallout From a Computing Evolution (LBNL Summer Lecture Series)
Yelick, Kathy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)
2018-05-07
Summer Lecture Series 2008: Parallel computing used to be reserved for big science and engineering projects, but in two years that's all changed. Even laptops and hand-helds use parallel processors. Unfortunately, the software hasn't kept pace. Kathy Yelick, Director of the National Energy Research Scientific Computing Center at Berkeley Lab, describes the resulting chaos and the computing community's efforts to develop exciting applications that take advantage of tens or hundreds of processors on a single chip.
Thermal Hotspots in CPU Die and It's Future Architecture
NASA Astrophysics Data System (ADS)
Wang, Jian; Hu, Fu-Yuan
Owing to the increasing core frequency and chip integration and the limited die dimension, the power densities in CPU chip have been increasing fastly. The high temperature on chip resulted by power densities threats the processor's performance and chip's reliability. This paper analyzed the thermal hotspots in die and their properties. A new architecture of function units in die - - hot units distributed architecture is suggested to cope with the problems of high power densities for future processor chip.
Digital Waveguide Architectures for Virtual Musical Instruments
NASA Astrophysics Data System (ADS)
Smith, Julius O.
Digital sound synthesis has become a standard staple of modern music studios, videogames, personal computers, and hand-held devices. As processing power has increased over the years, sound synthesis implementations have evolved from dedicated chip sets, to single-chip solutions, and ultimately to software implementations within processors used primarily for other tasks (such as for graphics or general purpose computing). With the cost of implementation dropping closer and closer to zero, there is increasing room for higher quality algorithms.
Quantum interference in heterogeneous superconducting-photonic circuits on a silicon chip
Schuck, C.; Guo, X.; Fan, L.; Ma, X.; Poot, M.; Tang, H. X.
2016-01-01
Quantum information processing holds great promise for communicating and computing data efficiently. However, scaling current photonic implementation approaches to larger system size remains an outstanding challenge for realizing disruptive quantum technology. Two main ingredients of quantum information processors are quantum interference and single-photon detectors. Here we develop a hybrid superconducting-photonic circuit system to show how these elements can be combined in a scalable fashion on a silicon chip. We demonstrate the suitability of this approach for integrated quantum optics by interfering and detecting photon pairs directly on the chip with waveguide-coupled single-photon detectors. Using a directional coupler implemented with silicon nitride nanophotonic waveguides, we observe 97% interference visibility when measuring photon statistics with two monolithically integrated superconducting single-photon detectors. The photonic circuit and detector fabrication processes are compatible with standard semiconductor thin-film technology, making it possible to implement more complex and larger scale quantum photonic circuits on silicon chips. PMID:26792424
FPGA-Based, Self-Checking, Fault-Tolerant Computers
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2004-01-01
A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and highspeed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant- computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
NASA Technical Reports Server (NTRS)
Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas
2008-01-01
A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
Programmable architecture for pixel level processing tasks in lightweight strapdown IR seekers
NASA Astrophysics Data System (ADS)
Coates, James L.
1993-06-01
Typical processing tasks associated with missile IR seeker applications are described, and a straw man suite of algorithms is presented. A fully programmable multiprocessor architecture is realized on a multimedia video processor (MVP) developed by Texas Instruments. The MVP combines the elements of RISC, floating point, advanced DSPs, graphics processors, display and acquisition control, RAM, and external memory. Front end pixel level tasks typical of missile interceptor applications, operating on 256 x 256 sensor imagery, can be processed at frame rates exceeding 100 Hz in a single MVP chip.
Experience with custom processors in space flight applications
NASA Technical Reports Server (NTRS)
Fraeman, M. E.; Hayes, J. R.; Lohr, D. A.; Ballard, B. W.; Williams, R. L.; Henshaw, R. M.
1991-01-01
The Applied Physics Laboratory (APL) has developed a magnetometer instrument for a swedish satellite named Freja with launch scheduled for August 1992 on a Chinese Long March rocket. The magnetometer controller utilized a custom microprocessor designed at APL with the Genesil silicon compiler. The processor evolved from our experience with an older bit-slice design and two prior single chip efforts. The architecture of our microprocessor greatly lowered software development costs because it was optimized to provide an interactive and extensible programming environment hosted by the target hardware. Radiation tolerance of the microprocessor was also tested and was adequate for Freja's mission -- 20 kRad(Si) total dose and very infrequent latch-up and single event upset events.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications
NASA Astrophysics Data System (ADS)
Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei
2007-04-01
In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.
NASA Astrophysics Data System (ADS)
Feldman, Michael Robert
Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
A miniature on-chip multi-functional ECG signal processor with 30 µW ultra-low power consumption.
Liu, Xin; Zheng, Yuan Jin; Phyu, Myint Wai; Zhao, Bin; Je, Minkyu; Yuan, Xiao Jun
2010-01-01
In this paper, a miniature low-power Electrocardiogram (ECG) signal processing application specific integrated circuit (ASIC) chip is proposed. This chip provides multiple critical functions for ECG analysis using a systematic wavelet transform algorithm and a novel SRAM-based ASIC architecture, while achieves low cost and high performance. Using 0.18 µm CMOS technology and 1 V power supply, this ASIC chip consumes only 29 µW and occupies an area of 3 mm(2). This on-chip ECG processor is highly suitable for reliable real-time cardiac status monitoring applications.
VASP-4096: a very high performance programmable device for digital media processing applications
NASA Astrophysics Data System (ADS)
Krikelis, Argy
2001-03-01
Over the past few years, technology drivers for microprocessors have changed significantly. Media data delivery and processing--such as telecommunications, networking, video processing, speech recognition and 3D graphics--is increasing in importance and will soon dominate the processing cycles consumed in computer-based systems. This paper presents the architecture of the VASP-4096 processor. VASP-4096 provides high media performance with low energy consumption by integrating associative SIMD parallel processing with embedded microprocessor technology. The major innovations in the VASP-4096 is the integration of thousands of processing units in a single chip that are capable of support software programmable high-performance mathematical functions as well as abstract data processing. In addition to 4096 processing units, VASP-4096 integrates on a single chip a RISC controller that is an implementation of the SPARC architecture, 128 Kbytes of Data Memory, and I/O interfaces. The SIMD processing in VASP-4096 implements the ASProCore architecture, which is a proprietary implementation of SIMD processing, operates at 266 MHz with program instructions issued by the RISC controller. The device also integrates a 64-bit synchronous main memory interface operating at 133 MHz (double-data rate), and a 64- bit 66 MHz PCI interface. VASP-4096, compared with other processors architectures that support media processing, offers true performance scalability, support for deterministic and non-deterministic data processing on a single device, and software programmability that can be re- used in future chip generations.
Towards the formal verification of the requirements and design of a processor interface unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
The formal verification of the design and partial requirements for a Processor Interface Unit (PIU) using the Higher Order Logic (HOL) theorem-proving system is described. The processor interface unit is a single-chip subsystem within a fault-tolerant embedded system under development within the Boeing Defense and Space Group. It provides the opportunity to investigate the specification and verification of a real-world subsystem within a commercially-developed fault-tolerant computer. An overview of the PIU verification effort is given. The actual HOL listing from the verification effort are documented in a companion NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit - HOL Listings' including the general-purpose HOL theories and definitions that support the PIU verification as well as tactics used in the proofs.
Transportable GPU (General Processor Units) chip set technology for standard computer architectures
NASA Astrophysics Data System (ADS)
Fosdick, R. E.; Denison, H. C.
1982-11-01
The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 41/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.
Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design
NASA Astrophysics Data System (ADS)
Debes, Eric; Kaine, Greg
2002-11-01
In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using Intel® VTuneTM performance analyzer. The experimental measurements are compared to theoretical results.
An evaluation of MPI message rate on hybrid-core processors
Barrett, Brian W.; Brightwell, Ron; Grant, Ryan; ...
2014-11-01
Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insightmore » into how MPI implementations for future hybrid-core processors should be designed.« less
High coherence plane breaking packaging for superconducting qubits.
Bronn, Nicholas T; Adiga, Vivekananda P; Olivadese, Salvatore B; Wu, Xian; Chow, Jerry M; Pappas, David P
2018-04-01
We demonstrate a pogo pin package for a superconducting quantum processor specifically designed with a nontrivial layout topology (e.g., a center qubit that cannot be accessed from the sides of the chip). Two experiments on two nominally identical superconducting quantum processors in pogo packages, which use commercially available parts and require modest machining tolerances, are performed at low temperature (10 mK) in a dilution refrigerator and both found to behave comparably to processors in standard planar packages with wirebonds where control and readout signals come in from the edges. Single- and two-qubit gate errors are also characterized via randomized benchmarking, exhibiting similar error rates as in standard packages, opening the possibility of integrating pogo pin packaging with extensible qubit architectures.
High coherence plane breaking packaging for superconducting qubits
NASA Astrophysics Data System (ADS)
Bronn, Nicholas T.; Adiga, Vivekananda P.; Olivadese, Salvatore B.; Wu, Xian; Chow, Jerry M.; Pappas, David P.
2018-04-01
We demonstrate a pogo pin package for a superconducting quantum processor specifically designed with a nontrivial layout topology (e.g., a center qubit that cannot be accessed from the sides of the chip). Two experiments on two nominally identical superconducting quantum processors in pogo packages, which use commercially available parts and require modest machining tolerances, are performed at low temperature (10 mK) in a dilution refrigerator and both found to behave comparably to processors in standard planar packages with wirebonds where control and readout signals come in from the edges. Single- and two-qubit gate errors are also characterized via randomized benchmarking, exhibiting similar error rates as in standard packages, opening the possibility of integrating pogo pin packaging with extensible qubit architectures.
NASA Astrophysics Data System (ADS)
Xie, Yiwei; Geng, Zihan; Zhuang, Leimeng; Burla, Maurizio; Taddei, Caterina; Hoekman, Marcel; Leinse, Arne; Roeloffzen, Chris G. H.; Boller, Klaus-J.; Lowery, Arthur J.
2017-12-01
Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enables photonic implementations of reconfigurable radio frequency (RF) filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP)-level flexibility, hundreds-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Reconfigurable tree architectures using subtree oriented fault tolerance
NASA Technical Reports Server (NTRS)
Lowrie, Matthew B.
1987-01-01
An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is 0(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
Fault-Tolerant, Real-Time, Multi-Core Computer System
NASA Technical Reports Server (NTRS)
Gostelow, Kim P.
2012-01-01
A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.
Sequence information signal processor for local and global string comparisons
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1997-01-01
A sequence information signal processing integrated circuit chip designed to perform high speed calculation of a dynamic programming algorithm based upon the algorithm defined by Waterman and Smith. The signal processing chip of the present invention is designed to be a building block of a linear systolic array, the performance of which can be increased by connecting additional sequence information signal processing chips to the array. The chip provides a high speed, low cost linear array processor that can locate highly similar global sequences or segments thereof such as contiguous subsequences from two different DNA or protein sequences. The chip is implemented in a preferred embodiment using CMOS VLSI technology to provide the equivalent of about 400,000 transistors or 100,000 gates. Each chip provides 16 processing elements, and is designed to provide 16 bit, two's compliment operation for maximum score precision of between -32,768 and +32,767. It is designed to provide a comparison between sequences as long as 4,194,304 elements without external software and between sequences of unlimited numbers of elements with the aid of external software. Each sequence can be assigned different deletion and insertion weight functions. Each processor is provided with a similarity measure device which is independently variable. Thus, each processor can contribute to maximum value score calculation using a different similarity measure.
Single-chip microcomputer for image processing in the photonic measuring system
NASA Astrophysics Data System (ADS)
Smoleva, Olga S.; Ljul, Natalia Y.
2002-04-01
The non-contact measuring system has been designed for rail- track parameters control on the Moscow Metro. It detects some significant parameters: rail-track width, rail-track height, gage, rail-slums, crosslevel, pickets, and car speed. The system consists of three subsystems: non-contact system of rail-track width, height, and gage inspection, non-contact system of rail-slums inspection and subsystem for crosslevel, speed, and pickets detection. Data from subsystems is transferred to pre-processing unit. In order to process data received from subsystems, the single-chip signal processor ADSP-2185 must be used due to providing required processing speed. After data will be processed, it is send to PC, which processes it and outputs it in the readable form.
VLSI processors for signal detection in SETI
NASA Technical Reports Server (NTRS)
Duluk, J. F.; Linscott, I. R.; Peterson, A. M.; Burr, J.; Ekroot, B.; Twicken, J.
1989-01-01
The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.
VLSI processors for signal detection in SETI.
Duluk, J F; Linscott, I R; Peterson, A M; Burr, J; Ekroot, B; Twicken, J
1989-01-01
The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.
Novel memory architecture for video signal processor
NASA Astrophysics Data System (ADS)
Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei
1993-11-01
An on-chip memory architecture for video signal processor (VSP) is proposed. This memory structure is a two-level design for the different data locality in video applications. The upper level--Memory A provides enough storage capacity to reduce the impact on the limitation of chip I/O bandwidth, and the lower level--Memory B provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The needed memory size is decided by the memory usage analysis for video algorithms and the number of function units. Both levels of memory adopted a dual-port memory scheme to sustain the simultaneous read and write operations. Especially, Memory B uses multiple one-read-one-write memory banks to emulate the real multiport memory. Therefore, one can change the configuration of Memory B to several sets of memories with variable read/write ports by adjusting the bus switches. Then the numbers of read ports and write ports in proposed memory can meet requirement of data flow patterns in different video coding algorithms. We have finished the design of a prototype memory design using 1.2- micrometers SPDM SRAM technology and will fabricated it through TSMC, in Taiwan.
ERIC Educational Resources Information Center
Marcovitz, Alan B., Ed.
This paper describes an introductory course in microprocessors and microcomputers implemented at Grossmont College. The current state-of-the-art in the microprocessor field is discussed, with special emphasis on the 8-bit MOS single-chip processors which are the most commonly used devices. Objectives and guidelines for the course are presented,…
2008-07-31
Unlike the Lyrtech, each DSP on a Bittware board offers 3 MB of on-chip memory and 3 GFLOPs of 32-bit peak processing power. Based on the performance...Each NVIDIA 8800 Ultra features 576 GFLOPS on 128 612-MHz single-precision floating-point SIMD processors, arranged in 16 clusters of eight. Each
MAP3D: a media processor approach for high-end 3D graphics
NASA Astrophysics Data System (ADS)
Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris
1999-12-01
Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
System on chip module configured for event-driven architecture
Robbins, Kevin; Brady, Charles E.; Ashlock, Tad A.
2017-10-17
A system on chip (SoC) module is described herein, wherein the SoC modules comprise a processor subsystem and a hardware logic subsystem. The processor subsystem and hardware logic subsystem are in communication with one another, and transmit event messages between one another. The processor subsystem executes software actors, while the hardware logic subsystem includes hardware actors, the software actors and hardware actors conform to an event-driven architecture, such that the software actors receive and generate event messages and the hardware actors receive and generate event messages.
NASA Astrophysics Data System (ADS)
Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.
2014-03-01
The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Towards an Analogue Neuromorphic VLSI Instrument for the Sensing of Complex Odours
NASA Astrophysics Data System (ADS)
Ab Aziz, Muhammad Fazli; Harun, Fauzan Khairi Che; Covington, James A.; Gardner, Julian W.
2011-09-01
Almost all electronic nose instruments reported today employ pattern recognition algorithms written in software and run on digital processors, e.g. micro-processors, microcontrollers or FPGAs. Conversely, in this paper we describe the analogue VLSI implementation of an electronic nose through the design of a neuromorphic olfactory chip. The modelling, design and fabrication of the chip have already been reported. Here a smart interface has been designed and characterised for thisneuromorphic chip. Thus we can demonstrate the functionality of the a VLSI neuromorphic chip, producing differing principal neuron firing patterns to real sensor response data. Further work is directed towards integrating 9 separate neuromorphic chips to create a large neuronal network to solve more complex olfactory problems.
Unraveling the CHIP:Hsp70 complex as an information processor for protein quality control.
VanPelt, Jamie; Page, Richard C
2017-02-01
The CHIP:Hsp70 complex stands at the crossroads of the cellular protein quality control system. Hsp70 facilitates active refolding of misfolded client proteins, while CHIP directs ubiquitination of misfolded client proteins bound to Hsp70. The direct competition between CHIP and Hsp70 for the fate of misfolded proteins leads to the question: how does the CHIP:Hsp70 complex execute triage decisions that direct misfolded proteins for either refolding or degradation? The current body of literature points toward action of the CHIP:Hsp70 complex as an information processor that takes inputs in the form of client folding state, dynamics, and posttranslational modifications, then outputs either refolded or ubiquitinated client proteins. Herein we examine the CHIP:Hsp70 complex beginning with the structure and function of CHIP and Hsp70, followed by an examination of recent studies of the interactions and dynamics of the CHIP:Hsp70 complex. Copyright © 2016 Elsevier B.V. All rights reserved.
Companion Chip: Building a Segregated Hardware Architecture
NASA Astrophysics Data System (ADS)
Pareaud, Thomas; Houelle, Alain; Vaucher, Niolas; Albinet, Mathieu; Honvault, Christophe
2011-08-01
Partitioning is a more and more mature concept in Space industry. It aims at assuring that some error propagation modes are not possible. This paper gives an overview of an analysis conducted in the frame of a research and technology study performed in 2010/2011. The "Java Companion Chip" study addresses an interesting approach to partitioning using hardware concepts: a SoC architecture integrates a master processor, a companion chip and additional hardware functions aiming at enforcing the time and space segregation between the master processor and the slave one.This paper discusses the benefits and the main challenges of the proposed approach. In addition, it presents an application of these concepts to a case study: a Leon/Java processor architecture able to concurrently execute native and Java applications.
Linear Approximation SAR Azimuth Processing Study
NASA Technical Reports Server (NTRS)
Lindquist, R. B.; Masnaghetti, R. K.; Belland, E.; Hance, H. V.; Weis, W. G.
1979-01-01
A segmented linear approximation of the quadratic phase function that is used to focus the synthetic antenna of a SAR was studied. Ideal focusing, using a quadratic varying phase focusing function during the time radar target histories are gathered, requires a large number of complex multiplications. These can be largely eliminated by using linear approximation techniques. The result is a reduced processor size and chip count relative to ideally focussed processing and a correspondingly increased feasibility for spaceworthy implementation. A preliminary design and sizing for a spaceworthy linear approximation SAR azimuth processor meeting requirements similar to those of the SEASAT-A SAR was developed. The study resulted in a design with approximately 1500 IC's, 1.2 cubic feet of volume, and 350 watts of power for a single look, 4000 range cell azimuth processor with 25 meters resolution.
Fast particles identification in programmable form at level-0 trigger by means of the 3D-Flow system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crosetto, Dario B.
1998-10-30
The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on either an FPGA or an ASIC implementation, it can address, in a fully programmable manner, applications where commercially available processors would fail because of throughput requirements. Possible applications include filtering-algorithms (pattern recognition) from the input of multiple sensors, as well as moving any input validated by these filtering-algorithms to a single output channel. Both operations can easily be implemented on a 3D-Flow system to achieve a real-time processing system with a very short lag time. This system can be built either with off-the-shelfmore » FPGAs or, for higher data rates, with CMOS chips containing 4 to 16 processors each. The basic building block of the system, a 3D-Flow processor, has been successfully designed in VHDL code written in ''Generic HDL'' (mostly made of reusable blocks that are synthesizable in different technologies, or FPGAs), to produce a netlist for a four-processor ASIC featuring 0.35 micron CBA (Ceil Base Array) technology at 3.3 Volts, 884 mW power dissipation at 60 MHz and 63.75 mm sq. die size. The same VHDL code has been targeted to three FPGA manufacturers (Altera EPF10K250A, ORCA-Lucent Technologies 0R3T165 and Xilinx XCV1000). A complete set of software tools, the 3D-Flow System Manager, equally applicable to ASIC or FPGA implementations, has been produced to provide full system simulation, application development, real-time monitoring, and run-time fault recovery. Today's technology can accommodate 16 processors per chip in a medium size die, at a cost per processor of less than $5 based on the current silicon die/size technology cost.« less
[Development of the automatic dental X-ray film processor].
Bai, J; Chen, H
1999-07-01
This paper introduces a multiple-point detecting technique of the density of dental X-ray films. With the infrared ray multiple-point detecting technique, a single-chip microcomputer control system is used to analyze the effectiveness of the film-developing in real time in order to achieve a good image. Based on the new technology, We designed the intelligent automatic dental X-ray film processing.
Adaptive Optoelectronic Eyes: Hybrid Sensor/Processor Architectures
2006-11-13
corresponding calculated data. The width of the mirror stopband is proportional to the refractive index difference between the high and low index materials ...Silicon VLSI Neuron Unit Arrays 56 Development of a Single-Sided Flip-Chip Bonding Process 65 Development of High Refractive Index Diffractive Optical ...Elements (DOEs) 68 Development of High-Performance Antireflection Coatings for High Refractive Index DOEs 69 Design and Fabrication of Low Threshold
Fault-tolerant computer study. [logic designs for building block circuits
NASA Technical Reports Server (NTRS)
Rennels, D. A.; Avizienis, A. A.; Ercegovac, M. D.
1981-01-01
A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.
Multichannel Baseband Processor for Wideband CDMA
NASA Astrophysics Data System (ADS)
Jalloul, Louay M. A.; Lin, Jim
2005-12-01
The system architecture of the cellular base station modem engine (CBME) is described. The CBME is a single-chip multichannel transceiver capable of processing and demodulating signals from multiple users simultaneously. It is optimized to process different classes of code-division multiple-access (CDMA) signals. The paper will show that through key functional system partitioning, tightly coupled small digital signal processing cores, and time-sliced reuse architecture, CBME is able to achieve a high degree of algorithmic flexibility while maintaining efficiency. The paper will also highlight the implementation and verification aspects of the CBME chip design. In this paper, wideband CDMA is used as an example to demonstrate the architecture concept.
Technology transfer of military space microprocessor developments
NASA Astrophysics Data System (ADS)
Gorden, C.; King, D.; Byington, L.; Lanza, D.
1999-01-01
Over the past 13 years the Air Force Research Laboratory (AFRL) has led the development of microprocessors and computers for USAF space and strategic missile applications. As a result of these Air Force development programs, advanced computer technology is available for use by civil and commercial space customers as well. The Generic VHSIC Spaceborne Computer (GVSC) program began in 1985 at AFRL to fulfill a deficiency in the availability of space-qualified data and control processors. GVSC developed a radiation hardened multi-chip version of the 16-bit, Mil-Std 1750A microprocessor. The follow-on to GVSC, the Advanced Spaceborne Computer Module (ASCM) program, was initiated by AFRL to establish two industrial sources for complete, radiation-hardened 16-bit and 32-bit computers and microelectronic components. Development of the Control Processor Module (CPM), the first of two ASCM contract phases, concluded in 1994 with the availability of two sources for space-qualified, 16-bit Mil-Std-1750A computers, cards, multi-chip modules, and integrated circuits. The second phase of the program, the Advanced Technology Insertion Module (ATIM), was completed in December 1997. ATIM developed two single board computers based on 32-bit reduced instruction set computer (RISC) processors. GVSC, CPM, and ATIM technologies are flying or baselined into the majority of today's DoD, NASA, and commercial satellite systems.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Towards the formal specification of the requirements and design of a processor interface unit
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
Work to formally specify the requirements and design of a Processor Interface Unit (PIU), a single-chip subsystem providing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system, is described. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance free operation, or both. The approaches that were developed for modeling the PIU requirements and for composition of the PIU subcomponents at high levels of abstraction are described. These approaches were used to specify and verify a nontrivial subset of the PIU behavior. The PIU specification in Higher Order Logic (HOL) is documented in a companion NASA contractor report entitled 'Towards the Formal Specification of the Requirements and Design of a Processor Interfacs Unit - HOL Listings.' The subsequent verification approach and HOL listings are documented in NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit' and NASA contractor report entitled 'Towards the Formal Verification of the Requirements and Design of a Processor Interface Unit - HOL Listings.'
QCDOC: A 10-teraflops scale computer for lattice QCD
NASA Astrophysics Data System (ADS)
Chen, D.; Christ, N. H.; Cristian, C.; Dong, Z.; Gara, A.; Garg, K.; Joo, B.; Kim, C.; Levkova, L.; Liao, X.; Mawhinney, R. D.; Ohta, S.; Wettig, T.
2001-03-01
The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from "QCD On a Chip".
SAMPA Chip: the New 32 Channels ASIC for the ALICE TPC and MCH Upgrades
NASA Astrophysics Data System (ADS)
Adolfsson, J.; Ayala Pabon, A.; Bregant, M.; Britton, C.; Brulin, G.; Carvalho, D.; Chambert, V.; Chinellato, D.; Espagnon, B.; Hernandez Herrera, H. D.; Ljubicic, T.; Mahmood, S. M.; Mjörnmark, U.; Moraes, D.; Munhoz, M. G.; Noël, G.; Oskarsson, A.; Osterman, L.; Pilyar, A.; Read, K.; Ruette, A.; Russo, P.; Sanches, B. C. S.; Severo, L.; Silvermyr, D.; Suire, C.; Tambave, G. J.; Tun-Lanoë, K. M. M.; van Noije, W.; Velure, A.; Vereschagin, S.; Wanlin, E.; Weber, T. O.; Zaporozhets, S.
2017-04-01
This paper presents the test results of the second prototype of SAMPA, the ASIC designed for the upgrade of read-out front end electronics of the ALICE Time Projection Chamber (TPC) and Muon Chamber (MCH). SAMPA is made in a 130 nm CMOS technology with 1.25 V nominal voltage supply and provides 32 channels, with selectable input polarity, and three possible combinations of shaping time and sensitivity. Each channel consists of a Charge Sensitive Amplifier, a semi-Gaussian shaper and a 10-bit ADC; a Digital Signal Processor provides digital filtering and compression capability. In the second prototype run both full chip and single test blocks were fabricated, allowing block characterization and full system behaviour studies. Experimental results are here presented showing agreement with requirements for both the blocks and the full chip.
NASA Technical Reports Server (NTRS)
Szabo, Carl M., Jr.; Duncan, Adam; LaBel, Kenneth A.; Kay, Matt; Bruner, Pat; Krzesniak, Mike; Dong, Lei
2015-01-01
Hardness assurance test results of Intel state-of-the-art 14nm Broadwell U-series processor System-on-a-Chip (SoC) for total dose are presented, along with first-look exploratory results from trials at a medical proton facility. Test method builds upon previous efforts by utilizing commercial laptop motherboards and software stress applications as opposed to more traditional automated test equipment (ATE).
USDA-ARS?s Scientific Manuscript database
Environmental stresses that increase tuber contents of the reducing sugars glucose and fructose decrease the value of chipping potatoes because such tubers produce dark-colored chips that are unacceptable to processors and consumers. Stem-end chip defect (SECD), which causes regions of dark color al...
Architectural Techniques For Managing Non-volatile Caches
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
As chip power dissipation becomes a critical challenge in scaling processor performance, computer architects are forced to fundamentally rethink the design of modern processors and hence, the chip-design industry is now at a major inflection point in its hardware roadmap. The high leakage power and low density of SRAM poses serious obstacles in its use for designing large on-chip caches and for this reason, researchers are exploring non-volatile memory (NVM) devices, such as spin torque transfer RAM, phase change RAM and resistive RAM. However, since NVMs are not strictly superior to SRAM, effective architectural techniques are required for making themmore » a universal memory solution. This book discusses techniques for designing processor caches using NVM devices. It presents algorithms and architectures for improving their energy efficiency, performance and lifetime. It also provides both qualitative and quantitative evaluation to help the reader gain insights and motivate them to explore further. This book will be highly useful for beginners as well as veterans in computer architecture, chip designers, product managers and technical marketing professionals.« less
Hot Chips and Hot Interconnects for High End Computing Systems
NASA Technical Reports Server (NTRS)
Saini, Subhash
2005-01-01
I will discuss several processors: 1. The Cray proprietary processor used in the Cray X1; 2. The IBM Power 3 and Power 4 used in an IBM SP 3 and IBM SP 4 systems; 3. The Intel Itanium and Xeon, used in the SGI Altix systems and clusters respectively; 4. IBM System-on-a-Chip used in IBM BlueGene/L; 5. HP Alpha EV68 processor used in DOE ASCI Q cluster; 6. SPARC64 V processor, which is used in the Fujitsu PRIMEPOWER HPC2500; 7. An NEC proprietary processor, which is used in NEC SX-6/7; 8. Power 4+ processor, which is used in Hitachi SR11000; 9. NEC proprietary processor, which is used in Earth Simulator. The IBM POWER5 and Red Storm Computing Systems will also be discussed. The architectures of these processors will first be presented, followed by interconnection networks and a description of high-end computer systems based on these processors and networks. The performance of various hardware/programming model combinations will then be compared, based on latest NAS Parallel Benchmark results (MPI, OpenMP/HPF and hybrid (MPI + OpenMP). The tutorial will conclude with a discussion of general trends in the field of high performance computing, (quantum computing, DNA computing, cellular engineering, and neural networks).
Research on NC motion controller based on SOPC technology
NASA Astrophysics Data System (ADS)
Jiang, Tingbiao; Meng, Biao
2006-11-01
With the rapid development of the digitization and informationization, the application of numerical control technology in the manufacturing industry becomes more and more important. However, the conventional numerical control system usually has some shortcomings such as the poor in system openness, character of real-time, cutability and reconfiguration. In order to solve these problems, this paper investigates the development prospect and advantage of the application in numerical control area with system-on-a-Programmable-Chip (SOPC) technology, and puts forward to a research program approach to the NC controller based on SOPC technology. Utilizing the characteristic of SOPC technology, we integrate high density logic device FPGA, memory SRAM, and embedded processor ARM into a single programmable logic device. We also combine the 32-bit RISC processor with high computing capability of the complicated algorithm with the FPGA device with strong motivable reconfiguration logic control ability. With these steps, we can greatly resolve the defect described in above existing numerical control systems. For the concrete implementation method, we use FPGA chip embedded with ARM hard nuclear processor to construct the control core of the motion controller. We also design the peripheral circuit of the controller according to the requirements of actual control functions, transplant real-time operating system into ARM, design the driver of the peripheral assisted chip, develop the application program to control and configuration of FPGA, design IP core of logic algorithm for various NC motion control to configured it into FPGA. The whole control system uses the concept of modular and structured design to develop hardware and software system. Thus the NC motion controller with the advantage of easily tailoring, highly opening, reconfigurable, and expandable can be implemented.
MIL-STD-1553B Marconi LSI chip set in a remote terminal application
NASA Astrophysics Data System (ADS)
Dimarino, A.
1982-11-01
Marconi Avionics is utilizing the MIL-STD-1553B LSI Chip Set in the SCADC Air Data Computer application to perform all of the required remote terminal MIL-STD-1553B protocol functions. Basic components of the RTU are the dual redundant chip set, CT3231 Transceivers, 256 x 16 RAM and a Z8002 microprocessor. Basic transfers are to/from the RAM command of the bus controller or Z8002 processor. During transfers from the processor to the RAM, the chip set busy bit is set for a period not exceeding 250 microseconds. When the transfer is complete, the busy bit is released and transfers to the data bus occur on command. The LSI Chip Set word count lines are used to locate each data word in the local memory and 4 mode codes are used in the application: reset remote terminal, transmit status word, transmitter shut-down, and override transmitter shutdown.
Preliminary Radiation Testing of a State-of-the-Art Commercial 14nm CMOS Processor/System-on-a-Chip
NASA Technical Reports Server (NTRS)
Szabo, Carl M., Jr.; Duncan, Adam; LaBel, Kenneth A.; Kay, Matt; Bruner, Pat; Krzesniak, Mike; Dong, Lei
2015-01-01
Hardness assurance test results of Intel state-of-the-art 14nm “Broadwell” U-series processor / System-on-a-Chip (SoC) for total ionizing dose (TID) are presented, along with exploratory results from trials at a medical proton facility. Test method builds upon previous efforts [1] by utilizing commercial laptop motherboards and software stress applications as opposed to more traditional automated test equipment (ATE).
Park, Daejin; Cho, Jeonghun
2014-01-01
A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error.
Suppression of the vacuolar invertase gene delays senescent sweetening in chipping potatoes
USDA-ARS?s Scientific Manuscript database
Background: Potato chip processors require potato tubers that meet quality specifications for fried chip color, and color depends largely upon tuber sugar contents. At later times in storage, potatoes accumulate sucrose, glucose and fructose. This developmental process, senescent sweetening, manifes...
Many-core computing for space-based stereoscopic imaging
NASA Astrophysics Data System (ADS)
McCall, Paul; Torres, Gildo; LeGrand, Keith; Adjouadi, Malek; Liu, Chen; Darling, Jacob; Pernicka, Henry
The potential benefits of using parallel computing in real-time visual-based satellite proximity operations missions are investigated. Improvements in performance and relative navigation solutions over single thread systems can be achieved through multi- and many-core computing. Stochastic relative orbit determination methods benefit from the higher measurement frequencies, allowing them to more accurately determine the associated statistical properties of the relative orbital elements. More accurate orbit determination can lead to reduced fuel consumption and extended mission capabilities and duration. Inherent to the process of stereoscopic image processing is the difficulty of loading, managing, parsing, and evaluating large amounts of data efficiently, which may result in delays or highly time consuming processes for single (or few) processor systems or platforms. In this research we utilize the Single-Chip Cloud Computer (SCC), a fully programmable 48-core experimental processor, created by Intel Labs as a platform for many-core software research, provided with a high-speed on-chip network for sharing information along with advanced power management technologies and support for message-passing. The results from utilizing the SCC platform for the stereoscopic image processing application are presented in the form of Performance, Power, Energy, and Energy-Delay-Product (EDP) metrics. Also, a comparison between the SCC results and those obtained from executing the same application on a commercial PC are presented, showing the potential benefits of utilizing the SCC in particular, and any many-core platforms in general for real-time processing of visual-based satellite proximity operations missions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, T.; Atac, R.; Cook, A.
1989-03-06
The ACPMAPS multipocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL Chip set. Themore » system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.« less
NASA Astrophysics Data System (ADS)
Erez, Mattan; Dally, William J.
Stream processors, like other multi core architectures partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
System on a chip with MPEG-4 capability
NASA Astrophysics Data System (ADS)
Yassa, Fathy; Schonfeld, Dan
2002-12-01
Current products supporting video communication applications rely on existing computer architectures. RISC processors have been used successfully in numerous applications over several decades. DSP processors have become ubiquitous in signal processing and communication applications. Real-time applications such as speech processing in cellular telephony rely extensively on the computational power of these processors. Video processors designed to implement the computationally intensive codec operations have also been used to address the high demands of video communication applications (e.g., cable set-top boxes and DVDs). This paper presents an overview of a system-on-chip (SOC) architecture used for real-time video in wireless communication applications. The SOC specifications answer to the system requirements imposed by the application environment. A CAM-based video processor is used to accelerate data intensive video compression tasks such as motion estimations and filtering. Other components are dedicated to system level data processing and audio processing. A rich set of I/Os allows the SOC to communicate with other system components such as baseband and memory subsystems.
NASA Astrophysics Data System (ADS)
Pruhs, Kirk
A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to a processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous, are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
A Simple and Affordable TTL Processor for the Classroom
ERIC Educational Resources Information Center
Feinberg, Dave
2007-01-01
This paper presents a simple 4 bit computer processor design that may be built using TTL chips for less than $65. In addition to describing the processor itself in detail, we discuss our experience using the laboratory kit and its associated machine instruction set to teach computer architecture to high school students. (Contains 3 figures and 5…
Therapeutic hypertension system based on a microbreathing pressure sensor system.
Diao, Ziji; Liu, Hongying; Zhu, Lan; Gao, Xiaoqiang; Zhao, Suwen; Pi, Xitian; Zheng, Xiaolin
2011-01-01
A novel therapeutic system for the treatment of hypertension was developed on the basis of a slow-breath training mechanism, using a microbreathing pressure sensor device for the detection of human respiratory signals attached to the abdomen. The system utilizes a single-chip AT89C51 microcomputer as a core processor, programmed by Microsoft Visual C++6.0 to communicate with a PC via a full-speed PDIUSBD12 interface chip. The programming is based on a slow-breath guided algorithm in which the respiratory signal serves as a physiological feedback parameter. Inhalation and exhalation by the subject is guided by music signals. Our study indicates that this microbreathing sensor system may assist in slow-breath training and may help to decrease blood pressure.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor
Tayara, Hilal; Ham, Woonchul; Chong, Kil To
2016-01-01
This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation. PMID:27983714
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.
Tayara, Hilal; Ham, Woonchul; Chong, Kil To
2016-12-15
This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
2014-01-01
A specially designed sensor processor used as a main processor in IoT (internet-of-thing) device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG). Using an event signal processing unit (EPU) as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio-) based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error. PMID:25580458
FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.
Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora
2013-09-01
In this paper, we present a distributed computing system, called DCMARK, aimed at solving partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the differential equation system solving into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to study the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200 cells, Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it allows us to reduce the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC Host. An intuitively graphical user interface allows us to change the calculation parameters and plot the results.
A configurable and low-power mixed signal SoC for portable ECG monitoring applications.
Kim, Hyejung; Kim, Sunyoung; Van Helleputte, Nick; Artes, Antonio; Konijnenburg, Mario; Huisken, Jos; Van Hoof, Chris; Yazicioglu, Refet Firat
2014-04-01
This paper describes a mixed-signal ECG System-on-Chip (SoC) that is capable of implementing configurable functionality with low-power consumption for portable ECG monitoring applications. A low-voltage and high performance analog front-end extracts 3-channel ECG signals and single channel electrode-tissue-impedance (ETI) measurement with high signal quality. This can be used to evaluate the quality of the ECG measurement and to filter motion artifacts. A custom digital signal processor consisting of 4-way SIMD processor provides the configurability and advanced functionality like motion artifact removal and R peak detection. A built-in 12-bit analog-to-digital converter (ADC) is capable of adaptive sampling achieving a compression ratio of up to 7, and loop buffer integration reduces the power consumption for on-chip memory access. The SoC is implemented in 0.18 μm CMOS process and consumes 32 μ W from a 1.2 V while heart beat detection application is running, and integrated in a wireless ECG monitoring system with Bluetooth protocol. Thanks to the ECG SoC, the overall system power consumption can be reduced significantly.
Research in the design of high-performance reconfigurable systems
NASA Technical Reports Server (NTRS)
Slotnick, D. L.; Mcewan, S. D.; Spry, A. J.
1984-01-01
An initial design for the Bit Processor (BP) referred to in prior reports as the Processing Element or PE has been completed. Eight BP's, together with their supporting random-access memory, a 64 k x 9 ROM to perform addition, routing logic, and some additional logic, constitute the components of a single stage. An initial stage design is given. Stages may be combined to perform high-speed fixed or floating point arithmetic. Stages can be configured into a range of arithmetic modules that includes bit-serial one or two-dimensional arrays; one or two dimensional arrays fixed or floating point processors; and specialized uniprocessors, such as long-word arithmetic units. One to eight BP's represent a likely initial chip level. The Stage would then correspond to a first-level pluggable module. As both this project and VLSI CAD/CAM progress, however, it is expected that the chip level would migrate upward to the stage and, perhaps, ultimately the box level. The BP RAM, consisting of two banks, holds only operands and indices. Programs are at the box (high-level function) and system level. At the system level initial effort has been concentrated on specifying the tools needed to evaluate design alternatives.
Application of total distributed control system in car-body inspection
NASA Astrophysics Data System (ADS)
Yang, Xueyou; Ren, Dahai; Wang, Zhong; Ye, Shenghua; Lu, Hongbo; Duan, Jilin
1996-08-01
An application of distributed control system in Autocar-body Visual Inspection Station is presented in the paper, a distributed control system using PC as the host processor and single-chip microcomputer as the slave controller is proposed. In this paper, the physical interface of the control network and the relevant hardware are introduced. Meanwhile, a minute research on data communication is performed, relevant protocols on data framing, instruction codes and channel access methods have been laid down and part of related software is presented.
Distributed control system in a car-body inspection station
NASA Astrophysics Data System (ADS)
Yang, Xueyou; Ren, Dahai; Ye, Shenghua; Lu, Hongbo; Duan, Jilin
1997-06-01
In this paper, a distributed control network in autocar-body visual inspection station is presented in which PC is used as the host processor and single-chip microcomputers are employed as slave controllers. The physical interface of the control network and the relevant hardware are introduced in this paper. Meanwhile, a minute research on data communication is performed, relevant protocols on data framing, instruction codes and channel access methods have been laid down and part of related software is presented.
A 32-bit NMOS microprocessor with a large register file
NASA Astrophysics Data System (ADS)
Sherburne, R. W., Jr.; Katevenis, M. G. H.; Patterson, D. A.; Sequin, C. H.
1984-10-01
Two scaled versions of a 32-bit NMOS reduced instruction set computer CPU, called RISC II, have been implemented on two different processing lines using the simple Mead and Conway layout rules with lambda values of 2 and 1.5 microns (corresponding to drawn gate lengths of 4 and 3 microns), respectively. The design utilizes a small set of simple instructions in conjunction with a large register file in order to provide high performance. This approach has resulted in two surprisingly powerful single-chip processors.
The Fermilab lattice supercomputer project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischler, M.; Atac, R.; Cook, A.
1989-02-01
The ACPMAPS system is a highly cost effective, local memory MIMD computer targeted at algorithm development and production running for gauge theory on the lattice. The machine consists of a compound hypercube of crates, each of which is a full crossbar switch containing several processors. The processing nodes are single board array processors based on the Weitek XL chip set, each with a peak power of 20 MFLOPS and supported by 8 MBytes of data memory. The system currently being assembled has a peak power of 5 GFLOPS, delivering performance at approximately $250/MFLOP. The system is programmable in C andmore » Fortran. An underpinning of software routines (CANOPY) provides an easy and natural way of coding lattice problems, such that the details of parallelism, and communication and system architecture are transparent to the user. CANOPY can easily be ported to any single CPU or MIMD system which supports C, and allows the coding of typical applications with very little effort. 3 refs., 1 fig.« less
NASA Astrophysics Data System (ADS)
Nitadori, Keigo; Makino, Junichiro; Hut, Piet
2006-12-01
The main performance bottleneck of gravitational N-body codes is the force calculation between two particles. We have succeeded in speeding up this pair-wise force calculation by factors between 2 and 10, depending on the code and the processor on which the code is run. These speed-ups were obtained by writing highly fine-tuned code for x86_64 microprocessors. Any existing N-body code, running on these chips, can easily incorporate our assembly code programs. In the current paper, we present an outline of our overall approach, which we illustrate with one specific example: the use of a Hermite scheme for a direct N2 type integration on a single 2.0 GHz Athlon 64 processor, for which we obtain an effective performance of 4.05 Gflops, for double-precision accuracy. In subsequent papers, we will discuss other variations, including the combinations of N log N codes, single-precision implementations, and performance on other microprocessors.
The role of EEPROM devices in upcoming ISDN applications
NASA Astrophysics Data System (ADS)
Nette, Herbert L.
1991-02-01
Integrated Services Digital Network (ISDN) equipments are rapidly becoming a major market for semiconductor chips. Although at first glance this growing market appears to be geared at logic chips, nonvolatile memories represent important support chips and will become a significant segment of this market. Challenges in these applications consist in operating EEPROMs at lower voltages and lower power and embedding them on ever more complex communications processor chips.
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the Power6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the Power5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core Power6 based IBM p6-570 system, and we compare its performance with that of a dual-core Power5+ based IBM p575+ system. In this evaluation, we have used the High- Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
Novel Highly Parallel and Systolic Architectures Using Quantum Dot-Based Hardware
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Benny N.; Spotnitz, Matthew
1997-01-01
VLSI technology has made possible the integration of massive number of components (processors, memory, etc.) into a single chip. In VLSI design, memory and processing power are relatively cheap and the main emphasis of the design is on reducing the overall interconnection complexity since data routing costs dominate the power, time, and area required to implement a computation. Communication is costly because wires occupy the most space on a circuit and it can also degrade clock time. In fact, much of the complexity (and hence the cost) of VLSI design results from minimization of data routing. The main difficulty in VLSI routing is due to the fact that crossing of the lines carrying data, instruction, control, etc. is not possible in a plane. Thus, in order to meet this constraint, the VLSI design aims at keeping the architecture highly regular with local and short interconnection. As a result, while the high level of integration has opened the way for massively parallel computation, practical and full exploitation of such a capability in many applications of interest has been hindered by the constraints on interconnection pattern. More precisely. the use of only localized communication significantly simplifies the design of interconnection architecture but at the expense of somewhat restricted class of applications. For example, there are currently commercially available products integrating; hundreds of simple processor elements within a single chip. However, the lack of adequate interconnection pattern among these processing elements make them inefficient for exploiting a large degree of parallelism in many applications.
Bio-Inspired Microsystem for Robust Genetic Assay Recognition
Lue, Jaw-Chyng; Fang, Wai-Chi
2008-01-01
A compact integrated system-on-chip (SoC) architecture solution for robust, real-time, and on-site genetic analysis has been proposed. This microsystem solution is noise-tolerable and suitable for analyzing the weak fluorescence patterns from a PCR prepared dual-labeled DNA microchip assay. In the architecture, a preceding VLSI differential logarithm microchip is designed for effectively computing the logarithm of the normalized input fluorescence signals. A posterior VLSI artificial neural network (ANN) processor chip is used for analyzing the processed signals from the differential logarithm stage. A single-channel logarithmic circuit was fabricated and characterized. A prototype ANN chip with unsupervised winner-take-all (WTA) function was designed, fabricated, and tested. An ANN learning algorithm using a novel sigmoid-logarithmic transfer function based on the supervised backpropagation (BP) algorithm is proposed for robustly recognizing low-intensity patterns. Our results show that the trained new ANN can recognize low-fluorescence patterns better than an ANN using the conventional sigmoid function. PMID:18566679
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-08-21
Recent advancements in technology scaling have shown a trend towards greater integration with large-scale chips containing thousands of processors connected to memories and other I/O devices using non-trivial network topologies. Software simulation proves insufficient to study the tradeoffs in such complex systems due to slow execution time, whereas hardware RTL development is too time-consuming. We present OpenSoC Fabric, an on-chip network generation infrastructure which aims to provide a parameterizable and powerful on-chip network generator for evaluating future high performance computing architectures based on SoC technology. OpenSoC Fabric leverages a new hardware DSL, Chisel, which contains powerful abstractions provided by itsmore » base language, Scala, and generates both software (C++) and hardware (Verilog) models from a single code base. The OpenSoC Fabric2 infrastructure is modeled after existing state-of-the-art simulators, offers large and powerful collections of configuration options, and follows object-oriented design and functional programming to make functionality extension as easy as possible.« less
Test aspects of the JPL Viterbi decoder
NASA Technical Reports Server (NTRS)
Breuer, M. A.
1989-01-01
The generation of test vectors and design-for-test aspects of the Jet Propulsion Laboratory (JPL) Very Large Scale Integration (VLSI) Viterbi decoder chip is discussed. Each processor integrated circuit (IC) contains over 20,000 gates. To achieve a high degree of testability, a scan architecture is employed. The logic has been partitioned so that very few test vectors are required to test the entire chip. In addition, since several blocks of logic are replicated numerous times on this chip, test vectors need only be generated for each block, rather than for the entire circuit. These unique blocks of logic have been identified and test sets generated for them. The approach employed for testing was to use pseudo-exhaustive test vectors whenever feasible. That is, each cone of logid is tested exhaustively. Using this approach, no detailed logic design or fault model is required. All faults which modify the function of a block of combinational logic are detected, such as all irredundant single and multiple stuck-at faults.
WDM mid-board optics for chip-to-chip wavelength routing interconnects in the H2020 ICT-STREAMS
NASA Astrophysics Data System (ADS)
Kanellos, G. T.; Pleros, N.
2017-02-01
Multi-socket server boards have emerged to increase the processing power density on the board level and further flatten the data center networks beyond leaf-spine architectures. Scaling however the number of processors per board puts current electronic technologies into challenge, as it requires high bandwidth interconnects and high throughput switches with increased number of ports that are currently unavailable. On-board optical interconnection has proved the potential to efficiently satisfy the bandwidth needs, but their use has been limited to parallel links without performing any smart routing functionality. With CWDM optical interconnects already a commodity, cyclical wavelength routing proposed to fit the datacom for rack-to-rack and board-to-board communication now becomes a promising on-board routing platform. ICT-STREAMS is a European research project that aims to combine WDM parallel on-board transceivers with a cyclical AWGR, in order to create a new board-level, chip-to-chip interconnection paradigm that will leverage WDM parallel transmission to a powerful wavelength routing platform capable to interconnect multiple processors with unprecedented bandwidth and throughput capacity. Direct, any-to-any, on-board interconnection of multiple processors will significantly contribute to further flatten the data centers and facilitate east-west communication. In the present communication, we present ICT-STREAMS on-board wavelength routing architecture for multiple chip-to-chip interconnections and evaluate the overall system performance in terms of throughput and latency for several schemes and traffic profiles. We also review recent advances of the ICT-STREAMS platform key-enabling technologies that span from Si in-plane lasers and polymer based electro-optical circuit boards to silicon photonics transceivers and photonic-crystal amplifiers.
Dynamically programmable cache
NASA Astrophysics Data System (ADS)
Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas
1998-10-01
Reconfigurable machines have recently been used as co- processors to accelerate the execution of certain algorithms or program subroutines. The problems with the above approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve memory access problems, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create a multi-level reconfigurable machines. As a result DPC machines have both higher data accessibility and FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implemented multi-context switching (Virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations resulting in faster execution time. In this paper, the speedup improvement for DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultral SPARC station for two different algorithms (convolution and motion estimation).
Multiple core computer processor with globally-accessible local memories
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shalf, John; Donofrio, David; Oliker, Leonid
A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality ofmore » processor cores.« less
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
NASA Astrophysics Data System (ADS)
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPL v3 No. of lines in distributed program, including test data, etc.: 59 168 No. of bytes in distributed program, including test data, etc.: 453 409 Distribution format: tar.gz Programming language: C, CUDA Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs. RAM: Tested on Problems requiring up to 4 GB per compute node. Classification: 12 External routines: MPI, CUDA, IBM Cell SDK Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: Sub-program numdiff is used for the test run.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures
Manolakos, Elias S.
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.
Sharma, Anuj; Manolakos, Elias S
2015-01-01
Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
NASA Astrophysics Data System (ADS)
Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos
1990-03-01
The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
Large-Constraint-Length, Fast Viterbi Decoder
NASA Technical Reports Server (NTRS)
Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.
1990-01-01
Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.
1994-08-26
an Itegrated Circuit Global Router. In Proc. of PPEARS 88, pages 138-145, 1988. [7] S. Sakai, Y. Yamaguchi, K. Hiraki , Y. Kodama, and T. Yuba. An...Computer Architecture, 1992. [5] S. Sakai, Y. Yamaguchi, K. Hiraki , Y. Kodama, and T. Yuba. An architecture of a data-flow single chip processor. In Int...EM-4 and sparing time for tech- nical discussions. We also thank Prof. Kei Hiraki at the Univ. of Tokyo for his helpful comments. Hidehiko Masuhara’s
A Survey of Architectural Techniques For Improving Cache Power Efficiency
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers have proposed several techniques for improving energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. For providing an application perspective, this paper also reviews several real-worldmore » processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.« less
Design of an Elliptic Curve Cryptography processor for RFID tag chips.
Liu, Zilong; Liu, Dongsheng; Zou, Xuecheng; Lin, Hui; Cheng, Jian
2014-09-26
Radio Frequency Identification (RFID) is an important technique for wireless sensor networks and the Internet of Things. Recently, considerable research has been performed in the combination of public key cryptography and RFID. In this paper, an efficient architecture of Elliptic Curve Cryptography (ECC) Processor for RFID tag chip is presented. We adopt a new inversion algorithm which requires fewer registers to store variables than the traditional schemes. A new method for coordinate swapping is proposed, which can reduce the complexity of the controller and shorten the time of iterative calculation effectively. A modified circular shift register architecture is presented in this paper, which is an effective way to reduce the area of register files. Clock gating and asynchronous counter are exploited to reduce the power consumption. The simulation and synthesis results show that the time needed for one elliptic curve scalar point multiplication over GF(2163) is 176.7 K clock cycles and the gate area is 13.8 K with UMC 0.13 μm Complementary Metal Oxide Semiconductor (CMOS) technology. Moreover, the low power and low cost consumption make the Elliptic Curve Cryptography Processor (ECP) a prospective candidate for application in the RFID tag chip.
Design of an Elliptic Curve Cryptography Processor for RFID Tag Chips
Liu, Zilong; Liu, Dongsheng; Zou, Xuecheng; Lin, Hui; Cheng, Jian
2014-01-01
Radio Frequency Identification (RFID) is an important technique for wireless sensor networks and the Internet of Things. Recently, considerable research has been performed in the combination of public key cryptography and RFID. In this paper, an efficient architecture of Elliptic Curve Cryptography (ECC) Processor for RFID tag chip is presented. We adopt a new inversion algorithm which requires fewer registers to store variables than the traditional schemes. A new method for coordinate swapping is proposed, which can reduce the complexity of the controller and shorten the time of iterative calculation effectively. A modified circular shift register architecture is presented in this paper, which is an effective way to reduce the area of register files. Clock gating and asynchronous counter are exploited to reduce the power consumption. The simulation and synthesis results show that the time needed for one elliptic curve scalar point multiplication over GF(2163) is 176.7 K clock cycles and the gate area is 13.8 K with UMC 0.13 μm Complementary Metal Oxide Semiconductor (CMOS) technology. Moreover, the low power and low cost consumption make the Elliptic Curve Cryptography Processor (ECP) a prospective candidate for application in the RFID tag chip. PMID:25264952
A fully reconfigurable photonic integrated signal processor
NASA Astrophysics Data System (ADS)
Liu, Weilin; Li, Ming; Guzzon, Robert S.; Norberg, Erik J.; Parker, John S.; Lu, Mingzhi; Coldren, Larry A.; Yao, Jianping
2016-03-01
Photonic signal processing has been considered a solution to overcome the inherent electronic speed limitations. Over the past few years, an impressive range of photonic integrated signal processors have been proposed, but they usually offer limited reconfigurability, a feature highly needed for the implementation of large-scale general-purpose photonic signal processors. Here, we report and experimentally demonstrate a fully reconfigurable photonic integrated signal processor based on an InP-InGaAsP material system. The proposed photonic signal processor is capable of performing reconfigurable signal processing functions including temporal integration, temporal differentiation and Hilbert transformation. The reconfigurability is achieved by controlling the injection currents to the active components of the signal processor. Our demonstration suggests great potential for chip-scale fully programmable all-optical signal processing.
NASA Astrophysics Data System (ADS)
Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori
Heterogeneous multicores have been attracting much attention to attain high performance keeping power consumption low in wide spread of areas. However, heterogeneous multicores force programmers very difficult programming. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges a gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP(Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
NASA Astrophysics Data System (ADS)
Hamada, Tsuyoshi; Fukushige, Toshiyuki; Kawai, Atsushi; Makino, Junichiro
2000-10-01
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable multi-purpose computer for many-body simulations. The main difference between PROGRAPE-1 and ``traditional'' GRAPE systems is that the former uses FPGA (Field Programmable Gate Array) chips as the processing elements, while the latter relies on a hardwired pipeline processor specialized to gravitational interactions. Since the logic implemented in FPGA chips can be reconfigured, we can use PROGRAPE-1 to calculate not only gravitational interactions, but also other forms of interactions, such as the van der Waals force, hydro\\-dynamical interactions in the SPHr calculation, and so on. PROGRAPE-1 comprises two Altera EPF10K100 FPGA chips, each of which contains nominally 100000 gates. To evaluate the programmability and performance of PROGRAPE-1, we implemented a pipeline for gravitational interactions similar to that of GRAPE-3. One pipeline is fitted into a single FPGA chip, operated at 16 MHz clock. Thus, for gravitational interactions, PROGRAPE-1 provided a speed of 0.96 Gflops-equivalent. PROGRAPE will prove to be useful for a wide-range of particle-based simulations in which the calculation cost of interactions other than gravity is high, such as the evaluation of SPH interactions.
Finite element computation on nearest neighbor connected machines
NASA Technical Reports Server (NTRS)
Mcaulay, A. D.
1984-01-01
Research aimed at faster, more cost effective parallel machines and algorithms for improving designer productivity with finite element computations is discussed. A set of 8 boards, containing 4 nearest neighbor connected arrays of commercially available floating point chips and substantial memory, are inserted into a commercially available machine. One-tenth Mflop (64 bit operation) processors provide an 89% efficiency when solving the equations arising in a finite element problem for a single variable regular grid of size 40 by 40 by 40. This is approximately 15 to 20 times faster than a much more expensive machine such as a VAX 11/780 used in double precision. The efficiency falls off as faster or more processors are envisaged because communication times become dominant. A novel successive overrelaxation algorithm which uses cyclic reduction in order to permit data transfer and computation to overlap in time is proposed.
Ai, Qi; Chen, Xiao; Tian, Miao; Yan, Bin-bin; Zhang, Ying; Song, Fei-jun; Chen, Gen-xiang; Sang, Xin-zhu; Wang, Yi-quan; Xiao, Feng; Alameh, Kamal
2015-02-01
Based on a digital micromirror device (DMD) processor as the multi-wavelength narrow-band tunable filter, we demonstrate a multi-port tunable fiber laser through experiments. The key property of this laser is that any lasing wavelength channel from any arbitrary output port can be switched independently over the whole C-band, which is only driven by single DMD chip flexibly. All outputs display an excellent tuning capacity and high consistency in the whole C-band with a 0.02 nm linewidth, 0.055 nm wavelength tuning step, and side-mode suppression ratio greater than 60 dB. Due to the automatic power control and polarization design, the power uniformity of output lasers is less than 0.008 dB and the wavelength fluctuation is below 0.02 nm within 2 h at room temperature.
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture
Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo
2010-01-01
The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread well-known and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32 bits NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focus to pre-process the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283
Method and apparatus to debug an integrated circuit chip via synchronous clock stop and scan
Bellofatto, Ralph E [Ridgefield, CT; Ellavsky, Matthew R [Rochester, MN; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Gooding, Thomas M [Rochester, MN; Haring, Rudolf A [Cortlandt Manor, NY; Hehenberger, Lance G [Leander, TX; Ohmacht, Martin [Yorktown Heights, NY
2012-03-20
An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each the IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state. The apparatus and methodology enables construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.
Integrated circuit for SAW and MEMS sensors
NASA Astrophysics Data System (ADS)
Fischer, Wolf-Joachim; Koenig, Peter; Ploetner, Matthias; Hermann, Rudiger; Stab, Helmut
2001-11-01
The sensor processor circuit has been developed for hand-held devices used in industrial and environmental applications, such as on-line process monitoring. Thereby devices with SAW sensors or MEMS resonators will benefit from this processor especially. Up to 8 sensors can be connected to the circuit as multisensors or sensor arrays. Two sensor processors SP1 and SP2 for different applications are presented in this paper. The SP-1 chip has a PCMCIA interface which can be used for the program and data transfer. SAW sensors which are working in the frequency range from 80 MHz to 160 MHz can be connected to the processor directly. It is possible to use the new SP-2 chip fabricated in a 0.5(mu) CMOS process for SAW devices with a maximum frequency of 600 MHz. An on-chip analog-digital-converter (ADC) and 6 PWM modules support the development of high-miniaturized intelligent sensor systems We have developed a multi-SAW sensor system with this ASIC that manages the requirements on control as well as signal generation and storage and provides an interface to the PC and electronic devices on the board. Its low power consumption and its PCMCIA plug fulfil the requirements of small size and mobility. For this application sensors have been developed to detect hazardous gases in ambient air. Sensors with differently modified copper-phthalocyanine films are capable of detecting NO2 and O3, whereas those with a hyperbranched polyester film respond to NH3.
Face classification using electronic synapses
NASA Astrophysics Data System (ADS)
Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H.-S. Philip; Qian, He
2017-05-01
Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
Face classification using electronic synapses.
Yao, Peng; Wu, Huaqiang; Gao, Bin; Eryilmaz, Sukru Burc; Huang, Xueyao; Zhang, Wenqiang; Zhang, Qingtian; Deng, Ning; Shi, Luping; Wong, H-S Philip; Qian, He
2017-05-12
Conventional hardware platforms consume huge amount of energy for cognitive learning due to the data movement between the processor and the off-chip memory. Brain-inspired device technologies using analogue weight storage allow to complete cognitive tasks more efficiently. Here we present an analogue non-volatile resistive memory (an electronic synapse) with foundry friendly materials. The device shows bidirectional continuous weight modulation behaviour. Grey-scale face classification is experimentally demonstrated using an integrated 1024-cell array with parallel online training. The energy consumption within the analogue synapses for each iteration is 1,000 × (20 ×) lower compared to an implementation using Intel Xeon Phi processor with off-chip memory (with hypothetical on-chip digital resistive random access memory). The accuracy on test sets is close to the result using a central processing unit. These experimental results consolidate the feasibility of analogue synaptic array and pave the way toward building an energy efficient and large-scale neuromorphic system.
Prototyping the HPDP Chip on STM 65 NM Process
NASA Astrophysics Data System (ADS)
Papadas, C.; Dramitinos, G.; Syed, M.; Helfers, T.; Dedes, G.; Schoellkopf, J.-P.; Dugoujon, L.
2011-08-01
Currently Astrium GmbH is involved in the of the High Performance Data Processor (HPDP) development programme for telecommunication applications under a DLR contract. The HPDP project targets the implementation of the commercially available reconfigurable array processor IP (XPP from the company PACT XPP Technologies) in a radiation hardened technology.In the current complementary development phase funded under the Greek Industry Incentive scheme, it is planned to prototype the HPDP chip in commercial STM 65 nm technology. In addition it is also planned to utilise the preliminary radiation hardened components of this library wherever possible.This abstract gives an overview of the HPDP chip architecture, the basic details of the STM 65 nm process and the design flow foreseen for the prototyping. The paper will discuss the development and integration issues involved in using the STM 65 nm process (also including the available preliminary radiation hardened components) for designs targeted to be used in space applications.
NASA Astrophysics Data System (ADS)
González, Diego; Botella, Guillermo; García, Carlos; Prieto, Manuel; Tirado, Francisco
2013-12-01
This contribution focuses on the optimization of matching-based motion estimation algorithms widely used for video coding standards using an Altera custom instruction-based paradigm and a combination of synchronous dynamic random access memory (SDRAM) with on-chip memory in Nios II processors. A complete profile of the algorithms is achieved before the optimization, which locates code leaks, and afterward, creates a custom instruction set, which is then added to the specific design, enhancing the original system. As well, every possible memory combination between on-chip memory and SDRAM has been tested to achieve the best performance. The final throughput of the complete designs are shown. This manuscript outlines a low-cost system, mapped using very large scale integration technology, which accelerates software algorithms by converting them into custom hardware logic blocks and showing the best combination between on-chip memory and SDRAM for the Nios II processor.
A Streaming Language Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Barth, Timothy; Knight, Timothy
2005-01-01
We present a Brook streaming language implementation of the 3-D discontinuous Galerkin method for compressible fluid flow on tetrahedral meshes. Efficient implementation of the discontinuous Galerkin method using the streaming model of computation introduces several algorithmic design challenges. Using a cycle-accurate simulator, performance characteristics have been obtained for the Stanford Merrimac stream processor. The current Merrimac design achieves 128 Gflops per chip and the desktop board is populated with 16 chips yielding a peak performance of 2 Teraflops. Total parts cost for the desktop board is less than $20K. Current cycle-accurate simulations for discretizations of the 3-D compressible flow equations yield approximately 40-50% of the peak performance of the Merrimac streaming processor chip. Ongoing work includes the assessment of the performance of the same algorithm on the 2 Teraflop desktop board with a target goal of achieving 1 Teraflop performance.
Efficiency of static core turn-off in a system-on-a-chip with variation
Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong
2013-10-29
A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
Examining the role of tuber biochemistry in the development of zebra chip in stored potato tubers
USDA-ARS?s Scientific Manuscript database
Zebra chip disease (ZC), associated with infection by the bacterium ‘Candidatus Liberibacter solanacearum’ (Lso), is an emerging problem for potato growers in the United States, Mexico, and New Zealand. Although potato tubers exhibiting ZC symptoms will be rejected by processors, it remains possible...
Associative architecture for image processing
NASA Astrophysics Data System (ADS)
Adar, Rutie; Akerib, Avidan
1997-09-01
This article presents a new generation in parallel processing architecture for real-time image processing. The approach is implemented in a real time image processor chip, called the XiumTM-2, based on combining a fully associative array which provides the parallel engine with a serial RISC core on the same die. The architecture is fully programmable and can be programmed to implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on patented pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz and 0.6 micron manufacturing technology process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the XiumTM-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool
NASA Astrophysics Data System (ADS)
Barr, David R. W.; Dudek, Piotr
2009-12-01
We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
CoNNeCT Baseband Processor Module
NASA Technical Reports Server (NTRS)
Yamamoto, Clifford K; Jedrey, Thomas C.; Gutrich, Daniel G.; Goodpasture, Richard L.
2011-01-01
A document describes the CoNNeCT Baseband Processor Module (BPM) based on an updated processor, memory technology, and field-programmable gate arrays (FPGAs). The BPM was developed from a requirement to provide sufficient computing power and memory storage to conduct experiments for a Software Defined Radio (SDR) to be implemented. The flight SDR uses the AT697 SPARC processor with on-chip data and instruction cache. The non-volatile memory has been increased from a 20-Mbit EEPROM (electrically erasable programmable read only memory) to a 4-Gbit Flash, managed by the RTAX2000 Housekeeper, allowing more programs and FPGA bit-files to be stored. The volatile memory has been increased from a 20-Mbit SRAM (static random access memory) to a 1.25-Gbit SDRAM (synchronous dynamic random access memory), providing additional memory space for more complex operating systems and programs to be executed on the SPARC. All memory is EDAC (error detection and correction) protected, while the SPARC processor implements fault protection via TMR (triple modular redundancy) architecture. Further capability over prior BPM designs includes the addition of a second FPGA to implement features beyond the resources of a single FPGA. Both FPGAs are implemented with Xilinx Virtex-II and are interconnected by a 96-bit bus to facilitate data exchange. Dedicated 1.25- Gbit SDRAMs are wired to each Xilinx FPGA to accommodate high rate data buffering for SDR applications as well as independent SpaceWire interfaces. The RTAX2000 manages scrub and configuration of each Xilinx.
Intelligent subsystem interface for modular hardware system
NASA Technical Reports Server (NTRS)
Caffrey, Robert T. (Inventor); Krening, Douglas N. (Inventor); Lannan, Gregory B. (Inventor); Schneiderwind, Michael J. (Inventor); Schneiderwind, Robert A. (Inventor)
2000-01-01
A single chip application specific integrated circuit (ASIC) which provides a flexible, modular interface between a subsystem and a standard system bus. The ASIC includes a microcontroller/microprocessor, a serial interface for connection to the bus, and a variety of communications interface devices available for coupling to the subsystem. A three-bus architecture, utilizing arbitration, provides connectivity within the ASIC and between the ASIC and the subsystem. The communication interface devices include UART (serial), parallel, analog, and external device interface utilizing bus connections paired with device select signals. A low power (sleep) mode is provided as is a processor disable option.
Smart image sensors: an emerging key technology for advanced optical measurement and microsystems
NASA Astrophysics Data System (ADS)
Seitz, Peter
1996-08-01
Optical microsystems typically include photosensitive devices, analog preprocessing circuitry and digital signal processing electronics. The advances in semiconductor technology have made it possible today to integrate all photosensitive and electronical devices on one 'smart image sensor' or photo-ASIC (application-specific integrated circuits containing photosensitive elements). It is even possible to provide each 'smart pixel' with additional photoelectronic functionality, without compromising the fill factor substantially. This technological capability is the basis for advanced cameras and optical microsystems showing novel on-chip functionality: Single-chip cameras with on- chip analog-to-digital converters for less than $10 are advertised; image sensors have been developed including novel functionality such as real-time selectable pixel size and shape, the capability of performing arbitrary convolutions simultaneously with the exposure, as well as variable, programmable offset and sensitivity of the pixels leading to image sensors with a dynamic range exceeding 150 dB. Smart image sensors have been demonstrated offering synchronous detection and demodulation capabilities in each pixel (lock-in CCD), and conventional image sensors are combined with an on-chip digital processor for complete, single-chip image acquisition and processing systems. Technological problems of the monolithic integration of smart image sensors include offset non-uniformities, temperature variations of electronic properties, imperfect matching of circuit parameters, etc. These problems can often be overcome either by designing additional compensation circuitry or by providing digital correction routines. Where necessary for technological or economic reasons, smart image sensors can also be combined with or realized as hybrids, making use of commercially available electronic components. It is concluded that the possibilities offered by custom smart image sensors will influence the design and the performance of future electronic imaging systems in many disciplines, reaching from optical metrology to machine vision on the factory floor and in robotics applications.
Integrating photonics with silicon nanoelectronics for the next generation of systems on a chip.
Atabaki, Amir H; Moazeni, Sajjad; Pavanello, Fabio; Gevorgyan, Hayk; Notaros, Jelena; Alloatti, Luca; Wade, Mark T; Sun, Chen; Kruger, Seth A; Meng, Huaiyu; Al Qubaisi, Kenaish; Wang, Imbert; Zhang, Bohan; Khilo, Anatol; Baiocco, Christopher V; Popović, Miloš A; Stojanović, Vladimir M; Ram, Rajeev J
2018-04-01
Electronic and photonic technologies have transformed our lives-from computing and mobile devices, to information technology and the internet. Our future demands in these fields require innovation in each technology separately, but also depend on our ability to harness their complementary physics through integrated solutions 1,2 . This goal is hindered by the fact that most silicon nanotechnologies-which enable our processors, computer memory, communications chips and image sensors-rely on bulk silicon substrates, a cost-effective solution with an abundant supply chain, but with substantial limitations for the integration of photonic functions. Here we introduce photonics into bulk silicon complementary metal-oxide-semiconductor (CMOS) chips using a layer of polycrystalline silicon deposited on silicon oxide (glass) islands fabricated alongside transistors. We use this single deposited layer to realize optical waveguides and resonators, high-speed optical modulators and sensitive avalanche photodetectors. We integrated this photonic platform with a 65-nanometre-transistor bulk CMOS process technology inside a 300-millimetre-diameter-wafer microelectronics foundry. We then implemented integrated high-speed optical transceivers in this platform that operate at ten gigabits per second, composed of millions of transistors, and arrayed on a single optical bus for wavelength division multiplexing, to address the demand for high-bandwidth optical interconnects in data centres and high-performance computing 3,4 . By decoupling the formation of photonic devices from that of transistors, this integration approach can achieve many of the goals of multi-chip solutions 5 , but with the performance, complexity and scalability of 'systems on a chip' 1,6-8 . As transistors smaller than ten nanometres across become commercially available 9 , and as new nanotechnologies emerge 10,11 , this approach could provide a way to integrate photonics with state-of-the-art nanoelectronics.
Implementation of MPEG-2 encoder to multiprocessor system using multiple MVPs (TMS320C80)
NASA Astrophysics Data System (ADS)
Kim, HyungSun; Boo, Kenny; Chung, SeokWoo; Choi, Geon Y.; Lee, YongJin; Jeon, JaeHo; Park, Hyun Wook
1997-05-01
This paper presents the efficient algorithm mapping for the real-time MPEG-2 encoding on the KAIST image computing system (KICS), which has a parallel architecture using five multimedia video processors (MVPs). The MVP is a general purpose digital signal processor (DSP) of Texas Instrument. It combines one floating-point processor and four fixed- point DSPs on a single chip. The KICS uses the MVP as a primary processing element (PE). Two PEs form a cluster, and there are two processing clusters in the KICS. Real-time MPEG-2 encoder is implemented through the spatial and the functional partitioning strategies. Encoding process of spatially partitioned half of the video input frame is assigned to ne processing cluster. Two PEs perform the functionally partitioned MPEG-2 encoding tasks in the pipelined operation mode. One PE of a cluster carries out the transform coding part and the other performs the predictive coding part of the MPEG-2 encoding algorithm. One MVP among five MVPs is used for system control and interface with host computer. This paper introduces an implementation of the MPEG-2 algorithm with a parallel processing architecture.
Reconfigurable Very Long Instruction Word (VLIW) Processor
NASA Technical Reports Server (NTRS)
Velev, Miroslav N.
2015-01-01
Future NASA missions will depend on radiation-hardened, power-efficient processing systems-on-a-chip (SOCs) that consist of a range of processor cores custom tailored for space applications. Aries Design Automation, LLC, has developed a processing SOC that is optimized for software-defined radio (SDR) uses. The innovation implements the Institute of Electrical and Electronics Engineers (IEEE) RazorII voltage management technique, a microarchitectural mechanism that allows processor cores to self-monitor, self-analyze, and selfheal after timing errors, regardless of their cause (e.g., radiation; chip aging; variations in the voltage, frequency, temperature, or manufacturing process). This highly automated SOC can also execute legacy PowerPC 750 binary code instruction set architecture (ISA), which is used in the flight-control computers of many previous NASA space missions. In developing this innovation, Aries Design Automation has made significant contributions to the fields of formal verification of complex pipelined microprocessors and Boolean satisfiability (SAT) and has developed highly efficient electronic design automation tools that hold promise for future developments.
Conditional Dispersive Readout of a CMOS Single-Electron Memory Cell
NASA Astrophysics Data System (ADS)
Schaal, S.; Barraud, S.; Morton, J. J. L.; Gonzalez-Zalba, M. F.
2018-05-01
Quantum computers require interfaces with classical electronics for efficient qubit control, measurement, and fast data processing. Fabricating the qubit and the classical control layer using the same technology is appealing because it will facilitate the integration process, improving feedback speeds and offering potential solutions to wiring and layout challenges. Integrating classical and quantum devices monolithically, using complementary metal-oxide-semiconductor (CMOS) processes, enables the processor to profit from the most mature industrial technology for the fabrication of large-scale circuits. We demonstrate a CMOS single-electron memory cell composed of a single quantum dot and a transistor that locks charge on the quantum-dot gate. The single-electron memory cell is conditionally read out by gate-based dispersive sensing using a lumped-element L C resonator. The control field-effect transistor (FET) and quantum dot are fabricated on the same chip using fully depleted silicon-on-insulator technology. We obtain a charge sensitivity of δ q =95 ×10-6e Hz-1 /2 when the quantum-dot readout is enabled by the control FET, comparable to results without the control FET. Additionally, we observe a single-electron retention time on the order of a second when storing a single-electron charge on the quantum dot at millikelvin temperatures. These results demonstrate first steps towards time-based multiplexing of gate-based dispersive readout in CMOS quantum devices opening the path for the development of an all-silicon quantum-classical processor.
Bio-inspired optical rotation sensor
NASA Astrophysics Data System (ADS)
O'Carroll, David C.; Shoemaker, Patrick A.; Brinkworth, Russell S. A.
2007-01-01
Traditional approaches to calculating self-motion from visual information in artificial devices have generally relied on object identification and/or correlation of image sections between successive frames. Such calculations are computationally expensive and real-time digital implementation requires powerful processors. In contrast flies arrive at essentially the same outcome, the estimation of self-motion, in a much smaller package using vastly less power. Despite the potential advantages and a few notable successes, few neuromorphic analog VLSI devices based on biological vision have been employed in practical applications to date. This paper describes a hardware implementation in aVLSI of our recently developed adaptive model for motion detection. The chip integrates motion over a linear array of local motion processors to give a single voltage output. Although the device lacks on-chip photodetectors, it includes bias circuits to use currents from external photodiodes, and we have integrated it with a ring-array of 40 photodiodes to form a visual rotation sensor. The ring configuration reduces pattern noise and combined with the pixel-wise adaptive characteristic of the underlying circuitry, permits a robust output that is proportional to image rotational velocity over a large range of speeds, and is largely independent of either mean luminance or the spatial structure of the image viewed. In principle, such devices could be used as an element of a velocity-based servo to replace or augment inertial guidance systems in applications such as mUAVs.
Systolic array IC for genetic computation
NASA Technical Reports Server (NTRS)
Anderson, D.
1991-01-01
Measuring similarities between large sequences of genetic information is a formidable task requiring enormous amounts of computer time. Geneticists claim that nearly two months of CRAY-2 time are required to run a single comparison of the known database against the new bases that will be found this year, and more than a CRAY-2 year for next year's genetic discoveries, and so on. The DNA IC, designed at HP-ICBD in cooperation with the California Institute of Technology and the Jet Propulsion Laboratory, is being implemented in order to move the task of genetic comparison onto workstations and personal computers, while vastly improving performance. The chip is a systolic (pumped) array comprised of 16 processors, control logic, and global RAM, totaling 400,000 FETS. At 12 MHz, each chip performs 2.7 billion 16 bit operations per second. Using 35 of these chips in series on one PC board (performing nearly 100 billion operations per second), a sequence of 560 bases can be compared against the eventual total genome of 3 billion bases, in minutes--on a personal computer. While the designed purpose of the DNA chip is for genetic research, other disciplines requiring similarity measurements between strings of 7 bit encoded data could make use of this chip as well. Cryptography and speech recognition are two examples. A mix of full custom design and standard cells, in CMOS34, were used to achieve these goals. Innovative test methods were developed to enhance controllability and observability in the array. This paper describes these techniques as well as the chip's functionality. This chip was designed in the 1989-90 timeframe.
Software-defined reconfigurable microwave photonics processor.
Pérez, Daniel; Gasulla, Ivana; Capmany, José
2015-06-01
We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.
Reconfigurable lattice mesh designs for programmable photonic processors.
Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A
2016-05-30
We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
1989-01-20
addressable memory can be loaded or off- loaded as the number crunching continues. Modem VLSI processors can often process data faster than today’s...Available DSP Chips Texas Instruments was one of the first serious manufacturers of DSP chips. With the Texas Instruments TMS310 DSP chip, modem , voice...Can handle double presicion data types. Texas Instruments TMS32010 T’s first-generation DSP design: a fixed-point DSP that has found its way into modem
GR712RC- Dual-Core Processor- Product Status
NASA Astrophysics Data System (ADS)
Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri
2012-08-01
The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.
NASA Astrophysics Data System (ADS)
Sanford, James L.; Schlig, Eugene S.; Prache, Olivier; Dove, Derek B.; Ali, Tariq A.; Howard, Webster E.
2002-02-01
The IBM Research Division and eMagin Corp. jointly have developed a low-power VGA direct view active matrix OLED display, fabricated on a crystalline silicon CMOS chip. The display is incorporated in IBM prototype wristwatch computers running the Linus operating system. IBM designed the silicon chip and eMagin developed the organic stack and performed the back-end-of line processing and packaging. Each pixel is driven by a constant current source controlled by a CMOS RAM cell, and the display receives its data from the processor memory bus. This paper describes the OLED technology and packaging, and outlines the design of the pixel and display electronics and the processor interface. Experimental results are presented.
On-chip programmable ultra-wideband microwave photonic phase shifter and true time delay unit.
Burla, Maurizio; Cortés, Luis Romero; Li, Ming; Wang, Xu; Chrostowski, Lukas; Azaña, José
2014-11-01
We proposed and experimentally demonstrated an ultra-broadband on-chip microwave photonic processor that can operate both as RF phase shifter (PS) and true-time-delay (TTD) line, with continuous tuning. The processor is based on a silicon dual-phase-shifted waveguide Bragg grating (DPS-WBG) realized with a CMOS compatible process. We experimentally demonstrated the generation of delay up to 19.4 ps over 10 GHz instantaneous bandwidth and a phase shift of approximately 160° over the bandwidth 22-29 GHz. The available RF measurement setup ultimately limits the phase shifting demonstration as the device is capable of providing up to 300° phase shift for RF frequencies over a record bandwidth approaching 1 THz.
Geospace simulations using modern accelerator processor technology
NASA Astrophysics Data System (ADS)
Germaschewski, K.; Raeder, J.; Larson, D. J.
2009-12-01
OpenGGCM (Open Geospace General Circulation Model) is a well-established numerical code simulating the Earth's space environment. The most computing intensive part is the MHD (magnetohydrodynamics) solver that models the plasma surrounding Earth and its interaction with Earth's magnetic field and the solar wind flowing in from the sun. Like other global magnetosphere codes, OpenGGCM's realism is currently limited by computational constraints on grid resolution. OpenGGCM has been ported to make use of the added computational powerof modern accelerator based processor architectures, in particular the Cell processor. The Cell architecture is a novel inhomogeneous multicore architecture capable of achieving up to 230 GFLops on a single chip. The University of New Hampshire recently acquired a PowerXCell 8i based computing cluster, and here we will report initial performance results of OpenGGCM. Realizing the high theoretical performance of the Cell processor is a programming challenge, though. We implemented the MHD solver using a multi-level parallelization approach: On the coarsest level, the problem is distributed to processors based upon the usual domain decomposition approach. Then, on each processor, the problem is divided into 3D columns, each of which is handled by the memory limited SPEs (synergistic processing elements) slice by slice. Finally, SIMD instructions are used to fully exploit the SIMD FPUs in each SPE. Memory management needs to be handled explicitly by the code, using DMA to move data from main memory to the per-SPE local store and vice versa. We use a modern technique, automatic code generation, which shields the application programmer from having to deal with all of the implementation details just described, keeping the code much more easily maintainable. Our preliminary results indicate excellent performance, a speed-up of a factor of 30 compared to the unoptimized version.
Fault detection and bypass in a sequence information signal processor
NASA Technical Reports Server (NTRS)
Peterson, John C. (Inventor); Chow, Edward T. (Inventor)
1992-01-01
The invention comprises a plurality of scan registers, each such register respectively associated with a processor element; an on-chip comparator, encoder and fault bypass register. Each scan register generates a unitary signal the logic state of which depends on the correctness of the input from the previous processor in the systolic array. These unitary signals are input to a common comparator which generates an output indicating whether or not an error has occurred. These unitary signals are also input to an encoder which identifies the location of any fault detected so that an appropriate multiplexer can be switched to bypass the faulty processor element. Input scan data can be readily programmed to fully exercise all of the processor elements so that no fault can remain undetected.
A wideband software reconfigurable modem
NASA Astrophysics Data System (ADS)
Turner, J. H., Jr.; Vickers, H.
A wideband modem is described which provides signal processing capability for four Lx-band signals employing QPSK, MSK and PPM waveforms and employs a software reconfigurable architecture for maximum system flexibility and graceful degradation. The current processor uses a 2901 and two 8086 microprocessors per channel and performs acquisition, tracking, and data demodulation for JITDS, GPS, IFF and TACAN systems. The next generation processor will be implemented using a VHSIC chip set employing a programmable complex array vector processor module, a GP computer module, customized gate array modules, and a digital array correlator. This integrated processor has application to a wide number of diverse system waveforms, and will bring the benefits of VHSIC technology insertion into avionic antijam communications systems.
Development Of A Three-Dimensional Circuit Integration Technology And Computer Architecture
NASA Astrophysics Data System (ADS)
Etchells, R. D.; Grinberg, J.; Nudd, G. R.
1981-12-01
This paper is the first of a series 1,2,3 describing a range of efforts at Hughes Research Laboratories, which are collectively referred to as "Three-Dimensional Microelectronics." The technology being developed is a combination of a unique circuit fabrication/packaging technology and a novel processing architecture. The packaging technology greatly reduces the parasitic impedances associated with signal-routing in complex VLSI structures, while simultaneously allowing circuit densities orders of magnitude higher than the current state-of-the-art. When combined with the 3-D processor architecture, the resulting machine exhibits a one- to two-order of magnitude simultaneous improvement over current state-of-the-art machines in the three areas of processing speed, power consumption, and physical volume. The 3-D architecture is essentially that commonly referred to as a "cellular array", with the ultimate implementation having as many as 512 x 512 processors working in parallel. The three-dimensional nature of the assembled machine arises from the fact that the chips containing the active circuitry of the processor are stacked on top of each other. In this structure, electrical signals are passed vertically through the chips via thermomigrated aluminum feedthroughs. Signals are passed between adjacent chips by micro-interconnects. This discussion presents a broad view of the total effort, as well as a more detailed treatment of the fabrication and packaging technologies themselves. The results of performance simulations of the completed 3-D processor executing a variety of algorithms are also presented. Of particular pertinence to the interests of the focal-plane array community is the simulation of the UNICORNS nonuniformity correction algorithms as executed by the 3-D architecture.
NASA Space Engineering Research Center for VLSI systems design
NASA Technical Reports Server (NTRS)
1991-01-01
This annual review reports the center's activities and findings on very large scale integration (VLSI) systems design for 1990, including project status, financial support, publications, the NASA Space Engineering Research Center (SERC) Symposium on VLSI Design, research results, and outreach programs. Processor chips completed or under development are listed. Research results summarized include a design technique to harden complementary metal oxide semiconductors (CMOS) memory circuits against single event upset (SEU); improved circuit design procedures; and advances in computer aided design (CAD), communications, computer architectures, and reliability design. Also described is a high school teacher program that exposes teachers to the fundamentals of digital logic design.
3D integrated superconducting qubits
NASA Astrophysics Data System (ADS)
Rosenberg, D.; Kim, D.; Das, R.; Yost, D.; Gustavsson, S.; Hover, D.; Krantz, P.; Melville, A.; Racz, L.; Samach, G. O.; Weber, S. J.; Yan, F.; Yoder, J. L.; Kerman, A. J.; Oliver, W. D.
2017-10-01
As the field of quantum computing advances from the few-qubit stage to larger-scale processors, qubit addressability and extensibility will necessitate the use of 3D integration and packaging. While 3D integration is well-developed for commercial electronics, relatively little work has been performed to determine its compatibility with high-coherence solid-state qubits. Of particular concern, qubit coherence times can be suppressed by the requisite processing steps and close proximity of another chip. In this work, we use a flip-chip process to bond a chip with superconducting flux qubits to another chip containing structures for qubit readout and control. We demonstrate that high qubit coherence (T1, T2,echo > 20 μs) is maintained in a flip-chip geometry in the presence of galvanic, capacitive, and inductive coupling between the chips.
Next-generation digital camera integration and software development issues
NASA Astrophysics Data System (ADS)
Venkataraman, Shyam; Peters, Ken; Hecht, Richard
1998-04-01
This paper investigates the complexities associated with the development of next generation digital cameras due to requirements in connectivity and interoperability. Each successive generation of digital camera improves drastically in cost, performance, resolution, image quality and interoperability features. This is being accomplished by advancements in a number of areas: research, silicon, standards, etc. As the capabilities of these cameras increase, so do the requirements for both hardware and software. Today, there are two single chip camera solutions in the market including the Motorola MPC 823 and LSI DCAM- 101. Real time constraints for a digital camera may be defined by the maximum time allowable between capture of images. Constraints in the design of an embedded digital camera include processor architecture, memory, processing speed and the real-time operating systems. This paper will present the LSI DCAM-101, a single-chip digital camera solution. It will present an overview of the architecture and the challenges in hardware and software for supporting streaming video in such a complex device. Issues presented include the development of the data flow software architecture, testing and integration on this complex silicon device. The strategy for optimizing performance on the architecture will also be presented.
An ultra-low-power pulse oximeter implemented with an energy-efficient transimpedance amplifier.
Tavakoli, M; Turicchia, L; Sarpeshkar, R
2010-02-01
Pulse oximeters are ubiquitous in modern medicine to noninvasively measure the percentage of oxygenated hemoglobin in a patient's blood by comparing the transmission characteristics of red and infrared light-emitting diode light through the patient's finger with a photoreceptor. We present an analog single-chip pulse oximeter with 4.8-mW total power dissipation, which is an order of magnitude below our measurements on commercial implementations. The majority of this power reduction is due to the use of a novel logarithmic transimpedance amplifier with inherent contrast sensitivity, distributed amplification, unilateralization, and automatic loop gain control. The transimpedance amplifier, together with a photodiode current source, form a high-performance photoreceptor with characteristics similar to those found in nature, which allows LED power to be reduced. Therefore, our oximeter is well suited for portable medical applications, such as continuous home-care monitoring for elderly or chronic patients, emergency patient transport, remote soldier monitoring, and wireless medical sensing. Furthermore, our design obviates the need for an A-to-D and digital signal processor and leads to a small single-chip solution. We outline how extensions of our work could lead to submilliwatt oximeters.
The development of a general purpose ARM-based processing unit for the ATLAS TileCal sROD
NASA Astrophysics Data System (ADS)
Cox, M. A.; Reed, R.; Mellado, B.
2015-01-01
After Phase-II upgrades in 2022, the data output from the LHC ATLAS Tile Calorimeter will increase significantly. ARM processors are common in mobile devices due to their low cost, low energy consumption and high performance. It is proposed that a cost-effective, high data throughput Processing Unit (PU) can be developed by using several consumer ARM processors in a cluster configuration to allow aggregated processing performance and data throughput while maintaining minimal software design difficulty for the end-user. This PU could be used for a variety of high-level functions on the high-throughput raw data such as spectral analysis and histograms to detect possible issues in the detector at a low level. High-throughput I/O interfaces are not typical in consumer ARM System on Chips but high data throughput capabilities are feasible via the novel use of PCI-Express as the I/O interface to the ARM processors. An overview of the PU is given and the results for performance and throughput testing of four different ARM Cortex System on Chips are presented.
Novel processor architecture for onboard infrared sensors
NASA Astrophysics Data System (ADS)
Hihara, Hiroki; Iwasaki, Akira; Tamagawa, Nobuo; Kuribayashi, Mitsunobu; Hashimoto, Masanori; Mitsuyama, Yukio; Ochi, Hiroyuki; Onodera, Hidetoshi; Kanbara, Hiroyuki; Wakabayashi, Kazutoshi; Tada, Munehiro
2016-09-01
Infrared sensor system is a major concern for inter-planetary missions that investigate the nature and the formation processes of planets and asteroids. The infrared sensor system requires signal preprocessing functions that compensate for the intensity of infrared image sensors to get high quality data and high compression ratio through the limited capacity of transmission channels towards ground stations. For those implementations, combinations of Field Programmable Gate Arrays (FPGAs) and microprocessors are employed by AKATSUKI, the Venus Climate Orbiter, and HAYABUSA2, the asteroid probe. On the other hand, much smaller size and lower power consumption are demanded for future missions to accommodate more sensors. To fulfill this future demand, we developed a novel processor architecture which consists of reconfigurable cluster cores and programmable-logic cells with complementary atom switches. The complementary atom switches enable hardware programming without configuration memories, and thus soft-error on logic circuit connection is completely eliminated. This is a noteworthy advantage for space applications which cannot be found in conventional re-writable FPGAs. Almost one-tenth of lower power consumption is expected compared to conventional re-writable FPGAs because of the elimination of configuration memories. The proposed processor architecture can be reconfigured by behavioral synthesis with higher level language specification. Consequently, compensation functions are implemented in a single chip without accommodating program memories, which is accompanied with conventional microprocessors, while maintaining the comparable performance. This enables us to embed a processor element on each infrared signal detector output channel.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
Real-time encoding and compression of neuronal spikes by metal-oxide memristors
NASA Astrophysics Data System (ADS)
Gupta, Isha; Serb, Alexantrou; Khiat, Ali; Zeitler, Ralf; Vassanelli, Stefano; Prodromakis, Themistoklis
2016-09-01
Advanced brain-chip interfaces with numerous recording sites bear great potential for investigation of neuroprosthetic applications. The bottleneck towards achieving an efficient bio-electronic link is the real-time processing of neuronal signals, which imposes excessive requirements on bandwidth, energy and computation capacity. Here we present a unique concept where the intrinsic properties of memristive devices are exploited to compress information on neural spikes in real-time. We demonstrate that the inherent voltage thresholds of metal-oxide memristors can be used for discriminating recorded spiking events from background activity and without resorting to computationally heavy off-line processing. We prove that information on spike amplitude and frequency can be transduced and stored in single devices as non-volatile resistive state transitions. Finally, we show that a memristive device array allows for efficient data compression of signals recorded by a multi-electrode array, demonstrating the technology's potential for building scalable, yet energy-efficient on-node processors for brain-chip interfaces.
SVM classifier on chip for melanoma detection.
Afifi, Shereen; GholamHosseini, Hamid; Sinha, Roopak
2017-07-01
Support Vector Machine (SVM) is a common classifier used for efficient classification with high accuracy. SVM shows high accuracy for classifying melanoma (skin cancer) clinical images within computer-aided diagnosis systems used by skin cancer specialists to detect melanoma early and save lives. We aim to develop a medical low-cost handheld device that runs a real-time embedded SVM-based diagnosis system for use in primary care for early detection of melanoma. In this paper, an optimized SVM classifier is implemented onto a recent FPGA platform using the latest design methodology to be embedded into the proposed device for realizing online efficient melanoma detection on a single system on chip/device. The hardware implementation results demonstrate a high classification accuracy of 97.9% and a significant acceleration factor of 26 from equivalent software implementation on an embedded processor, with 34% of resources utilization and 2 watts for power consumption. Consequently, the implemented system meets crucial embedded systems constraints of high performance and low cost, resources utilization and power consumption, while achieving high classification accuracy.
Real-time encoding and compression of neuronal spikes by metal-oxide memristors
Gupta, Isha; Serb, Alexantrou; Khiat, Ali; Zeitler, Ralf; Vassanelli, Stefano; Prodromakis, Themistoklis
2016-01-01
Advanced brain-chip interfaces with numerous recording sites bear great potential for investigation of neuroprosthetic applications. The bottleneck towards achieving an efficient bio-electronic link is the real-time processing of neuronal signals, which imposes excessive requirements on bandwidth, energy and computation capacity. Here we present a unique concept where the intrinsic properties of memristive devices are exploited to compress information on neural spikes in real-time. We demonstrate that the inherent voltage thresholds of metal-oxide memristors can be used for discriminating recorded spiking events from background activity and without resorting to computationally heavy off-line processing. We prove that information on spike amplitude and frequency can be transduced and stored in single devices as non-volatile resistive state transitions. Finally, we show that a memristive device array allows for efficient data compression of signals recorded by a multi-electrode array, demonstrating the technology's potential for building scalable, yet energy-efficient on-node processors for brain-chip interfaces. PMID:27666698
Next Generation Space Telescope Integrated Science Module Data System
NASA Technical Reports Server (NTRS)
Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.
1999-01-01
The Data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument compliment, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz power PC processors. All processors execute a commercially available real time multi-tasking operating system supporting, preemptive multi-tasking, file management and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network, which links the six processors. The final selection for Processor busses, processor chips, network interfaces, and high-speed data interfaces will be made during mid 2002.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Yao; Balaprakash, Prasanna; Meng, Jiayuan
We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors IBM Blue- Gene/Q compute chip and Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list ofmore » architectural scaling options for future processor designs.« less
Spacecraft computer technology at Southwest Research Institute
NASA Technical Reports Server (NTRS)
Shirley, D. J.
1993-01-01
Southwest Research Institute (SwRI) has developed and delivered spacecraft computers for a number of different near-Earth-orbit spacecraft including shuttle experiments and SDIO free-flyer experiments. We describe the evolution of the basic SwRI spacecraft computer design from those weighing in at 20 to 25 lb and using 20 to 30 W to newer models weighing less than 5 lb and using only about 5 W, yet delivering twice the processing throughput. Because of their reduced size, weight, and power, these newer designs are especially applicable to planetary instrument requirements. The basis of our design evolution has been the availability of more powerful processor chip sets and the development of higher density packaging technology, coupled with more aggressive design strategies in incorporating high-density FPGA technology and use of high-density memory chips. In addition to reductions in size, weight, and power, the newer designs also address the necessity of survival in the harsh radiation environment of space. Spurred by participation in such programs as MSTI, LACE, RME, Delta 181, Delta Star, and RADARSAT, our designs have evolved in response to program demands to be small, low-powered units, radiation tolerant enough to be suitable for both Earth-orbit microsats and for planetary instruments. Present designs already include MIL-STD-1750 and Multi-Chip Module (MCM) technology with near-term plans to include RISC processors and higher-density MCM's. Long term plans include development of whole-core processors on one or two MCM's.
Magnetic Bubble Memories for Data Collection in Sounding Rockets,
1982-01-29
generate interest in bubbles as a mass storage device for micro - processor based equipment, manufacturers have come up with a variety of diversified...absence of a bubble represents a Ŕ". With diameters on the order of I to 5 micro -meters, these bubbles are so small that extremely tiny chips can hold...methods of transfer: polled I/O, interrupt driven I/O, and direct memory access (DMA). The first two methods require tho host processor be involved
Implementing direct, spatially isolated problems on transputer networks
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1988-01-01
Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
Radiation effects and mitigation strategies for modern FPGAs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stettler, M. W.; Caffrey, M. P.; Graham, P. S.
2004-01-01
Field Programmable Gate Array devices have become the technology of choice in small volume modern instrumentation and control systems. These devices have always offered significant advantages in flexibility, and recent advances in fabrication have greatly increased logic capacity, substantially increasing the number of applications for this technology. Unfortunately, the increased density (and corresponding shrinkage of process geometry), has made these devices more susceptible to failure due to external radiation. This has been an issue for space based systems for some time, but is now becoming an issue for terrestrial systems in elevated radiation environments and commercial avionics as well. Characterizingmore » the failure modes of Xilinx FPGAs, and developing mitigation strategies is the subject of ongoing research by a consortium of academic, industrial, and governmental laboratories. This paper presents background information of radiation effects and failure modes, as well as current and future mitigation techniques. In particular, the availability of very large FPGA devices, complete with generous amounts of RAM and embedded processor(s), has led to the implementation of complete digital systems on a single device, bringing issues of system reliability and redundancy management to the chip level. Radiation effects on a single FPGA are increasingly likely to have system level consequences, and will need to be addressed in current and future designs.« less
Readout and trigger for the AFP detector at ATLAS experiment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kocian, M.
AFP, the ATLAS Forward Proton consists of silicon detectors at 205 m and 217 m on each side of ATLAS. In 2016 two detectors in one side were installed. The FEI4 chips are read at 160 Mbps over the optical fibers. The DAQ system uses a FPGA board with Artix chip and a mezzanine card with RCE data processing module based on a Zynq chip with ARM processor running ArchLinux. Finally, in this paper we give an overview of the AFP detector with the commissioning steps taken to integrate with the ATLAS TDAQ. Furthermore first performance results are presented.
Readout and trigger for the AFP detector at ATLAS experiment
Kocian, M.
2017-01-25
AFP, the ATLAS Forward Proton consists of silicon detectors at 205 m and 217 m on each side of ATLAS. In 2016 two detectors in one side were installed. The FEI4 chips are read at 160 Mbps over the optical fibers. The DAQ system uses a FPGA board with Artix chip and a mezzanine card with RCE data processing module based on a Zynq chip with ARM processor running ArchLinux. Finally, in this paper we give an overview of the AFP detector with the commissioning steps taken to integrate with the ATLAS TDAQ. Furthermore first performance results are presented.
A Cost Effective System Design Approach for Critical Space Systems
NASA Technical Reports Server (NTRS)
Abbott, Larry Wayne; Cox, Gary; Nguyen, Hai
2000-01-01
NASA-JSC required an avionics platform capable of serving a wide range of applications in a cost-effective manner. In part, making the avionics platform cost effective means adhering to open standards and supporting the integration of COTS products with custom products. Inherently, operation in space requires low power, mass, and volume while retaining high performance, reconfigurability, scalability, and upgradability. The Universal Mini-Controller project is based on a modified PC/104-Plus architecture while maintaining full compatibility with standard COTS PC/104 products. The architecture consists of a library of building block modules, which can be mixed and matched to meet a specific application. A set of NASA developed core building blocks, processor card, analog input/output card, and a Mil-Std-1553 card, have been constructed to meet critical functions and unique interfaces. The design for the processor card is based on the PowerPC architecture. This architecture provides an excellent balance between power consumption and performance, and has an upgrade path to the forthcoming radiation hardened PowerPC processor. The processor card, which makes extensive use of surface mount technology, has a 166 MHz PowerPC 603e processor, 32 Mbytes of error detected and corrected RAM, 8 Mbytes of Flash, and I Mbytes of EPROM, on a single PC/104-Plus card. Similar densities have been achieved with the quad channel Mil-Std-1553 card and the analog input/output cards. The power management built into the processor and its peripheral chip allows the power and performance of the system to be adjusted to meet the requirements of the application, allowing another dimension to the flexibility of the Universal Mini-Controller. Unique mechanical packaging allows the Universal Mini-Controller to accommodate standard COTS and custom oversized PC/104-Plus cards. This mechanical packaging also provides thermal management via conductive cooling of COTS boards, which are typically designed for convection cooling methods.
Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo
2018-02-01
Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers
NASA Astrophysics Data System (ADS)
Edwards, Mark; Heward, Jeffrey; Clark, C. W.
2009-05-01
At Georgia Southern University we have constructed an 8+1--node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra--cold atoms in general with a particular emphasis on studying bose--bose and bose--fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time--dependent, one--dimensional Gross--Pitaevskii (TDGP) equations. These equations govern the behavior of dual-- species bosonic mixtures. We chose the split--operator/FFT to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 cpu known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64--bit PowerPC Processor Element known as the PPE and eight ``Synergistic Processor Element'' identified as the SPE's. We report on this algorithm and compare its performance to a non--parallel solver as applied to modeling evaporative cooling in dual--species bosonic mixtures.
NASA Technical Reports Server (NTRS)
Liu, Yuan-Kwei
1991-01-01
The feasibility is analyzed of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessors. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption; and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/386 DX memory hierachy appears to be the most beneficial change to the current DMS design at this time.
NASA Technical Reports Server (NTRS)
Liu, Yuan-Kwei
1991-01-01
The feasibility is analyzed of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessors. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption; and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/387 DX memory hierarchy appears to be the most beneficial change to the current DMS design at this time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee
The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. Themore » complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detail description of the Sandia Secure Processor and its unique features.« less
OSCAR: A Compact, Powerful and Versatile On Board Computer Based on LEON3 Core
NASA Astrophysics Data System (ADS)
Poupat, Jean-Luc; Lefevre, Aurelien; Koebel, Franck
2011-08-01
Satellites are controlled via a platform On Board Computer (OBC) that manages different parameters (attitude, orbit, modes, temperatures ...) with respect to its payload mission (telecommunication, earth observation, scientific mission). The platform OBC is connected to the satellite and the ground control via digital links, and executes on board software.The main functions of a platform OBC are to provide the satellite flight segment with the following features: o Processing resources for the flight mission software o TM/TC services and interfaces with the RF communication chaino General communication services with the Avionicsand payload equipments through an on-board communication bus based on the MIL-1553B standard or CANo Time synchronization and distributiono Failure tolerant architecture based on the use of redounded reconfiguration units and redundancyimplementationFrom a hardware point of view, it groups a lot of digital functions usually dispatched on numerous chips (processor, co-processor, digital links IP ...) together. In order to reach an ultimate level of integration, Astrium has designed an ASIC gathering on a single chip all the required digital functions: the SCOC3 ASIC.Astrium has developed an OBC based on this SCOC3 ASIC: the OSCAR (Optimized Spacecraft Computer Architecture with Reconfiguration). It is now available off-the-shelf as the new OBC product family of Astrium.This paper presents the major innovations introduced by Astrium for SCOC3 and OSCAR with the objective to save cost and mass through a solution compatible with any class quality project, using a unique software development environment for user.
Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; Jong, Wibe de
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments.more » In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in tt native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI OpenMP hybrid implementations attain up to 65x better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6x better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.« less
Thread-level parallelization and optimization of NWChem for the Intel MIC architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shan, Hongzhang; Williams, Samuel; de Jong, Wibe
In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments.more » In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant e ort was required to safely and efeciently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.« less
QI2S - Quick Image Interpretation System
NASA Astrophysics Data System (ADS)
Naghmouchi, Jamin; Aviely, Peleg; Ginosar, Ran; Ober, Giovanna; Bischoff, Ole; Nadler, Ron; Guiser, David; Citroen, Meira; Freddi, Riccardo; Berekovic, Mladen
2015-09-01
The evolution of the Earth Observation mission will be driven by many factors, and the deveploment of new processing paradigms to facilitate data downlink, handling and storage will be a key factor. Next generation EO satellites will generate a great amount of data at a very high data rate, both radar and optical. Real-time onboard processing can be the solution to reduce data downlink and management on ground. Radiometric, geometric, and atmospheric corrections of EO data as well as material/object detection in addition to the well-known needs for image compression and signal processing can be performed directly on board and the aim of QI2S project is to demonstrate this. QI2S, a concept prototype system for novel onboard image processing and image interpretation which has been designed, developed and validated in the framework of an EU FP7 project, targets these needs and makes a significant step towards exceeding current roadmaps of leading space agencies for future payload processors. The QI2S system features multiple chip components of the RC64, a novel rad-hard 64-core signal processing chip, which targets DSP performance of 75 GMACs (16bit), 150 GOPS and 38 single precision GFLOPS while dissipating less than 10 Watts. It integrates advanced DSP cores with a multibank shared memory and a hardware scheduler, also supporting DDR2/3 memory and twelve 3.125 Gbps full duplex high-speed serial links using SpaceFibre and other protocols. The processor is being developed within the European FP7 Framework Program and will be qualified to the highest space standards.
Towards energy-efficient photonic interconnects
NASA Astrophysics Data System (ADS)
Demir, Yigit; Hardavellas, Nikos
2015-03-01
Silicon photonics have emerged as a promising solution to meet the growing demand for high-bandwidth, low-latency, and energy-efficient on-chip and off-chip communication in many-core processors. However, current silicon-photonic interconnect designs for many-core processors waste a significant amount of power because (a) lasers are always on, even during periods of interconnect inactivity, and (b) microring resonators employ heaters which consume a significant amount of power just to overcome thermal variations and maintain communication on the photonic links, especially in a 3D-stacked design. The problem of high laser power consumption is particularly important as lasers typically have very low energy efficiency, and photonic interconnects often remain underutilized both in scientific computing (compute-intensive execution phases underutilize the interconnect), and in server computing (servers in Google-scale datacenters have a typical utilization of less than 30%). We address the high laser power consumption by proposing EcoLaser+, which is a laser control scheme that saves energy by predicting the interconnect activity and opportunistically turning the on-chip laser off when possible, and also by scaling the width of the communication link based on a runtime prediction of the expected message length. Our laser control scheme can save up to 62 - 92% of the laser energy, and improve the energy efficiency of a manycore processor with negligible performance penalty. We address the high trimming (heating) power consumption of the microrings by proposing insulation methods that reduce the impact of localized heating induced by highly-active components on the 3D-stacked logic die.
Fault-tolerant corrector/detector chip for high-speed data processing
Andaleon, David D.; Napolitano, Jr., Leonard M.; Redinbo, G. Robert; Shreeve, William O.
1994-01-01
An internally fault-tolerant data error detection and correction integrated circuit device (10) and a method of operating same. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum is provided with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented.
Fault-tolerant corrector/detector chip for high-speed data processing
Andaleon, D.D.; Napolitano, L.M. Jr.; Redinbo, G.R.; Shreeve, W.O.
1994-03-01
An internally fault-tolerant data error detection and correction integrated circuit device and a method of operating same is described. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system and provides a 32-bit datum with a relatively short eight bits of data-protecting parity. The 32-bits of data by eight bits of parity is partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increase the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles are correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented. 8 figures.
Fast Fourier Transform Co-Processor (FFTC)- Towards Embedded GFLOPs
NASA Astrophysics Data System (ADS)
Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Wite, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland
2012-08-01
Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co- Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment.In frame of the ESA activity “Fast Fourier Transform DSP Co-processor (FFTC)” (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following:Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP.The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance.The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT- based processing tasks.A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses.The presentation will give and overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Fast Fourier Transform Co-processor (FFTC), towards embedded GFLOPs
NASA Astrophysics Data System (ADS)
Kuehl, Christopher; Liebstueckel, Uwe; Tejerina, Isaac; Uemminghaus, Michael; Witte, Felix; Kolb, Michael; Suess, Martin; Weigand, Roland; Kopp, Nicholas
2012-10-01
Many signal processing applications and algorithms perform their operations on the data in the transform domain to gain efficiency. The Fourier Transform Co-Processor has been developed with the aim to offload General Purpose Processors from performing these transformations and therefore to boast the overall performance of a processing module. The IP of the commercial PowerFFT processor has been selected and adapted to meet the constraints of the space environment. In frame of the ESA activity "Fast Fourier Transform DSP Co-processor (FFTC)" (ESTEC/Contract No. 15314/07/NL/LvH/ma) the objectives were the following: • Production of prototypes of a space qualified version of the commercial PowerFFT chip called FFTC based on the PowerFFT IP. • The development of a stand-alone FFTC Accelerator Board (FTAB) based on the FFTC including the Controller FPGA and SpaceWire Interfaces to verify the FFTC function and performance. The FFTC chip performs its calculations with floating point precision. Stand alone it is capable computing FFTs of up to 1K complex samples in length in only 10μsec. This corresponds to an equivalent processing performance of 4.7 GFlops. In this mode the maximum sustained data throughput reaches 6.4Gbit/s. When connected to up to 4 EDAC protected SDRAM memory banks the FFTC can perform long FFTs with up to 1M complex samples in length or multidimensional FFT-based processing tasks. A Controller FPGA on the FTAB takes care of the SDRAM addressing. The instructions commanded via the Controller FPGA are used to set up the data flow and generate the memory addresses. The paper will give an overview on the project, including the results of the validation of the FFTC ASIC prototypes.
Stanford Hardware Development Program
NASA Technical Reports Server (NTRS)
Peterson, A.; Linscott, I.; Burr, J.
1986-01-01
Architectures for high performance, digital signal processing, particularly for high resolution, wide band spectrum analysis were developed. These developments are intended to provide instrumentation for NASA's Search for Extraterrestrial Intelligence (SETI) program. The real time signal processing is both formal and experimental. The efficient organization and optimal scheduling of signal processing algorithms were investigated. The work is complemented by efforts in processor architecture design and implementation. A high resolution, multichannel spectrometer that incorporates special purpose microcoded signal processors is being tested. A general purpose signal processor for the data from the multichannel spectrometer was designed to function as the processing element in a highly concurrent machine. The processor performance required for the spectrometer is in the range of 1000 to 10,000 million instructions per second (MIPS). Multiple node processor configurations, where each node performs at 100 MIPS, are sought. The nodes are microprogrammable and are interconnected through a network with high bandwidth for neighboring nodes, and medium bandwidth for nodes at larger distance. The implementation of both the current mutlichannel spectrometer and the signal processor as Very Large Scale Integration CMOS chip sets was commenced.
Design and realization of the baseband processor in satellite navigation and positioning receiver
NASA Astrophysics Data System (ADS)
Zhang, Dawei; Hu, Xiulin; Li, Chen
2007-11-01
The content of this paper is focused on the Design and realization of the baseband processor in satellite navigation and positioning receiver. Baseband processor is the most important part of the satellite positioning receiver. The design covers baseband processor's main functions include multi-channel digital signal DDC, acquisition, code tracking, carrier tracking, demodulation, etc. The realization is based on an Altera's FPGA device, that makes the system can be improved and upgraded without modifying the hardware. It embodies the theory of software defined radio (SDR), and puts the theory of the spread spectrum into practice. This paper puts emphasis on the realization of baseband processor in FPGA. In the order of choosing chips, design entry, debugging and synthesis, the flow is presented detailedly. Additionally the paper detailed realization of Digital PLL in order to explain a method of reducing the consumption of FPGA. Finally, the paper presents the result of Synthesis. This design has been used in BD-1, BD-2 and GPS.
Self-checking self-repairing computer nodes using the mirror processor
NASA Technical Reports Server (NTRS)
Tamir, Yuval
1992-01-01
Circuitry added to fault-tolerant systems for concurrent error deduction usually reduces performance. Using a technique called micro rollback, it is possible to eliminate most of the performance penalty of concurrent error detection. Error detection is performed in parallel with intermodule communication, and erroneous state changes are later undone. The author reports on the design and implementation of a VLSI RISC microprocessor, called the Mirror Processor (MP), which is capable of micro rollback. In order to achieve concurrent error detection, two MP chips operate in lockstep, comparing external signals and a signature of internal signals every clock cycle. If a mismatch is detected, both processors roll back to the beginning of the cycle when the error occurred. In some cases the erroneous state is corrected by copying a value from the fault-free processor to the faulty processor. The architecture, microarchitecture, and VLSI implementation of the MP, emphasizing its error-detection, error-recovery, and self-diagnosis capabilities, are described.
A single FPGA-based portable ultrasound imaging system for point-of-care applications.
Kim, Gi-Duck; Yoon, Changhan; Kye, Sang-Bum; Lee, Youngbae; Kang, Jeeun; Yoo, Yangmo; Song, Tai-kyong
2012-07-01
We present a cost-effective portable ultrasound system based on a single field-programmable gate array (FPGA) for point-of-care applications. In the portable ultrasound system developed, all the ultrasound signal and image processing modules, including an effective 32-channel receive beamformer with pseudo-dynamic focusing, are embedded in an FPGA chip. For overall system control, a mobile processor running Linux at 667 MHz is used. The scan-converted ultrasound image data from the FPGA are directly transferred to the system controller via external direct memory access without a video processing unit. The potable ultrasound system developed can provide real-time B-mode imaging with a maximum frame rate of 30, and it has a battery life of approximately 1.5 h. These results indicate that the single FPGA-based portable ultrasound system developed is able to meet the processing requirements in medical ultrasound imaging while providing improved flexibility for adapting to emerging POC applications.
Realtime multiprocessor for mobile ad hoc networks
NASA Astrophysics Data System (ADS)
Jungeblut, T.; Grünewald, M.; Porrmann, M.; Rückert, U.
2008-05-01
This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.
Functional Specification and Simulation of a Floating Point Co-Processor for SPUR
1986-08-01
depend on this state will not be stable until the next phase; this leaves the problem of how to control events that must occur on phi 1 of a cycle. The... problems with the structure of the chip description. The worst of these problems is the absence of Slang constructs for coding separate chip component...constructs such as UNK as well. Another related problem was the inability to explicitly declare the size of Slang node values. \\Vhile the correct
Chip-integrated optical power limiter based on an all-passive micro-ring resonator
NASA Astrophysics Data System (ADS)
Yan, Siqi; Dong, Jianji; Zheng, Aoling; Zhang, Xinliang
2014-10-01
Recent progress in silicon nanophotonics has dramatically advanced the possible realization of large-scale on-chip optical interconnects integration. Adopting photons as information carriers can break the performance bottleneck of electronic integrated circuit such as serious thermal losses and poor process rates. However, in integrated photonics circuits, few reported work can impose an upper limit of optical power therefore prevent the optical device from harm caused by high power. In this study, we experimentally demonstrate a feasible integrated scheme based on a single all-passive micro-ring resonator to realize the optical power limitation which has a similar function of current limiting circuit in electronics. Besides, we analyze the performance of optical power limiter at various signal bit rates. The results show that the proposed device can limit the signal power effectively at a bit rate up to 20 Gbit/s without deteriorating the signal. Meanwhile, this ultra-compact silicon device can be completely compatible with the electronic technology (typically complementary metal-oxide semiconductor technology), which may pave the way of very large scale integrated photonic circuits for all-optical information processors and artificial intelligence systems.
The MasPar MP-1 As a Computer Arithmetic Laboratory
Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.
1996-01-01
This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123
FPGA based control system for space instrumentation
NASA Astrophysics Data System (ADS)
Di Giorgio, Anna M.; Cerulli Irelli, Pasquale; Nuzzolo, Francesco; Orfei, Renato; Spinoglio, Luigi; Liu, Giovanni S.; Saraceno, Paolo
2008-07-01
The prototype for a general purpose FPGA based control system for space instrumentation is presented, with particular attention to the instrument control application software. The system HW is based on the LEON3FT processor, which gives the flexibility to configure the chip with only the necessary HW functionalities, from simple logic up to small dedicated processors. The instrument control SW is developed in ANSI C and for time critical (<10μs) commanding sequences implements an internal instructions sequencer, triggered via an interrupt service routine based on a HW high priority interrupt.
Design and analysis of microcontroller system using AMBA-Lite bus
NASA Astrophysics Data System (ADS)
Suan, Wang Hang; Bahari Jambek, Asral
2017-11-01
Advanced Microcontroller Bus Architecture (AMBA) is one of the well-designed on chip communication system. It is designed for right first-time development with many processor and peripherals. In this paper, the different family of AMBA architecture such as AXI, APB, AHB are reviewed. In this work, the AMBA-Lite is used and implemented with a few peripherals and an ARM processor. The work is simulated using Synopsys and demonstrated on the Digilent Nexys4 DDR board and the software use to synthesis the design is Vivado 2016.2.
Low-power, transparent optical network interface for high bandwidth off-chip interconnects.
Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren
2009-04-13
The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.
Radiation hardened microprocessor for small payloads
NASA Technical Reports Server (NTRS)
Shah, Ravi
1993-01-01
The RH-3000 program is developing a rad-hard space qualified 32-bit MIPS R-3000 RISC processor under the Naval Research Lab sponsorship. In addition, under IR&D Harris is developing RHC-3000 for embedded control applications where low cost and radiation tolerance are primary concerns. The development program leverages heavily from commercial development of the MIPS R-3000. The commercial R-3000 has a large installed user base and several foundry partners are currently producing a wide variety of R-3000 derivative products. One of the MIPS derivative products, the LR33000 from LSI Logic, was used as the basis for the design of the RH-3000 chipset. The RH-3000 chipset consists of three core chips and two support chips. The core chips include the CPU, which is the R-3000 integer unit and the FPA/MD chip pair, which performs the R-3010 floating point functions. The two support whips contain all the support functions required for fault tolerance support, real-time support, memory management, timers, and other functions. The Harris development effort had first passed silicon success in June, 1992 with the first rad-hard 32-bit RH-3000 CPU chip. The CPU device is 30 kgates, has a 508 mil by 503 mil die size and is fabricated at Harris Semiconductor on the rad-hard CMOS Silicon on Sapphire (SOS) process. The CPU device successfully passed tesing against 600,000 test vectors derived directly on the LSI/MIPS test suite and has been operational as a single board computer running C code for the past year. In addition, the RH-3000 program has developed the methodology for converting commercially developed designs utilizing logic synthesis techniques based on a combination of VHDK and schematic data bases.
NASA Astrophysics Data System (ADS)
Flynn, Edward M.; Mackowski, Michael J.
1993-01-01
This interim report documents the results of the first two phases of a four-phase program to develop a high flux heat exchanger for cooling future high performance aircraft electronics. Phase 1 defines future needs for high flux heat removal in advanced military electronics systems. The results are sorted by broad application categories: (1) commercial digital systems, (2) military data processors, (3) power processors, and (4) radar and optical systems. For applications expected to be fielded in five to ten years, the outlook is for steady state flux levels of 30-50 W/sq cm for digital processors and several hundred W/sq cm for power control applications. In Phase 1, a trade study was conducted on emerging cooling technologies which could remove a steady state chip heat flux of 100 W/sq cm while holding chip junction temperature to 90 C. Constraints imposed on heat exchanger design, in order to reflect operation in a fighter aircraft environment, included a practical lower limit on coolant supply temperature, the preference for a nontoxic, nonflammable, and nonfreezing coolant, the need to minimize weight and volume, and operation in an accelerating environment. The trade study recommended the Compact High Intensity Cooler (CHIC) for design, fabrication, and test in the final two phases of this program.
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches
NASA Astrophysics Data System (ADS)
Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur
2018-03-01
Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
ISFET-based sensor signal processor chip design for environment monitoring applications
NASA Astrophysics Data System (ADS)
Chung, Wen-Yaw; Yang, Chung-Huang; Wang, Ming-Ga
2004-12-01
In recent years Ion-Sensitive Field Effect Transistor (ISFET) based transducers create valuable applications in physiological data acquisition and environment monitoring. This paper presents a mixed-mode ASIC design for potentiometric ISFET-based bio-chemical sensor applications including H+ sensing and hand-held pH meter. For battery power consideration, the proposed system consists of low voltage (3V) analog front-end readout circuits and digital processor has been developed and fabricated in a 0.5mm double-poly double-metal CMOS technology. To assure that the correct pH value can be measured, the two-point calibration circuitry based on the response of standard pH4 and pH7 buffer solution has been implemented by using algorithmic state machine hardware algorithms. The measurement accuracy of the chip is 10 bits and the measured range between pH 2 to pH 12 compared to ideal values is within the accuracy of 0.1pH. For homeland environmental applications, the system provide rapid, easy to use, and cost-effective on-site testing on the quality of water, such as drinking water, ground water and river water. The processor has a potential usage in battery-operated and portable devices in environmental monitoring applications compared to commercial hand-held pH meter.
ERIC Educational Resources Information Center
Perez, Ernest
1997-01-01
Examines the practical realities of upgrading Intel personal computers in libraries, considering budgets and technical personnel availability. Highlights include adding RAM; putting in faster processor chips, including clock multipliers; new hard disks; CD-ROM speed; motherboards and interface cards; cost limits and economic factors; and…
A Spacecraft Housekeeping System-on-Chip in a Radiation Hardened Structured ASIC
NASA Technical Reports Server (NTRS)
Suarez, George; DuMonthier, Jeffrey J.; Sheikh, Salman S.; Powell, Wesley A.; King, Robyn L.
2012-01-01
Housekeeping systems are essential to health monitoring of spacecraft and instruments. Typically, sensors are distributed across various sub-systems and data is collected using components such as analog-to-digital converters, analog multiplexers and amplifiers. In most cases programmable devices are used to implement the data acquisition control and storage, and the interface to higher level systems. Such discrete implementations require additional size, weight, power and interconnect complexity versus an integrated circuit solution, as well as the qualification of multiple parts. Although commercial devices are readily available, they are not suitable for space applications due the radiation tolerance and qualification requirements. The Housekeeping System-o n-A-Chip (HKSOC) is a low power, radiation hardened integrated solution suitable for spacecraft and instrument control and data collection. A prototype has been designed and includes a wide variety of functions including a 16-channel analog front-end for driving and reading sensors, analog-to-digital and digital-to-analog converters, on-chip temperature sensor, power supply current sense circuits, general purpose comparators and amplifiers, a 32-bit processor, digital I/O, pulse-width modulation (PWM) generators, timers and I2C master and slave serial interfaces. In addition, the device can operate in a bypass mode where the processor is disabled and external logic is used to control the analog and mixed signal functions. The device is suitable for stand-alone or distributed systems where multiple chips can be deployed across different sub-systems as intelligent nodes with computing and processing capabilities.
NASA Astrophysics Data System (ADS)
Xie, Yiwei; Zhuang, Leimeng; Boller, Klaus-Jochen; Lowery, Arthur James
2017-06-01
Optical delay lines implemented in photonic integrated circuits (PICs) are essential for creating robust and low-cost optical signal processors on miniaturized chips. In particular, tunable delay lines enable a key feature of programmability for the on-chip processing functions. However, the previously investigated tunable delay lines are plagued by a severe drawback of delay-dependent loss due to the propagation loss in the constituent waveguides. In principle, a serial-connected amplifier can be used to compensate such losses or perform additional amplitude manipulation. However, this solution is generally unpractical as it introduces additional burden on chip area and power consumption, particularly for large-scale integrated PICs. Here, we report an integrated tunable delay line that overcomes the delay-dependent loss, and simultaneously allows for independent manipulation of group delay and amplitude responses. It uses a ring resonator with a tunable coupler and a semiconductor optical amplifier in the feedback path. A proof-of-concept device with a free spectral range of 11.5 GHz and a delay bandwidth in the order of 200 MHz is discussed in the context of microwave photonics and is experimentally demonstrated to be able to provide a lossless delay up to 1.1 to a 5 ns Gaussian pulse. The proposed device can be designed for different frequency scales with potential for applications across many other areas such as telecommunications, LIDAR, and spectroscopy, serving as a novel building block for creating chip-scale programmable optical signal processors.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study
NASA Technical Reports Server (NTRS)
Simon, Tyler; McGalliard, James
2009-01-01
Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Property-driven functional verification technique for high-speed vision system-on-chip processor
NASA Astrophysics Data System (ADS)
Nshunguyimfura, Victor; Yang, Jie; Liu, Liyuan; Wu, Nanjian
2017-04-01
The implementation of functional verification in a fast, reliable, and effective manner is a challenging task in a vision chip verification process. The main reason for this challenge is the stepwise nature of existing functional verification techniques. This vision chip verification complexity is also related to the fact that in most vision chip design cycles, extensive efforts are focused on how to optimize chip metrics such as performance, power, and area. Design functional verification is not explicitly considered at an earlier stage at which the most sound decisions are made. In this paper, we propose a semi-automatic property-driven verification technique. The implementation of all verification components is based on design properties. We introduce a low-dimension property space between the specification space and the implementation space. The aim of this technique is to speed up the verification process for high-performance parallel processing vision chips. Our experimentation results show that the proposed technique can effectively improve the verification effort up to 20% for the complex vision chip design while reducing the simulation and debugging overheads.
Reconfigurable signal processor designs for advanced digital array radar systems
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining
2017-05-01
The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
A Low-Power High-Speed Smart Sensor Design for Space Exploration Missions
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi
1997-01-01
A low-power high-speed smart sensor system based on a large format active pixel sensor (APS) integrated with a programmable neural processor for space exploration missions is presented. The concept of building an advanced smart sensing system is demonstrated by a system-level microchip design that is composed with an APS sensor, a programmable neural processor, and an embedded microprocessor in a SOI CMOS technology. This ultra-fast smart sensor system-on-a-chip design mimics what is inherent in biological vision systems. Moreover, it is programmable and capable of performing ultra-fast machine vision processing in all levels such as image acquisition, image fusion, image analysis, scene interpretation, and control functions. The system provides about one tera-operation-per-second computing power which is a two order-of-magnitude increase over that of state-of-the-art microcomputers. Its high performance is due to massively parallel computing structures, high data throughput rates, fast learning capabilities, and advanced VLSI system-on-a-chip implementation.
Current, K. Wayne; Yuk, Kelvin; McConaghy, Charles; Gascoyne, Peter R. C.; Schwartz, Jon A.; Vykoukal, Jody V.; Andrews, Craig
2010-01-01
A high-voltage (HV) integrated circuit has been demonstrated to transport droplets on programmable paths across its coated surface. This chip is the engine for a dielectrophoresis (DEP)-based micro-fluidic lab-on-a-chip system. This chip creates DEP forces that move and help inject droplets. Electrode excitation voltage and frequency are variable. With the electrodes driven with a 100V peak-to-peak periodic waveform, the maximum high-voltage electrode waveform frequency is about 200Hz. Data communication rate is variable up to 250kHz. This demonstration chip has a 32×32 array of nominally 100V electrode drivers. It is fabricated in a 130V SOI CMOS fabrication technology, dissipates a maximum of 1.87W, and is about 10.4 mm × 8.2 mm. PMID:23989241
Research on numerical control system based on S3C2410 and MCX314AL
NASA Astrophysics Data System (ADS)
Ren, Qiang; Jiang, Tingbiao
2008-10-01
With the rapid development of micro-computer technology, embedded system, CNC technology and integrated circuits, numerical control system with powerful functions can be realized by several high-speed CPU chips and RISC (Reduced Instruction Set Computing) chips which have small size and strong stability. In addition, the real-time operating system also makes the attainment of embedded system possible. Developing the NC system based on embedded technology can overcome some shortcomings of common PC-based CNC system, such as the waste of resources, low control precision, low frequency and low integration. This paper discusses a hardware platform of ENC (Embedded Numerical Control) system based on embedded processor chip ARM (Advanced RISC Machines)-S3C2410 and DSP (Digital Signal Processor)-MCX314AL and introduces the process of developing ENC system software. Finally write the MCX314AL's driver under the embedded Linux operating system. The embedded Linux operating system can deal with multitask well moreover satisfy the real-time and reliability of movement control. NC system has the advantages of best using resources and compact system with embedded technology. It provides a wealth of functions and superior performance with a lower cost. It can be sure that ENC is the direction of the future development.
Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kerbyson, Darren J; Lang, Michael; Pakin, Scott
2009-01-01
Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance. Processor-cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contain wave-front processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundary data downstream and whose cost ismore » typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional computation and higher use of on-chip communications. This tradeoff is explored using a performance model and an implementation on the Petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.« less
Energy challenges in optical access and aggregation networks.
Kilper, Daniel C; Rastegarfar, Houman
2016-03-06
Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as the density and speed increases, the total system energy, thermal density and energy per bit are moving into regimes that become impractical to support-for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).
Stereo and IMU-Assisted Visual Odometry for Small Robots
NASA Technical Reports Server (NTRS)
2012-01-01
This software performs two functions: (1) taking stereo image pairs as input, it computes stereo disparity maps from them by cross-correlation to achieve 3D (three-dimensional) perception; (2) taking a sequence of stereo image pairs as input, it tracks features in the image sequence to estimate the motion of the cameras between successive image pairs. A real-time stereo vision system with IMU (inertial measurement unit)-assisted visual odometry was implemented on a single 750 MHz/520 MHz OMAP3530 SoC (system on chip) from TI (Texas Instruments). Frame rates of 46 fps (frames per second) were achieved at QVGA (Quarter Video Graphics Array i.e. 320 240), or 8 fps at VGA (Video Graphics Array 640 480) resolutions, while simultaneously tracking up to 200 features, taking full advantage of the OMAP3530's integer DSP (digital signal processor) and floating point ARM processors. This is a substantial advancement over previous work as the stereo implementation produces 146 Mde/s (millions of disparities evaluated per second) in 2.5W, yielding a stereo energy efficiency of 58.8 Mde/J, which is 3.75 better than prior DSP stereo while providing more functionality.
Suppression of the vacuolar invertase gene delays senescent sweetening in chipping potatoes.
Wiberley-Bradford, Amy E; Bethke, Paul C
2018-01-01
Potato chip processors require potato tubers that meet quality specifications for fried chip color, and color depends largely upon tuber sugar contents. At later times in storage, potatoes accumulate sucrose, glucose, and fructose. This developmental process, senescent sweetening, manifests as a blush of color near the center of the fried chip, becomes more severe with time, and limits the storage period. Vacuolar invertase (VInv) converts sucrose to glucose and fructose and is hypothesized to play a role in senescent sweetening. To test this hypothesis, senescent sweetening was quantified in multiple lines of potato with reduced VInv expression. Chip darkening from senescent sweetening was delayed by about 4 weeks for tubers with reduced VInv expression. A strong positive correlation between frequency of dark chips and tuber hexose content was observed. Tubers with reduced VInv expression had lower hexose to sucrose ratios than controls. VInv activity contributes to reducing sugar accumulation during senescent sweetening. Sucrose breakdown during frying may contribute to chip darkening. Suppressing VInv expression increases the storage period of the chipping potato crop, which is an important consideration, as potatoes with reduced VInv expression are entering commercial production in the USA. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.
A Wearable Healthcare System With a 13.7 μA Noise Tolerant ECG Processor.
Izumi, Shintaro; Yamashita, Ken; Nakano, Masanao; Kawaguchi, Hiroshi; Kimura, Hiromitsu; Marumoto, Kyoji; Fuchikami, Takaaki; Fujimori, Yoshikazu; Nakajima, Hiroshi; Shiga, Toshikazu; Yoshimoto, Masahiko
2015-10-01
To prevent lifestyle diseases, wearable bio-signal monitoring systems for daily life monitoring have attracted attention. Wearable systems have strict size and weight constraints, which impose significant limitations of the battery capacity and the signal-to-noise ratio of bio-signals. This report describes an electrocardiograph (ECG) processor for use with a wearable healthcare system. It comprises an analog front end, a 12-bit ADC, a robust Instantaneous Heart Rate (IHR) monitor, a 32-bit Cortex-M0 core, and 64 Kbyte Ferroelectric Random Access Memory (FeRAM). The IHR monitor uses a short-term autocorrelation (STAC) algorithm to improve the heart-rate detection accuracy despite its use in noisy conditions. The ECG processor chip consumes 13.7 μA for heart rate logging application.
Testing Methods for Integrated Circuit Chips.
1986-03-27
DWf <I IAV ~IMi MORY OUT LOGIC~~ IPOGRAM ASYC S’E4i E...* 16o, CO% T ROL CO%TROL 32 Figure 2 . 14 VLSI Tester Block Diagram. registers, memory and test...neral-pIurpos’ processor wi th standard bus- inte-rfaco se-rves as,- th- test control Ii’r and ( 2 ) a c-ustom VLSI test Controller inti-rfacing direc(_t1...Engineering 2 WTWTY ABSTRACT Provision for the functional testing of fabricated VLSI chips frequently involves as much design effort as the orig- _ inal
Fault Tolerant Microcontroller for the Configurable Fault Tolerant Processor
2008-09-01
many others come to mind I also wish to thank Jan Grey for providing an excellent System-on-a-Chip that formed a core component of this thesis...developed by Jan Gray as documented in his "Building a RISC CPU and System-on-a-Chip in an FPGA" series of articles that was published in Circuit Cellar...those detailed by Jan Gray in his "Getting Started with the XSOC Project v0.93" [16]. The XSOC distribution is available at <http://www.fpgacpu.org
2006-06-14
Robert Graybill . A Raw hoard for the use of this project was provided by the Computer Architecture Croup at the Massachusetts Institute of Technology...simulator is presented by MIT as being an accurate model of the Raw chip, we have found that it does not accurately model the board. Our comparison...G4 processor, model 7410. with a 32 kbyte level-1 cache on-chip and a 2 Mbyte L2 cache connected through a 250 MH/ bus [12]. Each node has 256 Mbyte
NASA Astrophysics Data System (ADS)
Hadade, Ioan; di Mare, Luca
2016-08-01
Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc
2007-03-01
Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
A CAM-based LZ data compression IC
NASA Technical Reports Server (NTRS)
Winters, K.; Bode, R.; Schneider, E.
1993-01-01
A custom CMOS processor is introduced that implements the Data Compression Lempel-Ziv (DCLZ) standard, a variation of the LZ2 Algorithm. This component presently achieves a sustained compression and decompression rate of 10 megabytes/second by employing an on-chip content-addressable memory for string table storage.
Microprocessor design for GaAs technology
NASA Astrophysics Data System (ADS)
Milutinovic, Veljko M.
Recent advances in the design of GaAs microprocessor chips are examined in chapters contributed by leading experts; the work is intended as reading material for a graduate engineering course or as a practical R&D reference. Topics addressed include the methodology used for the architecture, organization, and design of GaAs processors; GaAs device physics and circuit design; design concepts for microprocessor-based GaAs systems; a 32-bit GaAs microprocessor; a 32-bit processor implemented in GaAs JFET; and a direct coupled-FET-logic E/D-MESFET experimental RISC machine. Drawings, micrographs, and extensive circuit diagrams are provided.
On-board computational efficiency in real time UAV embedded terrain reconstruction
NASA Astrophysics Data System (ADS)
Partsinevelos, Panagiotis; Agadakos, Ioannis; Athanasiou, Vasilis; Papaefstathiou, Ioannis; Mertikas, Stylianos; Kyritsis, Sarantis; Tripolitsiotis, Achilles; Zervos, Panagiotis
2014-05-01
In the last few years, there is a surge of applications for object recognition, interpretation and mapping using unmanned aerial vehicles (UAV). Specifications in constructing those UAVs are highly diverse with contradictory characteristics including cost-efficiency, carrying weight, flight time, mapping precision, real time processing capabilities, etc. In this work, a hexacopter UAV is employed for near real time terrain mapping. The main challenge addressed is to retain a low cost flying platform with real time processing capabilities. The UAV weight limitation affecting the overall flight time, makes the selection of the on-board processing components particularly critical. On the other hand, surface reconstruction, as a computational demanding task, calls for a highly demanding processing unit on board. To merge these two contradicting aspects along with customized development, a System on a Chip (SoC) integrated circuit is proposed as a low-power, low-cost processor, which natively supports camera sensors and positioning and navigation systems. Modern SoCs, such as Omap3530 or Zynq, are classified as heterogeneous devices and provide a versatile platform, allowing access to both general purpose processors, such as the ARM11, as well as specialized processors, such as a digital signal processor and floating field-programmable gate array. A UAV equipped with the proposed embedded processors, allows on-board terrain reconstruction using stereo vision in near real time. Furthermore, according to the frame rate required, additional image processing may concurrently take place, such as image rectification andobject detection. Lastly, the onboard positioning and navigation (e.g., GNSS) chip may further improve the quality of the generated map. The resulting terrain maps are compared to ground truth geodetic measurements in order to access the accuracy limitations of the overall process. It is shown that with our proposed novel system,there is much potential in computational efficiency on board and in optimized time constraints.
Cache Energy Optimization Techniques For Modern Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
2013-01-01
Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In thismore » book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems. This book is intended to be a valuable guide for both newcomers and veterans in the field of cache power management. It will help graduate students, CAD tool developers and designers in understanding the need of energy efficiency in modern computing systems. Further, it will be useful for researchers in gaining insights into algorithms and techniques for micro-architectural and system-level energy optimization using dynamic cache reconfiguration. We sincerely believe that the ``food for thought'' presented in this book will inspire the readers to develop even better ideas for designing ``green'' processors of tomorrow.« less
Optical interconnection using polyimide waveguide for multichip module
NASA Astrophysics Data System (ADS)
Koyanagi, Mitsumasa
1996-01-01
We have developed a parallel processor system with 152 RISC processor chips specific for Monte-Carlo analysis. This system has the ring-bus architecture. The performance of several Gflops is expected in this system according to the computer simulation. However, it was revealed that the data transfer speed of the bus has to be increased more dramatically in order to further increase the performance. Then, we propose to introduce the optical interconnection into the parallel processor system to increase the data transfer speed of the buses. The double ringbus architecture is employed in this new parallel processor system with optical interconnection. The free-space optical interconnection arid the optical waveguide are used for the optical ring-bus. Thin polyimide film was used to form the optical waveguide. A relatively low propagation loss was achieved in the polyimide optical waveguide. In addition, it was confirmed that the propagation direction of signal light can be easily changed by using a micro-mirror.
Optical interconnection using polyimide waveguide for multichip module
NASA Astrophysics Data System (ADS)
Koyanagi, Mitsumasa
1996-01-01
We have developed a parallel processor system with 152 RISC processor chips specific for Monte-Carlo analysis. This system has the ring-bus architecture. The performance of several Gflops is expected in this system according to the computer simulation. However, it was revealed that the data transfer speed of the bus has to be increased more dramatically in order to further increase the performance. Then, we propose to introduce the optical interconnection into the parallel processor system to increase the data transfer speed of the buses. The double ring-bus architecture is employed in this new parallel processor system with optical interconnection. The free-space optical interconnection and the optical waveguide are used for the optical ring-bus. Thin polyimide film was used to form the optical waveguide. A relatively low propagation loss was achieved in the polyimide optical waveguide. In addition, it was confirmed that the propagation direction of signal light can be easily changed by using a micro-mirror.
Narrowband Interference Suppression in Spread Spectrum Communication Systems
1995-12-01
receiver input. As stated earlier, these waveforms must be sampled to obtain the discrete time sequences. The sampling theorem states: A bandlimited...From the FFT chips, the data is passed to a Plessey PDSP16330 Pythagoras Processor. The 16330 is a high-speed digital CMOS IC that converts real and
Temperature-Adaptive Circuits on Reconfigurable Analog Arrays
NASA Technical Reports Server (NTRS)
Stoica, Adrian; Zebulum, Ricardo S.; Keymeulen, Didier; Ramesham, Rajeshuni; Neff, Joseph; Katkoori, Srinivas
2006-01-01
Demonstration of a self-reconfigurable Integrated Circuit (IC) that would operate under extreme temperature (-180 C and 120 C) and radiation (300krad), without the protection of thermal controls and radiation shields. Self-Reconfigurable Electronics platform: a) Evolutionary Processor (EP) to run reconfiguration mechanism; b) Reconfigurable chip (FPGA, FPAA, etc).
Hybrid Quantum Information Processing with Superconductors and Neutral Atoms
NASA Astrophysics Data System (ADS)
McDermott, Robert
Hybrid approaches to quantum information processing (QIP) aim to capitalize on the strengths of disparate quantum technologies to realize a system whose capabilities exceed those of any single experimental platform. At the University of Wisconsin, we are working toward integration of a fast superconducting quantum processor with a stable, long-lived quantum memory based on trapped neutral atoms. Here we describe the development of a quantum interface between superconducting thin-film cavity circuits and trapped Rydberg atoms, the key technological obstacle to realization of superconductor-atom hybrid QIP. Specific accomplishments to date include development of a theoretical protocol for high-fidelity state transfer between the atom and the cavity; fabrication and characterization of high- Q superconducting cavities with integrated trapping electrodes to enhance zero-point microwave fields at a location remote from the chip surface; and trapping and Rydberg excitation of single atoms within 1 mm of the cavity. We discuss the status of experiments to probe the strong coherent coupling of single Rydberg atoms and the superconducting cavity. Supported by ARO under contract W911NF-16-1-0133.
Zhang, Jingyuan Linda; Lagoudakis, Konstantinos G.; Tzeng, Yan -Kai; ...
2017-10-23
Arrays of identical and individually addressable qubits lay the foundation for the creation of scalable quantum hardware such as quantum processors and repeaters. Silicon-vacancy (SiV) centers in diamond offer excellent physical properties such as low inhomogeneous broadening, fast photon emission, and a large Debye–Waller factor. The possibility for all-optical ultrafast manipulation and techniques to extend the spin coherence times makes them promising candidates for qubits. Here, we have developed arrays of nanopillars containing single (SiV) centers with high yield, and we demonstrate ultrafast all-optical complete coherent control of the excited state population of a single SiV center at the opticalmore » transition frequency. The high quality of the chemical vapor deposition (CVD) grown SiV centers provides excellent spectral stability, which allows us to coherently manipulate and quasi-resonantly read out the excited state population of individual SiV centers on picosecond timescales using ultrafast optical pulses. Furthermore, this work opens new opportunities to create a scalable on-chip diamond platform for quantum information processing and scalable nanophotonics applications.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jingyuan Linda; Lagoudakis, Konstantinos G.; Tzeng, Yan -Kai
Arrays of identical and individually addressable qubits lay the foundation for the creation of scalable quantum hardware such as quantum processors and repeaters. Silicon-vacancy (SiV) centers in diamond offer excellent physical properties such as low inhomogeneous broadening, fast photon emission, and a large Debye–Waller factor. The possibility for all-optical ultrafast manipulation and techniques to extend the spin coherence times makes them promising candidates for qubits. Here, we have developed arrays of nanopillars containing single (SiV) centers with high yield, and we demonstrate ultrafast all-optical complete coherent control of the excited state population of a single SiV center at the opticalmore » transition frequency. The high quality of the chemical vapor deposition (CVD) grown SiV centers provides excellent spectral stability, which allows us to coherently manipulate and quasi-resonantly read out the excited state population of individual SiV centers on picosecond timescales using ultrafast optical pulses. Furthermore, this work opens new opportunities to create a scalable on-chip diamond platform for quantum information processing and scalable nanophotonics applications.« less
Single chip camera active pixel sensor
NASA Technical Reports Server (NTRS)
Shaw, Timothy (Inventor); Pain, Bedabrata (Inventor); Olson, Brita (Inventor); Nixon, Robert H. (Inventor); Fossum, Eric R. (Inventor); Panicacci, Roger A. (Inventor); Mansoorian, Barmak (Inventor)
2003-01-01
A totally digital single chip camera includes communications to operate most of its structure in serial communication mode. The digital single chip camera include a D/A converter for converting an input digital word into an analog reference signal. The chip includes all of the necessary circuitry for operating the chip using a single pin.
Embedded System Implementation on FPGA System With μCLinux OS
NASA Astrophysics Data System (ADS)
Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna
2011-02-01
Embedded systems are taking on more complicated tasks as the processors involved become more powerful. The embedded systems have been widely used in many areas such as in industries, automotives, medical imaging, communications, speech recognition and computer vision. The complexity requirements in hardware and software nowadays need a flexibility system for further enhancement in any design without adding new hardware. Therefore, any changes in the design system will affect the processor that need to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using Field Programmable Gate Array (FPGA). A softcore processor, NIOS II 32-bit RISC, which is the microprocessor core was utilized in FPGA system together with the embedded operating system(OS), μClinux. In this paper, an example of web server is explained and demonstrated
ELIPS: Toward a Sensor Fusion Processor on a Chip
NASA Technical Reports Server (NTRS)
Daud, Taher; Stoica, Adrian; Tyson, Thomas; Li, Wei-te; Fabunmi, James
1998-01-01
The paper presents the concept and initial tests from the hardware implementation of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) processor is developed to seamlessly combine rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor in compact low power VLSI. The first demonstration of the ELIPS concept targets interceptor functionality; other applications, mainly in robotics and autonomous systems are considered for the future. The main assumption behind ELIPS is that fuzzy, rule-based and neural forms of computation can serve as the main primitives of an "intelligent" processor. Thus, in the same way classic processors are designed to optimize the hardware implementation of a set of fundamental operations, ELIPS is developed as an efficient implementation of computational intelligence primitives, and relies on a set of fuzzy set, fuzzy inference and neural modules, built in programmable analog hardware. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Following software demonstrations on several interceptor data, three important ELIPS building blocks (a fuzzy set preprocessor, a rule-based fuzzy system and a neural network) have been fabricated in analog VLSI hardware and demonstrated microsecond-processing times.
An 81.6 μW FastICA processor for epileptic seizure detection.
Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming
2015-02-01
To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing proper memory element and reduced wordlength, the power and area of storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm(2). The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay of a frame of 256 samples for 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and 3.4 × computation speedup are achieved. The performance of the chip was verified by human dataset.
Design of the ANTARES LCM-DAQ board test bench using a FPGA-based system-on-chip approach
NASA Astrophysics Data System (ADS)
Anvar, S.; Kestener, P.; Le Provost, H.
2006-11-01
The System-on-Chip (SoC) approach consists in using state-of-the-art FPGA devices with embedded RISC processor cores, high-speed differential LVDS links and ready-to-use multi-gigabit transceivers allowing development of compact systems with substantial number of IO channels. Required performances are obtained through a subtle separation of tasks between closely cooperating programmable hardware logic and user-friendly software environment. We report about our experience in using the SoC approach for designing the production test bench of the off-shore readout system for the ANTARES neutrino experiment.
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism
Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
[A novel biologic electricity signal measurement based on neuron chip].
Lei, Yinsheng; Wang, Mingshi; Sun, Tongjing; Zhu, Qiang; Qin, Ran
2006-06-01
Neuron chip is a multiprocessor with three pipeline CPU; its communication protocol and control processor are integrated in effect to carry out the function of communication, control, attemper, I/O, etc. A novel biologic electronic signal measurement network system is composed of intelligent measurement nodes with neuron chip at the core. In this study, the electronic signals such as ECG, EEG, EMG and BOS can be synthetically measured by those intelligent nodes, and some valuable diagnostic messages are found. Wavelet transform is employed in this system to analyze various biologic electronic signals due to its strong time-frequency ability of decomposing signal local character. Better effect is gained. This paper introduces the hardware structure of network and intelligent measurement node, the measurement theory and the signal figure of data acquisition and processing.
Another expert system rule inference based on DNA molecule logic gates
NASA Astrophysics Data System (ADS)
WÄ siewicz, Piotr
2013-10-01
With the help of silicon industry microfluidic processors were invented utilizing nano membrane valves, pumps and microreactors. These so called lab-on-a-chips combined together with molecular computing create molecular-systems-ona- chips. This work presents a new approach to implementation of molecular inference systems. It requires the unique representation of signals by DNA molecules. The main part of this work includes the concept of logic gates based on typical genetic engineering reactions. The presented method allows for constructing logic gates with many inputs and for executing them at the same quantity of elementary operations, regardless of a number of input signals. Every microreactor of the lab-on-a-chip performs one unique operation on input molecules and can be connected by dataflow output-input connections to other ones.
2009-12-24
Networks Silicon-Photonic Clos Networks for Global On-Chip Communication Ajay Joshi* Christopher Batten? Yong-Jin Kwon! Scott Beamer! Imran Shamim ...4th edition, 2007. •A\\ [13] A Joshi, C Batten, Y Kwon, S Beamer, Imran Shamim , Krste Asanovic, and Vladimir Sto- janovic. Silicon-photonic clos
Optoelectronic interconnects for 3D wafer stacks
NASA Astrophysics Data System (ADS)
Ludwig, David E.; Carson, John C.; Lome, Louis S.
1996-01-01
Wafer and chip stacking are envisioned as a means of providing increased processing power within the small confines of a three-dimensional structure. Optoelectronic devices can play an important role in these dense 3-D processing electronic packages in two ways. In pure electronic processing, optoelectronics can provide a method for increasing the number of input/output communication channels within the layers of the 3-D chip stack. Non-free space communication links allow the density of highly parallel input/output ports to increase dramatically over typical edge bus connections. In hybrid processors, where electronics and optics play a role in defining the computational algorithm, free space communication links are typically utilized for, among other reasons, the increased network link complexity which can be achieved. Free space optical interconnections provide bandwidths and interconnection complexity unobtainable in pure electrical interconnections. Stacked 3-D architectures can provide the electronics real estate and structure to deal with the increased bandwidth and global information provided by free space optical communications. This paper provides definitions and examples of 3-D stacked architectures in optoelectronics processors. The benefits and issues of these technologies are discussed.
Optoelectronic interconnects for 3D wafer stacks
NASA Astrophysics Data System (ADS)
Ludwig, David; Carson, John C.; Lome, Louis S.
1996-01-01
Wafer and chip stacking are envisioned as means of providing increased processing power within the small confines of a three-dimensional structure. Optoelectronic devices can play an important role in these dense 3-D processing electronic packages in two ways. In pure electronic processing, optoelectronics can provide a method for increasing the number of input/output communication channels within the layers of the 3-D chip stack. Non-free space communication links allow the density of highly parallel input/output ports to increase dramatically over typical edge bus connections. In hybrid processors, where electronics and optics play a role in defining the computational algorithm, free space communication links are typically utilized for, among other reasons, the increased network link complexity which can be achieved. Free space optical interconnections provide bandwidths and interconnection complexity unobtainable in pure electrical interconnections. Stacked 3-D architectures can provide the electronics real estate and structure to deal with the increased bandwidth and global information provided by free space optical communications. This paper will provide definitions and examples of 3-D stacked architectures in optoelectronics processors. The benefits and issues of these technologies will be discussed.
Image processing using Gallium Arsenide (GaAs) technology
NASA Technical Reports Server (NTRS)
Miller, Warner H.
1989-01-01
The need to increase the information return from space-borne imaging systems has increased in the past decade. The use of multi-spectral data has resulted in the need for finer spatial resolution and greater spectral coverage. Onboard signal processing will be necessary in order to utilize the available Tracking and Data Relay Satellite System (TDRSS) communication channel at high efficiency. A generally recognized approach to the increased efficiency of channel usage is through data compression techniques. The compression technique implemented is a differential pulse code modulation (DPCM) scheme with a non-uniform quantizer. The need to advance the state-of-the-art of onboard processing was recognized and a GaAs integrated circuit technology was chosen. An Adaptive Programmable Processor (APP) chip set was developed which is based on an 8-bit slice general processor. The reason for choosing the compression technique for the Multi-spectral Linear Array (MLA) instrument is described. Also a description is given of the GaAs integrated circuit chip set which will demonstrate that data compression can be performed onboard in real time at data rate in the order of 500 Mb/s.
Impact of device level faults in a digital avionic processor
NASA Technical Reports Server (NTRS)
Suk, Ho Kim
1989-01-01
This study describes an experimental analysis of the impact of gate and device-level faults in the processor of a Bendix BDX-930 flight control system. Via mixed mode simulation, faults were injected at the gate (stuck-at) and at the transistor levels and, their propagation through the chip to the output pins was measured. The results show that there is little correspondence between a stuck-at and a device-level fault model, as far as error activity or detection within a functional unit is concerned. In so far as error activity outside the injected unit and at the output pins are concerned, the stuck-at and device models track each other. The stuck-at model, however, overestimates, by over 100 percent, the probability of fault propagation to the output pins. An evaluation of the Mean Error Durations and the Mean Time Between Errors at the output pins shows that the stuck-at model significantly underestimates (by 62 percent) the impact of an internal chip fault on the output pins. Finally, the study also quantifies the impact of device fault by location, both internally and at the output pins.
Kang, Jeeun; Yoon, Changhan; Lee, Jaejin; Kye, Sang-Bum; Lee, Yongbae; Chang, Jin Ho; Kim, Gi-Duck; Yoo, Yangmo; Song, Tai-kyong
2016-04-01
In this paper, we present a novel system-on-chip (SOC) solution for a portable ultrasound imaging system (PUS) for point-of-care applications. The PUS-SOC includes all of the signal processing modules (i.e., the transmit and dynamic receive beamformer modules, mid- and back-end processors, and color Doppler processors) as well as an efficient architecture for hardware-based imaging methods (e.g., dynamic delay calculation, multi-beamforming, and coded excitation and compression). The PUS-SOC was fabricated using a UMC 130-nm NAND process and has 16.8 GFLOPS of computing power with a total equivalent gate count of 12.1 million, which is comparable to a Pentium-4 CPU. The size and power consumption of the PUS-SOC are 27×27 mm(2) and 1.2 W, respectively. Based on the PUS-SOC, a prototype hand-held US imaging system was implemented. Phantom experiments demonstrated that the PUS-SOC can provide appropriate image quality for point-of-care applications with a compact PDA size ( 200×120×45 mm(3)) and 3 hours of battery life.
NbN A/D Conversion of IR Focal Plane Sensor Signal at 10 K
NASA Technical Reports Server (NTRS)
Eaton, L.; Durand, D.; Sandell, R.; Spargo, J.; Krabach, T.
1994-01-01
We are implementing a 12 bit SFQ counting ADC with parallel-to-serial readout using our established 10 K NbN capability. This circuit provides a key element of the analog signal processor (ASP) used in large infrared focal plane arrays. The circuit processes the signal data stream from a Si:As BIB detector array. A 10 mega samples per second (MSPS) pixel data stream flows from the chip at a 120 megabit bit rate in a format that is compatible with other superconductive time dependent processor (TDP) circuits being developed. We will discuss our planned ASP demonstration, the circuit design, and test results.
Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong
2013-10-01
A processor-implemented method for determining aging of a processing unit in a processor the method comprising: calculating an effective aging profile for the processing unit wherein the effective aging profile quantifies the effects of aging on the processing unit; combining the effective aging profile with process variation data, actual workload data and operating conditions data for the processing unit; and determining aging through an aging sensor of the processing unit using the effective aging profile, the process variation data, the actual workload data, architectural characteristics and redundancy data, and the operating conditions data for the processing unit.
NASA Technical Reports Server (NTRS)
Premkumar, A. B.; Purviance, J. E.
1990-01-01
A simplified model for the SAR imaging problem is presented. The model is based on the geometry of the SAR system. Using this model an expression for the entire phase history of the received SAR signal is formulated. From the phase history, it is shown that the range and the azimuth coordinates for a point target image can be obtained by processing the phase information during the intrapulse and interpulse periods respectively. An architecture for a VLSI implementation for the SAR signal processor is presented which generates images in real time. The architecture uses a small number of chips, a new correlation processor, and an efficient azimuth correlation process.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations in nVidia graphic cards. The graphical processor unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model is used, called CUDA, to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies, to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphic cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
Bonfanti, A; Ceravolo, M; Zambra, G; Gusmeroli, R; Spinelli, A S; Lacaita, A L; Angotzi, G N; Baranauskas, G; Fadiga, L
2010-01-01
This paper reports a multi-channel neural recording system-on-chip (SoC) with digital data compression and wireless telemetry. The circuit consists of a 16 amplifiers, an analog time division multiplexer, an 8-bit SAR AD converter, a digital signal processor (DSP) and a wireless narrowband 400-MHz binary FSK transmitter. Even though only 16 amplifiers are present in our current die version, the whole system is designed to work with 64 channels demonstrating the feasibility of a digital processing and narrowband wireless transmission of 64 neural recording channels. A digital data compression, based on the detection of action potentials and storage of correspondent waveforms, allows the use of a 1.25-Mbit/s binary FSK wireless transmission. This moderate bit-rate and a low frequency deviation, Manchester-coded modulation are crucial for exploiting a narrowband wireless link and an efficient embeddable antenna. The chip is realized in a 0.35- εm CMOS process with a power consumption of 105 εW per channel (269 εW per channel with an extended transmission range of 4 m) and an area of 3.1 × 2.7 mm(2). The transmitted signal is captured by a digital TV tuner and demodulated by a wideband phase-locked loop (PLL), and then sent to a PC via an FPGA module. The system has been tested for electrical specifications and its functionality verified in in-vivo neural recording experiments.
Zhuang, Leimeng; Taddei, Caterina; Hoekman, Marcel; Leinse, Arne; Heideman, René; van Dijk, Paulus; Roeloffzen, Chris
2013-11-04
In this paper, we propose and experimentally demonstrate a novel wideband on-chip photonic modulation transformer for phase-modulated microwave photonic links. The proposed device is able to transform phase-modulated optical signals into intensity-modulated versions (or vice versa) with nearly zero conversion of laser phase noise to intensity noise. It is constructed using waveguide-based ring resonators, which features simple architecture, stable operation, and easy reconfigurability. Beyond the stand-alone functionality, the proposed device can also be integrated with other functional building blocks of photonic integrated circuits (PICs) to create on-chip complex microwave photonic signal processors. As an application example, a PIC consisting of two such modulation transformers and a notch filter has been designed and realized in TriPleX(TM) waveguide technology. The realized device uses a 2 × 2 splitting circuit and 3 ring resonators with a free spectral range of 25 GHz, which are all equipped with continuous tuning elements. The device can perform phase-to-intensity modulation transform and carrier suppression simultaneously, which enables high-performance phase-modulated microwave photonics links (PM-MPLs). Associated with the bias-free and low-complexity advantages of the phase modulators, a single-fiber-span PM-MPL with a RF bandwidth of 12 GHz (3 dB-suppression band 6 to 18 GHz) has been demonstrated comprising the proposed PIC, where the achieved spurious-free dynamic range performance is comparable to that of Class-AB MPLs using low-biased Mach-Zehnder modulators.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.
Scharfe, Michael; Pielot, Rainer; Schreiber, Falk
2010-01-11
Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-12
... processor. In addition, the Amendment modifies the proposals so that a market maker's quoting obligations... reported by the responsible single plan processor. Finally, so that the markets may coordinate..., as reported by the responsible single plan processor. The Amendment also modifies that the market...
Dense and Sparse Matrix Operations on the Cell Processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel W.; Shalf, John; Oliker, Leonid
2005-05-01
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, usingmore » a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.« less
Efficient Sorting on the Tilera Manycore Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Villa, Oreste
e present an efficient implementation of the radix sort algo- rithm for the Tilera TILEPro64 processor. The TILEPro64 is one of the first successful commercial manycore processors. It is com- posed of 64 tiles interconnected through multiple fast Networks- on-chip and features a fully coherent, shared distributed cache. The architecture has a large degree of flexibility, and allows various optimization strategies. We describe how we mapped the algorithm to this architecture. We present an in-depth analysis of the optimizations for each phase of the algorithm with respect to the processor’s sustained performance. We discuss the overall throughput reached by ourmore » radix sort implementation (up to 132 MK/s) and show that it provides comparable or better performance-per-watt with respect to state-of-the art implemen- tations on x86 processors and graphic processing units.« less
Real-time road detection in infrared imagery
NASA Astrophysics Data System (ADS)
Andre, Haritini E.; McCoy, Keith
1990-09-01
Automatic road detection is an important part in many scene recognition applications. The extraction of roads provides a means of navigation and position update for remotely piloted vehicles or autonomous vehicles. Roads supply strong contextual information which can be used to improve the performance of automatic target recognition (ATh) systems by directing the search for targets and adjusting target classification confidences. This paper will describe algorithmic techniques for labeling roads in high-resolution infrared imagery. In addition, realtime implementation of this structural approach using a processor array based on the Martin Marietta Geometric Arithmetic Parallel Processor (GAPPTh) chip will be addressed. The algorithm described is based on the hypothesis that a road consists of pairs of line segments separated by a distance "d" with opposite gradient directions (antiparallel). The general nature of the algorithm, in addition to its parallel implementation in a single instruction, multiple data (SIMD) machine, are improvements to existing work. The algorithm seeks to identify line segments meeting the road hypothesis in a manner that performs well, even when the side of the road is fragmented due to occlusion or intersections. The use of geometrical relationships between line segments is a powerful yet flexible method of road classification which is independent of orientation. In addition, this approach can be used to nominate other types of objects with minor parametric changes.
1986-06-30
features of computer aided design systems and statistical quality control procedures that are generic to chip sets and processes. RADIATION HARDNESS -The...System PSP Programmable Signal Processor SSI Small Scale Integration ." TOW Tube Launched, Optically Tracked, Wire Guided TTL Transistor Transitor Logic
Comparison of mechanized systems for thinning Ponderosa pine and mixed conifer stands
Bruce R. Hartsough; Joseph F. McNeel; Thomas A. Durston; Bryce J. Stokes
1994-01-01
We studied three systems for thinning pine plantations and naturally-regenerated stands on the Stanislaus National Forest, California. All three produced small sawlogs and fuel chips. The whole tree system consisted of a feller buncher, skidder, stroke processor, loader and chipper. The cut-to-length system included a harvester, forwarder, loader and chipper. A hybrid...
Comparison of mechanized systems for thinning Ponderosa pine and mixed conifer stands
Bruce R. Hartsough; Joseph F. McNeel; Thomas A. Durston; Bryce J. Stokes
1994-01-01
Three systems for thinning pine plantations and naturally-regenerated stands were studied. All three produced small sawlogs and fuel chips. The whole-tree system consisted of a feller buncher, skidder, stroke processor, loader, and chipper. The cut-to-length system included a harvester, forwarder, loader, and chipper. A hybrid system combined a feller buncher,...
Key Technologies of Phone Storage Forensics Based on ARM Architecture
NASA Astrophysics Data System (ADS)
Zhang, Jianghan; Che, Shengbing
2018-03-01
Smart phones are mainly running Android, IOS and Windows Phone three mobile platform operating systems. The android smart phone has the best market shares and its processor chips are almost ARM software architecture. The chips memory address mapping mechanism of ARM software architecture is different with x86 software architecture. To forensics to android mart phone, we need to understand three key technologies: memory data acquisition, the conversion mechanism from virtual address to the physical address, and find the system’s key data. This article presents a viable solution which does not rely on the operating system API for a complete solution to these three issues.
Research based on the SoPC platform of feature-based image registration
NASA Astrophysics Data System (ADS)
Shi, Yue-dong; Wang, Zhi-hui
2015-12-01
This paper focuses on the study of implementing feature-based image registration by System on a Programmable Chip (SoPC) hardware platform. We solidify the image registration algorithm on the FPGA chip, in which embedded soft core processor Nios II can speed up the image processing system. In this way, we can make image registration technology get rid of the PC. And, consequently, this kind of technology will be got an extensive use. The experiment result indicates that our system shows stable performance, particularly in terms of matching processing which noise immunity is good. And feature points of images show a reasonable distribution.
LLL 8080 BASIC-II interpreter user's manual
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGoldrick, P.R.; Dickinson, J.; Allison, T.G.
1978-04-03
Scientists are finding increased applications for microprocessors as process controllers in their experiments. However, while microprocessors are small and inexpensive, they are difficult to program in machine or assembly language. A high-level language is needed to enable scientists to develop their own microcomputer programs for their experiments on location. Recognizing this need, LLL contracted to have such a language developed. This report describes the resulting LLL BASIC interpreter, which opeates with LLL's 8080-based MCS-8 microcomputer system. All numerical operations are done using Advanced Micro Device's Am9511 arithmetic processor chip or optionally by using a software simulation of that chip. 1more » figure.« less
FPGA implementation of ICA algorithm for blind signal separation and adaptive noise canceling.
Kim, Chang-Min; Park, Hyung-Min; Kim, Taesu; Choi, Yoon-Kyung; Lee, Soo-Young
2003-01-01
An field programmable gate array (FPGA) implementation of independent component analysis (ICA) algorithm is reported for blind signal separation (BSS) and adaptive noise canceling (ANC) in real time. In order to provide enormous computing power for ICA-based algorithms with multipath reverberation, a special digital processor is designed and implemented in FPGA. The chip design fully utilizes modular concept and several chips may be put together for complex applications with a large number of noise sources. Experimental results with a fabricated test board are reported for ANC only, BSS only, and simultaneous ANC/BSS, which demonstrates successful speech enhancement in real environments in real time.
Single cell digital polymerase chain reaction on self-priming compartmentalization chip
Zhu, Qiangyuan; Qiu, Lin; Xu, Yanan; Li, Guang; Mu, Ying
2017-01-01
Single cell analysis provides a new framework for understanding biology and disease, however, an absolute quantification of single cell gene expression still faces many challenges. Microfluidic digital polymerase chain reaction (PCR) provides a unique method to absolutely quantify the single cell gene expression, but only limited devices are developed to analyze a single cell with detection variation. This paper describes a self-priming compartmentalization (SPC) microfluidic digital polymerase chain reaction chip being capable of performing single molecule amplification from single cell. The chip can be used to detect four single cells simultaneously with 85% of sample digitization. With the optimized protocol for the SPC chip, we first tested the ability, precision, and sensitivity of our SPC digital PCR chip by assessing β-actin DNA gene expression in 1, 10, 100, and 1000 cells. And the reproducibility of the SPC chip is evaluated by testing 18S rRNA of single cells with 1.6%–4.6% of coefficient of variation. At last, by detecting the lung cancer related genes, PLAU gene expression of A549 cells at the single cell level, the single cell heterogeneity was demonstrated. So, with the power-free, valve-free SPC chip, the gene copy number of single cells can be quantified absolutely with higher sensitivity, reduced labor time, and reagent. We expect that this chip will enable new studies for biology and disease. PMID:28191267
Single cell digital polymerase chain reaction on self-priming compartmentalization chip.
Zhu, Qiangyuan; Qiu, Lin; Xu, Yanan; Li, Guang; Mu, Ying
2017-01-01
Single cell analysis provides a new framework for understanding biology and disease, however, an absolute quantification of single cell gene expression still faces many challenges. Microfluidic digital polymerase chain reaction (PCR) provides a unique method to absolutely quantify the single cell gene expression, but only limited devices are developed to analyze a single cell with detection variation. This paper describes a self-priming compartmentalization (SPC) microfluidic digital polymerase chain reaction chip being capable of performing single molecule amplification from single cell. The chip can be used to detect four single cells simultaneously with 85% of sample digitization. With the optimized protocol for the SPC chip, we first tested the ability, precision, and sensitivity of our SPC digital PCR chip by assessing β-actin DNA gene expression in 1, 10, 100, and 1000 cells. And the reproducibility of the SPC chip is evaluated by testing 18S rRNA of single cells with 1.6%-4.6% of coefficient of variation. At last, by detecting the lung cancer related genes, PLAU gene expression of A549 cells at the single cell level, the single cell heterogeneity was demonstrated. So, with the power-free, valve-free SPC chip, the gene copy number of single cells can be quantified absolutely with higher sensitivity, reduced labor time, and reagent. We expect that this chip will enable new studies for biology and disease.
Development of small scale cluster computer for numerical analysis
NASA Astrophysics Data System (ADS)
Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.
2017-09-01
In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
From neural-based object recognition toward microelectronic eyes
NASA Technical Reports Server (NTRS)
Sheu, Bing J.; Bang, Sa Hyun
1994-01-01
Engineering neural network systems are best known for their abilities to adapt to the changing characteristics of the surrounding environment by adjusting system parameter values during the learning process. Rapid advances in analog current-mode design techniques have made possible the implementation of major neural network functions in custom VLSI chips. An electrically programmable analog synapse cell with large dynamic range can be realized in a compact silicon area. New designs of the synapse cells, neurons, and analog processor are presented. A synapse cell based on Gilbert multiplier structure can perform the linear multiplication for back-propagation networks. A double differential-pair synapse cell can perform the Gaussian function for radial-basis network. The synapse cells can be biased in the strong inversion region for high-speed operation or biased in the subthreshold region for low-power operation. The voltage gain of the sigmoid-function neurons is externally adjustable which greatly facilitates the search of optimal solutions in certain networks. Various building blocks can be intelligently connected to form useful industrial applications. Efficient data communication is a key system-level design issue for large-scale networks. We also present analog neural processors based on perceptron architecture and Hopfield network for communication applications. Biologically inspired neural networks have played an important role towards the creation of powerful intelligent machines. Accuracy, limitations, and prospects of analog current-mode design of the biologically inspired vision processing chips and cellular neural network chips are key design issues.
Tolbert, Jeremy R; Kabali, Pratik; Brar, Simeranjit; Mukhopadhyay, Saibal
2009-01-01
We present a digital system for adaptive data compression for low power wireless transmission of Electroencephalography (EEG) data. The proposed system acts as a base-band processor between the EEG analog-to-digital front-end and RF transceiver. It performs a real-time accuracy energy trade-off for multi-channel EEG signal transmission by controlling the volume of transmitted data. We propose a multi-core digital signal processor for on-chip processing of EEG signals, to detect signal information of each channel and perform real-time adaptive compression. Our analysis shows that the proposed approach can provide significant savings in transmitter power with minimal impact on the overall signal accuracy.
Theorem Proving in Intel Hardware Design
NASA Technical Reports Server (NTRS)
O'Leary, John
2009-01-01
For the past decade, a framework combining model checking (symbolic trajectory evaluation) and higher-order logic theorem proving has been in production use at Intel. Our tools and methodology have been used to formally verify execution cluster functionality (including floating-point operations) for a number of Intel products, including the Pentium(Registered TradeMark)4 and Core(TradeMark)i7 processors. Hardware verification in 2009 is much more challenging than it was in 1999 - today s CPU chip designs contain many processor cores and significant firmware content. This talk will attempt to distill the lessons learned over the past ten years, discuss how they apply to today s problems, outline some future directions.
Beyond core count: a look at new mainstream computing platforms for HEP workloads
NASA Astrophysics Data System (ADS)
Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.
2014-06-01
As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
Spacewire on Earth orbiting scatterometers
NASA Technical Reports Server (NTRS)
Bachmann, Alex; Lang, Minh; Lux, James; Steffke, Richard
2002-01-01
The need for a high speed, reliable and easy to implement communication link has led to the development of a space flight oriented version of IEEE 1355 called SpaceWire. SpaceWire is based on high-speed (200 Mbps) serial point-to-point links using Low Voltage Differential Signaling (LVDS). SpaceWIre has provisions for routing messages between a large network of processors, using wormhole routing for low overhead and latency. {additionally, there are available space qualified hybrids, which provide the Link layer to the user's bus}. A test bed of multiple digital signal processor breadboards, demonstrating the ability to meet signal processing requirements for an orbiting scatterometer has been implemented using three Astrium MCM-DSPs, each breadboard consists of a Multi Chip Module (MCM) that combines a space qualified Digital Signal Processor and peripherals, including IEEE-1355 links. With the addition of appropriate physical layer interfaces and software on the DSP, the SpaceWire link is used to communicate between processors on the test bed, e.g. sending timing references, commands, status, and science data among the processors. Results are presented on development issues surrounding the use of SpaceWire in this environment, from physical layer implementation (cables, connectors, LVDS drivers) to diagnostic tools, driver firmware, and development methodology. The tools, methods, and hardware, software challenges and preliminary performance are investigated and discussed.
Yamamura, Shohei; Yamada, Eriko; Kimura, Fukiko; Miyajima, Kumiko; Shigeto, Hajime
2017-10-21
A new single-cell microarray chip was designed and developed to separate and analyze single adherent and non-adherent cancer cells. The single-cell microarray chip is made of polystyrene with over 60,000 microchambers of 10 different size patterns (31-40 µm upper diameter, 11-20 µm lower diameter). A drop of suspension of adherent carcinoma (NCI-H1650) and non-adherent leukocyte (CCRF-CEM) cells was placed onto the chip, and single-cell occupancy of NCI-H1650 and CCRF-CEM was determined to be 79% and 84%, respectively. This was achieved by controlling the chip design and surface treatment. Analysis of protein expression in single NCI-H1650 and CCRF-CEM cells was performed on the single-cell microarray chip by multi-antibody staining. Additionally, with this system, we retrieved positive single cells from the microchambers by a micromanipulator. Thus, this system demonstrates the potential for easy and accurate separation and analysis of various types of single cells.
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.
Zierke, Stephanie; Bakos, Jason D
2010-04-12
Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
Photonic-Networks-on-Chip for High Performance Radiation Survivable Multi-Core Processor Systems
2013-12-01
Loss Spectra” Proceedings of SPIE 8255, (2012) and in a journal publication: M. T. Crowley, D. Murrell, N. Patel, M. Breivik , C.-Y. Lin, Y. Li, B.-O...Crowley, D. Murrell, N. Patel, M. Breivik , C.-Y. Lin, Y. Li, B.-O. Fimland and L. F. Lester, "Analytical Modeling of the Temperature Performance of
An ultra-compact processor module based on the R3000
NASA Astrophysics Data System (ADS)
Mullenhoff, D. J.; Kaschmitter, J. L.; Lyke, J. C.; Forman, G. A.
1992-08-01
Viable high density packaging is of critical importance for future military systems, particularly space borne systems which require minimum weight and size and high mechanical integrity. A leading, emerging technology for high density packaging is multi-chip modules (MCM). During the 1980's, a number of different MCM technologies have emerged. In support of Strategic Defense Initiative Organization (SDIO) programs, Lawrence Livermore National Laboratory (LLNL) has developed, utilized, and evaluated several different MCM technologies. Prior LLNL efforts include modules developed in 1986, using hybrid wafer scale packaging, which are still operational in an Air Force satellite mission. More recent efforts have included very high density cache memory modules, developed using laser pantography. As part of the demonstration effort, LLNL and Phillips Laboratory began collaborating in 1990 in the Phase 3 Multi-Chip Module (MCM) technology demonstration project. The goal of this program was to demonstrate the feasibility of General Electric's (GE) High Density Interconnect (HDI) MCM technology. The design chosen for this demonstration was the processor core for a MIPS R3000 based reduced instruction set computer (RISC), which has been described previously. It consists of the R3000 microprocessor, R3010 floating point coprocessor and 128 Kbytes of cache memory.
NASA Astrophysics Data System (ADS)
Suarez, Hernan; Zhang, Yan R.
2015-05-01
New radar applications need to perform complex algorithms and process large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression for real-time transceiver optimization are presented, they are based on a System-on-Chip architecture for Xilinx devices. This study also evaluates the performance of dedicated coprocessor as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through the high performance AXI buses, to perform floating-point operations, control the processing blocks, and communicate with external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band tested together with a low-cost channel emulator for different types of waveforms.
On-Chip Waveguide Coupling of a Layered Semiconductor Single-Photon Source.
Tonndorf, Philipp; Del Pozo-Zamudio, Osvaldo; Gruhler, Nico; Kern, Johannes; Schmidt, Robert; Dmitriev, Alexander I; Bakhtinov, Anatoly P; Tartakovskii, Alexander I; Pernice, Wolfram; Michaelis de Vasconcellos, Steffen; Bratschitsch, Rudolf
2017-09-13
Fully integrated quantum technology based on photons is in the focus of current research, because of its immense potential concerning performance and scalability. Ideally, the single-photon sources, the processing units, and the photon detectors are all combined on a single chip. Impressive progress has been made for on-chip quantum circuits and on-chip single-photon detection. In contrast, nonclassical light is commonly coupled onto the photonic chip from the outside, because presently only few integrated single-photon sources exist. Here, we present waveguide-coupled single-photon emitters in the layered semiconductor gallium selenide as promising on-chip sources. GaSe crystals with a thickness below 100 nm are placed on Si 3 N 4 rib or slot waveguides, resulting in a modified mode structure efficient for light coupling. Using optical excitation from within the Si 3 N 4 waveguide, we find nonclassicality of generated photons routed on the photonic chip. Thus, our work provides an easy-to-implement and robust light source for integrated quantum technology.
NASA Astrophysics Data System (ADS)
Johnsson, L.; Netzer, G.
2016-10-01
Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling resulting in constant power per unit area stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been leveled off clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased caches sizes offers performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than main memory accesses. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down an increasing fraction of the die must be “underutilized” or “dark” due to power constraints. With power being a prime design constraint there is a concerted effort to find significantly more energy efficient chip architectures than dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common for the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, typically have been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of the same technology generation x86 CPUs. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP in regards to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark)
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets
2010-01-01
Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
Evaluating local indirect addressing in SIMD proc essors
NASA Technical Reports Server (NTRS)
Middleton, David; Tomboulian, Sherryl
1989-01-01
In the design of parallel computers, there exists a tradeoff between the number and power of individual processors. The single instruction stream, multiple data stream (SIMD) model of parallel computers lies at one extreme of the resulting spectrum. The available hardware resources are devoted to creating the largest possible number of processors, and consequently each individual processor must use the fewest possible resources. Disagreement exists as to whether SIMD processors should be able to generate addresses individually into their local data memory, or all processors should access the same address. The tradeoff is examined between the increased capability and the reduced number of processors that occurs in this single instruction stream, multiple, locally addressed, data (SIMLAD) model. Factors are assembled that affect this design choice, and the SIMLAD model is compared with the bare SIMD and the MIMD models.
VLSI design of a single chip reed-solomon encoder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Truong, T.K.; Deutsch, L.J.; Reed, I.S.
A design for a single chip implementation of a Reed-Solomon encoder is presented. The architecture that leads to this single VLSI chip design makes use of a bit serial finite field multiplication algorithm.
The Level 0 Pixel Trigger system for the ALICE experiment
NASA Astrophysics Data System (ADS)
Aglieri Rinella, G.; Kluge, A.; Krivda, M.; ALICE Silicon Pixel Detector project
2007-01-01
The ALICE Silicon Pixel Detector contains 1200 readout chips. Fast-OR signals indicate the presence of at least one hit in the 8192 pixel matrix of each chip. The 1200 bits are transmitted every 100 ns on 120 data readout optical links using the G-Link protocol. The Pixel Trigger System extracts and processes them to deliver an input signal to the Level 0 trigger processor targeting a latency of 800 ns. The system is compact, modular and based on FPGA devices. The architecture allows the user to define and implement various trigger algorithms. The system uses advanced 12-channel parallel optical fiber modules operating at 1310 nm as optical receivers and 12 deserializer chips closely packed in small area receiver boards. Alternative solutions with multi-channel G-Link deserializers implemented directly in programmable hardware devices were investigated. The design of the system and the progress of the ALICE Pixel Trigger project are described in this paper.
A monolithic integrated photonic microwave filter
NASA Astrophysics Data System (ADS)
Fandiño, Javier S.; Muñoz, Pascual; Doménech, David; Capmany, José
2017-02-01
Meeting the increasing demand for capacity in wireless networks requires the harnessing of higher regions in the radiofrequency spectrum, reducing cell size, as well as more compact, agile and power-efficient base stations that are capable of smoothly interfacing the radio and fibre segments. Fully functional microwave photonic chips are promising candidates in attempts to meet these goals. In recent years, many integrated microwave photonic chips have been reported in different technologies. To the best of our knowledge, none has monolithically integrated all the main active and passive optoelectronic components. Here, we report the first demonstration of a tunable microwave photonics filter that is monolithically integrated into an indium phosphide chip. The reconfigurable radiofrequency photonic filter includes all the necessary elements (for example, lasers, modulators and photodetectors), and its response can be tuned by means of control electric currents. This is an important step in demonstrating the feasibility of integrated and programmable microwave photonic processors.
NASA Astrophysics Data System (ADS)
Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.
2011-12-01
Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance, to producing chip designs with multi- and many-core architectures. Further the cores themselves can run multiple threads as a zero overhead context switch allowing low level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event based parallelism, and final stage I/O synchronization. However, initial studies on 8 andl6 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which due to it's size, places huge burdens on the memory infrastructure of today's processors.
The VLSI design of a single chip Reed-Solomon encoder
NASA Technical Reports Server (NTRS)
Truong, T. K.; Deutsch, L. J.; Reed, I. S.
1982-01-01
A design for a single chip implementation of a Reed-Solomon encoder is presented. The architecture that leads to this single VLSI chip design makes use of a bit serial finite field multiplication algorithm.
AM06: the Associative Memory chip for the Fast TracKer in the upgraded ATLAS detector
NASA Astrophysics Data System (ADS)
Annovi, A.; Beretta, M. M.; Calderini, G.; Crescioli, F.; Frontini, L.; Liberali, V.; Shojaii, S. R.; Stabile, A.
2017-04-01
This paper describes the AM06 chip, which is a highly parallel processor for pattern recognition in the ATLAS high energy physics experiment. The AM06 contains memory banks that store data organized in 18 bit words; a group of 8 words is called "pattern". Each AM06 chip can store up to 131 072 patterns. The AM06 is a large chip, designed in 65 nm CMOS, and it combines full-custom memory arrays, standard logic cells and serializer/deserializer IP blocks at 2 Gbit/s for input/output communication. The overall silicon area is 168 mm2 and the chip contains about 421 million transistors. The AM06 receives the detector data for each event accepted by Level-1 trigger, up to 100 kHz, and it performs a track reconstruction based on hit information from channels of the ATLAS silicon detectors. Thanks to the design of a new associative memory cell and to the layout optimization, the AM06 consumption is only about 1 fJ/bit per comparison. The AM06 has been fabricated and successfully tested with a dedicated test system.
Flood replenishment: a new method of processor control.
Frank, E D; Gray, J E; Wilken, D A
1980-01-01
In mechanized radiographic film processors that process medium to low volumes of film, roll films, and those that process single-emulsion films from nuclear medicine scans, computed tomography, and ultrasound, it is difficult to maintain the developer solution at a stable processing level. We describe our experience using flood replenishment, which is a method in which developer replenisher containing starter solution is introduced in the processor at timed intervals, independent of the number of films being processed. By this process, a stable level of developer activity is maintained in a processor used to develop a medium to low volume of single-emulsion film.
An evaluation of the directed flow graph methodology
NASA Technical Reports Server (NTRS)
Snyder, W. E.; Rajala, S. A.
1984-01-01
The applicability of the Directed Graph Methodology (DGM) to the design and analysis of special purpose image and signal processing hardware was evaluated. A special purpose image processing system was designed and described using DGM. The design, suitable for very large scale integration (VLSI) implements a region labeling technique. Two computer chips were designed, both using metal-nitride-oxide-silicon (MNOS) technology, as well as a functional system utilizing those chips to perform real time region labeling. The system is described in terms of DGM primitives. As it is currently implemented, DGM is inappropriate for describing synchronous, tightly coupled, special purpose systems. The nature of the DGM formalism lends itself more readily to modeling networks of general purpose processors.
A single VLSI chip for computing syndromes in the (225, 223) Reed-Solomon decoder
NASA Technical Reports Server (NTRS)
Hsu, I. S.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.
1986-01-01
A description of a single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder is presented. The architecture that leads to this single VLSI chip design makes use of the dual basis multiplication algorithm. The same architecture can be applied to design VLSI chips to compute various kinds of number theoretic transforms.
Design and implementation of a high performance network security processor
NASA Astrophysics Data System (ADS)
Wang, Haixin; Bai, Guoqiang; Chen, Hongyi
2010-03-01
The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogenous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
Broadband set-top box using MAP-CA processor
NASA Astrophysics Data System (ADS)
Bush, John E.; Lee, Woobin; Basoglu, Chris
2001-12-01
Advances in broadband access are expected to exert a profound impact in our everyday life. It will be the key to the digital convergence of communication, computer and consumer equipment. A common thread that facilitates this convergence comprises digital media and Internet. To address this market, Equator Technologies, Inc., is developing the Dolphin broadband set-top box reference platform using its MAP-CA Broadband Signal ProcessorT chip. The Dolphin reference platform is a universal media platform for display and presentation of digital contents on end-user entertainment systems. The objective of the Dolphin reference platform is to provide a complete set-top box system based on the MAP-CA processor. It includes all the necessary hardware and software components for the emerging broadcast and the broadband digital media market based on IP protocols. Such reference design requires a broadband Internet access and high-performance digital signal processing. By using the MAP-CA processor, the Dolphin reference platform is completely programmable, allowing various codecs to be implemented in software, such as MPEG-2, MPEG-4, H.263 and proprietary codecs. The software implementation also enables field upgrades to keep pace with evolving technology and industry demands.
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
This technical report contains the HOL listings of the specification of the design and major portions of the requirements for a commercially developed processor interface unit (or PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault-tolerant computer system. This system, the Fault-Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU specification as it currently exists. Section two of this report contains general-purpose HOL theories that support the PIU specification. These theories include definitions for the hardware components used in the PIU, our implementation of bit words, and our implementation of temporal logic. Section three contains the HOL listings for the PIU design specification. Aside from the PIU internal bus (I-Bus), this specification is complete. Section four contains the HOL listings for a major portion of the PIU requirements specification. Specifically, it contains most of the definition for the PIU behavior associated with memory accesses initiated by the local processor.
NASA Astrophysics Data System (ADS)
Männer, R.
1989-12-01
This paper describes a systolic array processor for a ring image Cherenkov counter which is capable of identifying pairs of electron circles with a known radius and a certain minimum distance within 15 μs. The processor is a very flexible and fast device. It consists of 128 x 128 processing elements (PEs), where one PE is assigned to each pixel of the image. All PEs run synchronously at 40 MHz. The identification of electron circles is done by correlating the detector image with the proper circle circumference. Circle centers are found by peak detection in the correlation result. A second correlation with a circle disc allows circles of closed electron pairs to be rejected. The trigger decision is generated if a pseudo adder detects at least two remaining circles. The device is controlled by a freely programmable sequencer. A VLSI chip containing 8 x 8 PEs is being developed using a VENUS design system and will be produced in 2μ CMOS technology.
NASA Technical Reports Server (NTRS)
Fura, David A.; Windley, Phillip J.; Cohen, Gerald C.
1993-01-01
This technical report contains the Higher-Order Logic (HOL) listings of the partial verification of the requirements and design for a commercially developed processor interface unit (PIU). The PIU is an interface chip performing memory interface, bus interface, and additional support services for a commercial microprocessor within a fault tolerant computer system. This system, the Fault Tolerant Embedded Processor (FTEP), is targeted towards applications in avionics and space requiring extremely high levels of mission reliability, extended maintenance-free operation, or both. This report contains the actual HOL listings of the PIU verification as it currently exists. Section two of this report contains general-purpose HOL theories and definitions that support the PIU verification. These include arithmetic theories dealing with inequalities and associativity, and a collection of tactics used in the PIU proofs. Section three contains the HOL listings for the completed PIU design verification. Section 4 contains the HOL listings for the partial requirements verification of the P-Port.
Multiple Embedded Processors for Fault-Tolerant Computing
NASA Technical Reports Server (NTRS)
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Solid State Audio/Speech Processor Analysis.
1980-03-01
techniques. The techniques were demonstrated to be worthwhile in an efficient realtime AWR system. Finally, microprocessor architectures were designed to...do not include custom chip development, detailed hardware design , construction or testing. ITTDCD is very encouraged by the results obtained in this...California, Berkley, was responsible for furnishing the simulation data of OD speech analysis techniques and for the design and development of the hardware OD
2012-11-01
few sensors/complex computations, and many sensors/simple computation. II. CHALLENGES WITH NANO-ENABLED NEUROMORPHIC CHIPS A wide variety of...scenarios. Neuromorphic processors, which are based on the highly parallelized computing architecture of the mammalian brain, show great promise in...in the brain. This fundamentally different approach, frequently referred to as neuromorphic computing, is thought to be better able to solve fuzzy
Deep learning for medical image segmentation - using the IBM TrueNorth neurosynaptic system
NASA Astrophysics Data System (ADS)
Moran, Steven; Gaonkar, Bilwaj; Whitehead, William; Wolk, Aidan; Macyszyn, Luke; Iyer, Subramanian S.
2018-03-01
Deep convolutional neural networks have found success in semantic image segmentation tasks in computer vision and medical imaging. These algorithms are executed on conventional von Neumann processor architectures or GPUs. This is suboptimal. Neuromorphic processors that replicate the structure of the brain are better-suited to train and execute deep learning models for image segmentation by relying on massively-parallel processing. However, given that they closely emulate the human brain, on-chip hardware and digital memory limitations also constrain them. Adapting deep learning models to execute image segmentation tasks on such chips, requires specialized training and validation. In this work, we demonstrate for the first-time, spinal image segmentation performed using a deep learning network implemented on neuromorphic hardware of the IBM TrueNorth Neurosynaptic System and validate the performance of our network by comparing it to human-generated segmentations of spinal vertebrae and disks. To achieve this on neuromorphic hardware, the training model constrains the coefficients of individual neurons to {-1,0,1} using the Energy Efficient Deep Neuromorphic (EEDN)1 networks training algorithm. Given the 1 million neurons and 256 million synapses, the scale and size of the neural network implemented by the IBM TrueNorth allows us to execute the requisite mapping between segmented images and non-uniform intensity MR images >20 times faster than on a GPU-accelerated network and using <0.1 W. This speed and efficiency implies that a trained neuromorphic chip can be deployed in intra-operative environments where real-time medical image segmentation is necessary.
Considerations for Future Climate Data Stewardship
NASA Astrophysics Data System (ADS)
Halem, M.; Nguyen, P. T.; Chapman, D. R.
2009-12-01
In this talk, we will describe the lessons learned based on processing and generating a decade of gridded AIRS and MODIS IR sounding data. We describe the challenges faced in accessing and sharing very large data sets, maintaining data provenance under evolving technologies, obtaining access to legacy calibration data and the permanent preservation of Earth science data records for on demand services. These lessons suggest a new approach to data stewardship will be required for the next decade of hyper spectral instruments combined with cloud resolving models. It will not be sufficient for stewards of future data centers to just provide the public with access to archived data but our experience indicates that data needs to reside close to computers with ultra large disc farms and tens of thousands of processors to deliver complex services on demand over very high speed networks much like the offerings of search engines today. Over the first decade of the 21st century, petabyte data records were acquired from the AIRS instrument on Aqua and the MODIS instrument on Aqua and Terra. NOAA data centers also maintain petabytes of operational IR sounders collected over the past four decades. The UMBC Multicore Computational Center (MC2) developed a Service Oriented Atmospheric Radiance gridding system (SOAR) to allow users to select IR sounding instruments from multiple archives and choose space-time- spectral periods of Level 1B data to download, grid, visualize and analyze on demand. Providing this service requires high data rate bandwidth access to the on line disks at Goddard. After 10 years, cost effective disk storage technology finally caught up with the MODIS data volume making it possible for Level 1B MODIS data to be available on line. However, 10Ge fiber optic networks to access large volumes of data are still not available from CSFC to serve the broader community. Data transfer rates are well below 10MB/s limiting their usefulness for climate studies. During this decade, processor performance hit a power wall leading computer vendors to design multicore processor chips. High performance computer systems obtained petaflop performance by clustering tens of thousands of multicore processor chips. Thus, power consumption and autonomic recovery from processor and disc failures have become major cost and technical considerations for future data archives. To address these new architecture requirements, a transparent parallel programming paradigm, the Hadoop MapReduce cloud computing system, became available as an open S/W system. In addition, the Hadoop File System and manages the distribution of data to these processors as well as backs up the processing in the event of any processor or disc failure. However, to employ this paradigm, the data needs to be stored on the computer system. We conclude this talk with a climate data preservation approach that addresses the scalability crisis to exabyte data requirements for the next decade based on projections of processor, disc data density and bandwidth doubling rates.
The Single Event Effect Characteristics of the 486-DX4 Microprocessor
NASA Technical Reports Server (NTRS)
Kouba, Coy; Choi, Gwan
1996-01-01
This research describes the development of an experimental radiation testing environment to investigate the single event effect (SEE) susceptibility of the 486-DX4 microprocessor. SEE effects are caused by radiation particles that disrupt the logic state of an operating semiconductor, and include single event upsets (SEU) and single event latchup (SEL). The relevance of this work can be applied directly to digital devices that are used in spaceflight computer systems. The 486-DX4 is a powerful commercial microprocessor that is currently under consideration for use in several spaceflight systems. As part of its selection process, it must be rigorously tested to determine its overall reliability in the space environment, including its radiation susceptibility. The goal of this research is to experimentally test and characterize the single event effects of the 486-DX4 microprocessor using a cyclotron facility as the fault-injection source. The test philosophy is to focus on the "operational susceptibility," by executing real software and monitoring for errors while the device is under irradiation. This research encompasses both experimental and analytical techniques, and yields a characterization of the 486-DX4's behavior for different operating modes. Additionally, the test methodology can accommodate a wide range of digital devices, such as microprocessors, microcontrollers, ASICS, and memory modules, for future testing. The goals were achieved by testing with three heavy-ion species to provide different linear energy transfer rates, and a total of six microprocessor parts were tested from two different vendors. A consistent set of error modes were identified that indicate the manner in which the errors were detected in the processor. The upset cross-section curves were calculated for each error mode, and the SEU threshold and saturation levels were identified for each processor. Results show a distinct difference in the upset rate for different configurations of the on-chip cache, as well as proving that one vendor is superior to the other in terms of latchup susceptibility. Results from this testing were also used to provide a mean-time-between-failure estimate of the 486-DX4 operating in the radiation environment for the International Space Station.
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G; Salapura, Valentina
2014-12-02
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Multi-beam and single-chip LIDAR with discrete beam steering by digital micromirror device
NASA Astrophysics Data System (ADS)
Rodriguez, Joshua; Smith, Braden; Hellman, Brandon; Gin, Adley; Espinoza, Alonzo; Takashima, Yuzuru
2018-02-01
A novel Digital Micromirror Device (DMD) based beam steering enables a single chip Light Detection and Ranging (LIDAR) system for discrete scanning points. We present increasing number of scanning point by using multiple laser diodes for Multi-beam and Single-chip DMD-based LIDAR.
Multi-processing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
The MIMD concept is applied, through multitasking, with relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. An existing single processor algorithm is mapped without the need for developing a new algorithm. The procedure of designing a code utilizing this approach is automated with the Unix stream editor. A Multiple Processor Multiple Grid (MPMG) code is developed as a demonstration of this approach. This code solves the three-dimensional, Reynolds-averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. This solver is applied to a generic, oblique-wing aircraft problem on a four-processor computer using one process for data management and nonparallel computations and three processes for pseudotime advance on three different grid systems.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Light-weight cyptography for resource constrained environments
NASA Astrophysics Data System (ADS)
Baier, Patrick; Szu, Harold
2006-04-01
We give a survey of "light-weight" encryption algorithms designed to maximise security within tight resource constraints (limited memory, power consumption, processor speed, chip area, etc.) The target applications of such algorithms are RFIDs, smart cards, mobile phones, etc., which may store, process and transmit sensitive data, but at the same time do not always support conventional strong algorithms. A survey of existing algorithms is given and new proposal is introduced.
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories
1989-02-01
frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a
A CCD Monolithic LMS Adaptive Analog Signal Processor Integrated Circuit.
1980-03-01
adaptive filter with electrically- reprogrammable MOS analog conductance weights. I The analog and digital peripheral MOS on-chip circuits are provided with...electrically reprogrammable analog weights at tap positions along a CCD analog delay line in order to form a basic linear combiner for adaptive filtering...electrically reprogrammable analog conductance weights was introduced with the use of non-volatile MNOS memory 6-7 transistors biased in their triode
Gigaflop performance on a CRAY-2: Multitasking a computational fluid dynamics application
NASA Technical Reports Server (NTRS)
Tennille, Geoffrey M.; Overman, Andrea L.; Lambiotte, Jules J.; Streett, Craig L.
1991-01-01
The methodology is described for converting a large, long-running applications code that executed on a single processor of a CRAY-2 supercomputer to a version that executed efficiently on multiple processors. Although the conversion of every application is different, a discussion of the types of modification used to achieve gigaflop performance is included to assist others in the parallelization of applications for CRAY computers, especially those that were developed for other computers. An existing application, from the discipline of computational fluid dynamics, that had utilized over 2000 hrs of CPU time on CRAY-2 during the previous year was chosen as a test case to study the effectiveness of multitasking on a CRAY-2. The nature of dominant calculations within the application indicated that a sustained computational rate of 1 billion floating-point operations per second, or 1 gigaflop, might be achieved. The code was first analyzed and modified for optimal performance on a single processor in a batch environment. After optimal performance on a single CPU was achieved, the code was modified to use multiple processors in a dedicated environment. The results of these two efforts were merged into a single code that had a sustained computational rate of over 1 gigaflop on a CRAY-2. Timings and analysis of performance are given for both single- and multiple-processor runs.
NASA Astrophysics Data System (ADS)
Wade, Mark T.; Shainline, Jeffrey M.; Orcutt, Jason S.; Ram, Rajeev J.; Stojanovic, Vladimir; Popovic, Milos A.
2014-03-01
We present the spoked-ring microcavity, a nanophotonic building block enabling energy-efficient, active photonics in unmodified, advanced CMOS microelectronics processes. The cavity is realized in the IBM 45nm SOI CMOS process - the same process used to make many commercially available microprocessors including the IBM Power7 and Sony Playstation 3 processors. In advanced SOI CMOS processes, no partial etch steps and no vertical junctions are available, which limits the types of optical cavities that can be used for active nanophotonics. To enable efficient active devices with no process modifications, we designed a novel spoked-ring microcavity which is fully compatible with the constraints of the process. As a modulator, the device leverages the sub-100nm lithography resolution of the process to create radially extending p-n junctions, providing high optical fill factor depletion-mode modulation and thereby eliminating the need for a vertical junction. The device is made entirely in the transistor active layer, low-loss crystalline silicon, which eliminates the need for a partial etch commonly used to create ridge cavities. In this work, we present the full optical and electrical design of the cavity including rigorous mode solver and FDTD simulations to design the Qlimiting electrical contacts and the coupling/excitation. We address the layout of active photonics within the mask set of a standard advanced CMOS process and show that high-performance photonic devices can be seamlessly monolithically integrated alongside electronics on the same chip. The present designs enable monolithically integrated optoelectronic transceivers on a single advanced CMOS chip, without requiring any process changes, enabling the penetration of photonics into the microprocessor.
A Medical Language Processor for Two Indo-European Languages
Nhan, Ngo Thanh; Sager, Naomi; Lyman, Margaret; Tick, Leo J.; Borst, François; Su, Yun
1989-01-01
The syntax and semantics of clinical narrative across Indo-European languages are quite similar, making it possible to envison a single medical language processor that can be adapted for different European languages. The Linguistic String Project of New York University is continuing the development of its Medical Language Processor in this direction. The paper describes how the processor operates on English and French.
NASA Astrophysics Data System (ADS)
Brodersen, R. W.
1984-04-01
A scaled version of the RISC II chip has been fabricated and tested and these new chips have a cycle time that would outperform a VAX 11/780 by about a factor of two on compiled integer C programs. The architectural work on a RISC chip designed for a Smalltalk implementation has been completed. This chip, called SOAR (Smalltalk On a RISC), should run program s4-15 times faster than the Xerox 1100 (Dolphin), a TTL minicomputer, and about as fast as the Xerox 1132 (Dorado), a $100,000 ECL minicomputer. The 1983 VLSI tools tape has been converted for use under the latest UNIX release (4.2). The Magic (formerly called Caddy) layout system will be a unified set of highly automated tools that cover all aspects of the layout process, including stretching, compaction, tiling and routing. A multiple window package and design rule checker for this system have just been completed and compaction and stretching are partially implemented. New slope-based timing models for the Crystal timing analyzer are now fully implemented and in regular use. In an accuracy test using a dozen critical paths from the RISC II processor and cache chips it was found that Crystal's estimates were within 5-10% of SPICE's estimates, while being a factor of 10,000 times faster.
A generic FPGA-based detector readout and real-time image processing board
NASA Astrophysics Data System (ADS)
Sarpotdar, Mayuresh; Mathew, Joice; Safonova, Margarita; Murthy, Jayant
2016-07-01
For space-based astronomical observations, it is important to have a mechanism to capture the digital output from the standard detector for further on-board analysis and storage. We have developed a generic (application- wise) field-programmable gate array (FPGA) board to interface with an image sensor, a method to generate the clocks required to read the image data from the sensor, and a real-time image processor system (on-chip) which can be used for various image processing tasks. The FPGA board is applied as the image processor board in the Lunar Ultraviolet Cosmic Imager (LUCI) and a star sensor (StarSense) - instruments developed by our group. In this paper, we discuss the various design considerations for this board and its applications in the future balloon and possible space flights.
Lithium niobate guided-wave beam former for steering phased-array antennas.
Armenise, M N; Passaro, V M; Noviello, G
1994-09-10
We present the theoretical investigation, design, and simulation of a novel guided-wave optical processor for L-band-transmission beam forming in a linear array of phased active antennas. The proposed configuration includes two contradirectional surface acoustic-wave transducers, and it is based on a Y-cut, X-propagating Ti:LiNbO(3) planar waveguide supporting the lowest-order modes of both polarizations (TE(0) and TM(0)) at the free-space wavelength λ = 0.85 µm. A detailed comparison between the processor we propose and other optical and electronic architectures reported in the literature is carried out, exhibiting a number of significant advantages in terms of weight, total chip size, and power consumption, when the number of antenna elements is greater than 50.
Discrete sensitivity derivatives of the Navier-Stokes equations with a parallel Krylov solver
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Taylor, Arthur C., III
1994-01-01
This paper solves an 'incremental' form of the sensitivity equations derived by differentiating the discretized thin-layer Navier Stokes equations with respect to certain design variables of interest. The equations are solved with a parallel, preconditioned Generalized Minimal RESidual (GMRES) solver on a distributed-memory architecture. The 'serial' sensitivity analysis code is parallelized by using the Single Program Multiple Data (SPMD) programming model, domain decomposition techniques, and message-passing tools. Sensitivity derivatives are computed for low and high Reynolds number flows over a NACA 1406 airfoil on a 32-processor Intel Hypercube, and found to be identical to those computed on a single-processor Cray Y-MP. It is estimated that the parallel sensitivity analysis code has to be run on 40-50 processors of the Intel Hypercube in order to match the single-processor processing time of a Cray Y-MP.
Sensing and perception research for space telerobotics at JPL
NASA Technical Reports Server (NTRS)
Gennery, Donald B.; Litwin, Todd; Wilcox, Brian; Bon, Bruce
1987-01-01
PIFLEX is a pipelined-image processor that can perform elaborate computations whose exact nature is not fixed in the hardware, and that can handle multiple images. A wire-wrapped prototype PIFEX module has been produced and debugged, using a version of the convolver composed of three custom VLSI chips (plus the line buffers). A printed circuit layout is being designed for use with a single-chip convolver, leading to production of a PIFEX with about 120 modules. A high-level language for programming PIFEX has been designed, and a compiler will be written for it. The camera calibration software has been completed and tested. Two more terms in the camera model, for lens distortion, probably will be added later. The acquisition and tracking system has been designed and most of it has been coded in Pascal for the MicroVAX-II. The feature tracker, motion stereo module and stereo matcher have executed successfully. The model matcher is still under development, and coding has begun on the tracking initializer. The object tracker was running on a different computer from the VAX, and preliminary runs on real images have been performed there. Once all modules are working, optimization and integration will begin. Finally, when a sufficiently large PIFEX is available, appropriate parts of acquisition and tracking, including much of the feature tracker, will be programmed into PIFEX, thus increasing the speed and robustness of the system.
High-Performance, Radiation-Hardened Electronics for Space Environments
NASA Technical Reports Server (NTRS)
Keys, Andrew S.; Watson, Michael D.; Frazier, Donald O.; Adams, James H.; Johnson, Michael A.; Kolawa, Elizabeth A.
2007-01-01
The Radiation Hardened Electronics for Space Environments (RHESE) project endeavors to advance the current state-of-the-art in high-performance, radiation-hardened electronics and processors, ensuring successful performance of space systems required to operate within extreme radiation and temperature environments. Because RHESE is a project within the Exploration Technology Development Program (ETDP), RHESE's primary customers will be the human and robotic missions being developed by NASA's Exploration Systems Mission Directorate (ESMD) in partial fulfillment of the Vision for Space Exploration. Benefits are also anticipated for NASA's science missions to planetary and deep-space destinations. As a technology development effort, RHESE provides a broad-scoped, full spectrum of approaches to environmentally harden space electronics, including new materials, advanced design processes, reconfigurable hardware techniques, and software modeling of the radiation environment. The RHESE sub-project tasks are: SelfReconfigurable Electronics for Extreme Environments, Radiation Effects Predictive Modeling, Radiation Hardened Memory, Single Event Effects (SEE) Immune Reconfigurable Field Programmable Gate Array (FPGA) (SIRF), Radiation Hardening by Software, Radiation Hardened High Performance Processors (HPP), Reconfigurable Computing, Low Temperature Tolerant MEMS by Design, and Silicon-Germanium (SiGe) Integrated Electronics for Extreme Environments. These nine sub-project tasks are managed by technical leads as located across five different NASA field centers, including Ames Research Center, Goddard Space Flight Center, the Jet Propulsion Laboratory, Langley Research Center, and Marshall Space Flight Center. The overall RHESE integrated project management responsibility resides with NASA's Marshall Space Flight Center (MSFC). Initial technology development emphasis within RHESE focuses on the hardening of Field Programmable Gate Arrays (FPGA)s and Field Programmable Analog Arrays (FPAA)s for use in reconfigurable architectures. As these component/chip level technologies mature, the RHESE project emphasis shifts to focus on efforts encompassing total processor hardening techniques and board-level electronic reconfiguration techniques featuring spare and interface modularity. This phased approach to distributing emphasis between technology developments provides hardened FPGA/FPAAs for early mission infusion, then migrates to hardened, board-level, high speed processors with associated memory elements and high density storage for the longer duration missions encountered for Lunar Outpost and Mars Exploration occurring later in the Constellation schedule.
Single-Chip Microcomputer Control Of The PWM Inverter
NASA Astrophysics Data System (ADS)
Morimoto, Masayuki; Sato, Shinji; Sumito, Kiyotaka; Oshitani, Katsumi
1987-10-01
A single-chip microcomputer-based con-troller for a pulsewidth modulated 1.7 KVA inverter of an airconditioner is presented. The PWM pattern generation and the system control of the airconditioner are achieved by software of the 8-bit single-chip micro-computer. The single-chip microcomputer has the disadvantages of low processing speed and small memory capacity which can be overcome by the magnetic flux control method. The PWM pattern is generated every 90 psec. The memory capacity of the PWM look-up table is less than 2 kbytes. The simple and reliable control is realized by the software-based implementation.
NASA Astrophysics Data System (ADS)
Bruno, A.; Michalak, D. J.; Poletto, S.; Clarke, J. S.; Dicarlo, L.
Large-scale quantum computation hinges on the ability to preserve and process quantum information with higher fidelity by increasing redundancy in a quantum error correction code. We present the realization of a scalable footprint for superconducting surface code based on planar circuit QED. We developed a tileable unit cell for surface code with all I/O routed vertically by means of superconducting through-silicon vias (TSVs). We address some of the challenges encountered during the fabrication and assembly of these chips, such as the quality of etch of the TSV, the uniformity of the ALD TiN coating conformal to the TSV, and the reliability of superconducting indium contact between the chips and PCB. We compare measured performance to a detailed list of specifications required for the realization of quantum fault tolerance. Our demonstration using centimeter-scale chips can accommodate the 50 qubits needed to target the experimental demonstration of small-distance logical qubits. Research funded by Intel Corporation and IARPA.
Scalable, efficient ASICS for the square kilometre array: From A/D conversion to central correlation
NASA Astrophysics Data System (ADS)
Schmatz, M. L.; Jongerius, R.; Dittmann, G.; Anghel, A.; Engbersen, T.; van Lunteren, J.; Buchmann, P.
2014-05-01
The Square Kilometre Array (SKA) is a future radio telescope, currently being designed by the worldwide radio-astronomy community. During the first of two construction phases, more than 250,000 antennas will be deployed, clustered in aperture-array stations. The antennas will generate 2.5 Pb/s of data, which needs to be processed in real time. For the processing stages from A/D conversion to central correlation, we propose an ASIC solution using only three chip architectures. The architecture is scalable - additional chips support additional antennas or beams - and versatile - it can relocate its receiver band within a range of a few MHz up to 4GHz. This flexibility makes it applicable to both SKA phases 1 and 2. The proposed chips implement an antenna and station processor for 289 antennas with a power consumption on the order of 600W and a correlator, including corner turn, for 911 stations on the order of 90 kW.
Safe and Efficient Support for Embeded Multi-Processors in ADA
NASA Astrophysics Data System (ADS)
Ruiz, Jose F.
2010-08-01
New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
PREMAQ: A NEW PRE-PROCESSOR TO CMAQ FOR AIR-QUALITY FORECASTING
A new pre-processor to CMAQ (PREMAQ) has been developed as part of the national air-quality forecasting system. PREMAQ combines the functionality of MCIP and parts of SMOKE in a single real-time processor. PREMAQ was specifically designed to link NCEP's Eta model with CMAQ, and...
Digital signal processor and processing method for GPS receivers
NASA Technical Reports Server (NTRS)
Thomas, Jr., Jess B. (Inventor)
1989-01-01
A digital signal processor and processing method therefor for use in receivers of the NAVSTAR/GLOBAL POSITIONING SYSTEM (GPS) employs a digital carrier down-converter, digital code correlator and digital tracking processor. The digital carrier down-converter and code correlator consists of an all-digital, minimum bit implementation that utilizes digital chip and phase advancers, providing exceptional control and accuracy in feedback phase and in feedback delay. Roundoff and commensurability errors can be reduced to extremely small values (e.g., less than 100 nanochips and 100 nanocycles roundoff errors and 0.1 millichip and 1 millicycle commensurability errors). The digital tracking processor bases the fast feedback for phase and for group delay in the C/A, P.sub.1, and P.sub.2 channels on the L.sub.1 C/A carrier phase thereby maintaining lock at lower signal-to-noise ratios, reducing errors in feedback delays, reducing the frequency of cycle slips and in some cases obviating the need for quadrature processing in the P channels. Simple and reliable methods are employed for data bit synchronization, data bit removal and cycle counting. Improved precision in averaged output delay values is provided by carrier-aided data-compression techniques. The signal processor employs purely digital operations in the sense that exactly the same carrier phase and group delay measurements are obtained, to the last decimal place, every time the same sampled data (i.e., exactly the same bits) are processed.
Low-power chip-level optical interconnects based on bulk-silicon single-chip photonic transceivers
NASA Astrophysics Data System (ADS)
Kim, Gyungock; Park, Hyundai; Joo, Jiho; Jang, Ki-Seok; Kwack, Myung-Joon; Kim, Sanghoon; Kim, In Gyoo; Kim, Sun Ae; Oh, Jin Hyuk; Park, Jaegyu; Kim, Sanggi
2016-03-01
We present new scheme for chip-level photonic I/Os, based on monolithically integrated vertical photonic devices on bulk silicon, which increases the integration level of PICs to a complete photonic transceiver (TRx) including chip-level light source. A prototype of the single-chip photonic TRx based on a bulk silicon substrate demonstrated 20 Gb/s low power chip-level optical interconnects between fabricated chips, proving that this scheme can offer compact low-cost chip-level I/O solutions and have a significant impact on practical electronic-photonic integration in high performance computers (HPC), cpu-memory interface, 3D-IC, and LAN/SAN/data-center and network applications.
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
Reliability study of high-brightness multiple single emitter diode lasers
NASA Astrophysics Data System (ADS)
Zhu, Jing; Yang, Thomas; Zhang, Cuipeng; Lang, Chao; Jiang, Xiaochen; Liu, Rui; Gao, Yanyan; Guo, Weirong; Jiang, Yuhua; Liu, Yang; Zhang, Luyan; Chen, Louisa
2015-03-01
In this study the chip bonding processes for various chips from various chip suppliers around the world have been optimized to achieve reliable chip on sub-mount for high performance. These chip on sub-mounts, for examples, includes three types of bonding, 8xx nm-1.2W/10.0W Indium bonded lasers, 9xx nm 10W-20W AuSn bonded lasers and 1470 nm 6W Indium bonded lasers will be reported below. The MTTF@25 of 9xx nm chip on sub-mount (COS) is calculated to be more than 203,896 hours. These chips from various chip suppliers are packaged into many multiple single emitter laser modules, using similar packaging techniques from 2 emitters per module to up to 7 emitters per module. A reliability study including aging test is performed on those multiple single emitter laser modules. With research team's 12 years' experienced packaging design and techniques, precise optical and fiber alignment processes and superior chip bonding capability, we have achieved a total MTTF exceeding 177,710 hours of life time with 60% confidence level for those multiple single emitter laser modules. Furthermore, a separated reliability study on wavelength stabilized laser modules have shown this wavelength stabilized module packaging process is reliable as well.
Development for SSV on a parallel processing system (PARAGON)
NASA Astrophysics Data System (ADS)
Gothard, Benny M.; Allmen, Mark; Carroll, Michael J.; Rich, Dan
1995-12-01
A goal of the surrogate semi-autonomous vehicle (SSV) program is to have multiple vehicles navigate autonomously and cooperatively with other vehicles. This paper describes the process and tools used in porting UGV/SSV (unmanned ground vehicle) autonomous mobility and target recognition algorithms from a SISD (single instruction single data) processor architecture (i.e., a Sun SPARC workstation running C/UNIX) to a MIMD (multiple instruction multiple data) parallel processor architecture (i.e., PARAGON-a parallel set of i860 processors running C/UNIX). It discusses the gains in performance and the pitfalls of such a venture. It also examines the merits of this processor architecture (based on this conceptual prototyping effort) and programming paradigm to meet the final SSV demonstration requirements.
NASA Technical Reports Server (NTRS)
Boriakoff, Valentin; Chen, Wei
1990-01-01
The NASA-Cornell Univ.-Worcester Polytechnic Institute Fast Fourier Transform (FFT) chip based on the architecture of the systolic FFT computation as presented by Boriakoff is implemented into an operating device design. The kernel of the system, a systolic inner product floating point processor, was designed to be assembled into a systolic network that would take incoming data streams in pipeline fashion and provide an FFT output at the same rate, word by word. It was thoroughly simulated for proper operation, and it has passed a comprehensive set of tests showing no operational errors. The black box specifications of the chip, which conform to the initial requirements of the design as specified by NASA, are given. The five subcells are described and their high level function description, logic diagrams, and simulation results are presented. Some modification of the Read Only Memory (ROM) design were made, since some errors were found in it. Because a four stage pipeline structure was used, simulating such a structure is more difficult than an ordinary structure. Simulation methods are discussed. Chip signal protocols and chip pinout are explained.
Video Bandwidth Compression System.
1980-08-01
scaling function, located between the inverse DPCM and inverse transform , on the decoder matrix multiplier chips. 1"V1 T.. ---- i.13 SECURITY...Bit Unpacker and Inverse DPCM Slave Sync Board 15 e. Inverse DPCM Loop Boards 15 f. Inverse Transform Board 16 g. Composite Video Output Board 16...36 a. Display Refresh Memory 36 (1) Memory Section 37 (2) Timing and Control 39 b. Bit Unpacker and Inverse DPCM 40 c. Inverse Transform Processor 43
2008-04-01
Space GmbH as follows: B. TECHNICAL PRPOPOSA/DESCRIPTION OF WORK Cell: A Revolutionary High Performance Computing Platform On 29 June 2005 [1...IBM has announced that is has partnered with Mercury Computer Systems, a maker of specialized computers . The Cell chip provides massive floating-point...the computing industry away from the traditional processor technology dominated by Intel. While in the past, the development of computing power has
Communications systems and methods for subsea processors
Gutierrez, Jose; Pereira, Luis
2016-04-26
A subsea processor may be located near the seabed of a drilling site and used to coordinate operations of underwater drilling components. The subsea processor may be enclosed in a single interchangeable unit that fits a receptor on an underwater drilling component, such as a blow-out preventer (BOP). The subsea processor may issue commands to control the BOP and receive measurements from sensors located throughout the BOP. A shared communications bus may interconnect the subsea processor and underwater components and the subsea processor and a surface or onshore network. The shared communications bus may be operated according to a time division multiple access (TDMA) scheme.
Kim, Gyungock; Park, Hyundai; Joo, Jiho; Jang, Ki-Seok; Kwack, Myung-Joon; Kim, Sanghoon; Kim, In Gyoo; Oh, Jin Hyuk; Kim, Sun Ae; Park, Jaegyu; Kim, Sanggi
2015-06-10
When silicon photonic integrated circuits (PICs), defined for transmitting and receiving optical data, are successfully monolithic-integrated into major silicon electronic chips as chip-level optical I/Os (inputs/outputs), it will bring innovative changes in data computing and communications. Here, we propose new photonic integration scheme, a single-chip optical transceiver based on a monolithic-integrated vertical photonic I/O device set including light source on bulk-silicon. This scheme can solve the major issues which impede practical implementation of silicon-based chip-level optical interconnects. We demonstrated a prototype of a single-chip photonic transceiver with monolithic-integrated vertical-illumination type Ge-on-Si photodetectors and VCSELs-on-Si on the same bulk-silicon substrate operating up to 50 Gb/s and 20 Gb/s, respectively. The prototype realized 20 Gb/s low-power chip-level optical interconnects for λ ~ 850 nm between fabricated chips. This approach can have a significant impact on practical electronic-photonic integration in high performance computers (HPC), cpu-memory interface, hybrid memory cube, and LAN, SAN, data center and network applications.
Development of a Portable Blood Sugar Apparatus and GOD Enzyme Strip.
Zhen-Cheng, Chen; Yu-Qian, Zhao; Jing-Tian, Tang; Ling-Yun, Li
2005-01-01
A pocket blood sugar apparatus tested by enzyme electrode, which was designed using carbon and silver plasma as conducting materials. Both the work and reference electrodes are applied to the parts of enzyme electrode. The glucose oxidase is taken as the medium of blood sugar measuring. And the range of measuring glucose is about 50mg/dL - 500mgl/dL. It has better linear for the results and fit coefficient arrives at 0.985. Its sensitivity of measurement is higher than current glucose biosensor obviously. A single chip microcomputer, AD mu C812, is used for central control processor of the instrument parts. It measures the output of microampere level currency, which is conduced by blood sugar reacting with the glucose oxidase on the enzyme electrode. And at the same time, the microampere level currency is amplified, processed. Then the results are displayed on LCD. The apparatus are better for measuring blood sugar, and the results are consistent with what the large biochemical instruments get.
Study on fault diagnosis and load feedback control system of combine harvester
NASA Astrophysics Data System (ADS)
Li, Ying; Wang, Kun
2017-01-01
In order to timely gain working status parameters of operating parts in combine harvester and improve its operating efficiency, fault diagnosis and load feedback control system is designed. In the system, rotation speed sensors were used to gather these signals of forward speed and rotation speeds of intermediate shaft, conveying trough, tangential and longitudinal flow threshing rotors, grain conveying auger. Using C8051 single chip microcomputer (SCM) as processor for main control unit, faults diagnosis and forward speed control were carried through by rotation speed ratio analysis of each channel rotation speed and intermediate shaft rotation speed by use of multi-sensor fused fuzzy control algorithm, and these processing results would be sent to touch screen and display work status of combine harvester. Field trials manifest that fault monitoring and load feedback control system has good man-machine interaction and the fault diagnosis method based on rotation speed ratios has low false alarm rate, and the system can realize automation control of forward speed for combine harvester.
A TMS320-based modem for the aeronautical-satellite core data service
NASA Astrophysics Data System (ADS)
Moher, Michael L.; Lodge, John H.
The International Civil Aviation Organization (ICAO) Future Air Navigation Systems (FANS) committee, the Airlines Electronics Engineering Committee (AEEC), and Inmarsat have been developing standards for an aeronautical satellite communications service. These standards encompass a satellite communications system architecture to provide comprehensive aeronautical communications services. Incorporated into the architecture is a core service capability, providing only low rate data communications, which all service providers and all aircraft earth terminals are required to support. In this paper an implementation of the physical layer of this standard for the low data rate core service is described. This is a completely digital modem (up to a low intermediate frequency). The implementation uses a single TMS320C25 chip for the transmit baseband functions of scrambling, encoding, interleaving, block formatting and modulation. The receiver baseband unit uses a dual processor configuration to implement the functions of demodulation, synchronization, de-interleaving, decoding and de-scrambling. The hardware requirements, the software structure and the algorithms of this implementation are described.
Energy consumption estimation of an OMAP-based Android operating system
NASA Astrophysics Data System (ADS)
González, Gabriel; Juárez, Eduardo; Castro, Juan José; Sanz, César
2011-05-01
System-level energy optimization of battery-powered multimedia embedded systems has recently become a design goal. The poor operational time of multimedia terminals makes computationally demanding applications impractical in real scenarios. For instance, the so-called smart-phones are currently unable to remain in operation longer than several hours. The OMAP3530 processor basically consists of two processing cores, a General Purpose Processor (GPP) and a Digital Signal Processor (DSP). The former, an ARM Cortex-A8 processor, is aimed to run a generic Operating System (OS) while the latter, a DSP core based on the C64x+, has architecture optimized for video processing. The BeagleBoard, a commercial prototyping board based on the OMAP processor, has been used to test the Android Operating System and measure its performance. The board has 128 MB of SDRAM external memory, 256 MB of Flash external memory and several interfaces. Note that the clock frequency of the ARM and DSP OMAP cores is 600 MHz and 430 MHz, respectively. This paper describes the energy consumption estimation of the processes and multimedia applications of an Android v1.6 (Donut) OS on the OMAP3530-Based BeagleBoard. In addition, tools to communicate the two processing cores have been employed. A test-bench to profile the OS resource usage has been developed. As far as the energy estimates concern, the OMAP processor energy consumption model provided by the manufacturer has been used. The model is basically divided in two energy components. The former, the baseline core energy, describes the energy consumption that is independent of any chip activity. The latter, the module active energy, describes the energy consumed by the active modules depending on resource usage.
Pierce, Paul E.
1986-01-01
A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
Pierce, P.E.
A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.
NASA Astrophysics Data System (ADS)
Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef
2016-12-01
The homodyne detection with only a single detector represents a promising approach in the interferometric application which enables a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of the single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulations and mixing necessary for the second (cosine) quadrature signal reconstruction followed by a conic section projection in Cartesian plane as well as (a) the phase unwrapping together with the goniometric and linear transformations needed for the scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz which would allow to measure the displacements at the velocities around half metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that employed the advantage pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they are bringing the fringe counting interferometry closer to the industrial applications due to their optical setup simplicity and robustness, computational stability, scalability and also a cost-effectiveness.
NASA Astrophysics Data System (ADS)
Weigand, R.
Two new processor devices have been developed for the use on board of spacecrafts. An 8-bit 8032-microcontroller targets typical controlling applications in instruments and sub-systems, or could be used as a main processor on small satellites, whereas the LEON 32-bit SPARC processor can be used for high performance controlling and data processing tasks. The ADV80S32 is fully compliant to the Intel 80x1 architecture and instruction set, extended by additional peripherals, 512 bytes on-chip RAM and a bootstrap PROM, which allows downloading the application software using the CCSDS PacketWire pro- tocol. The memory controller provides a de-multiplexed address/data bus, and allows to access up to 16 MB data and 8 MB program RAM. The peripherals have been de- signed for the specific needs of a spacecraft, such as serial interfaces compatible to RS232, PacketWire and TTC-B-01, counters/timers for extended duration and a CRC calculation unit accelerating the CCSDS TM/TC protocol. The 0.5 um Atmel manu- facturing technology (MG2RT) provides latch-up and total dose immunity; SEU fault immunity is implemented by using SEU hardened Flip-Flops and EDAC protection of internal and external memories. The maximum clock frequency of 20 MHz allows a processing power of 3 MIPS. Engineering samples are available. For SW develop- ment, various SW packages for the 8051 architecture are on the market. The LEON processor implements a 32-bit SPARC V8 architecture, including all the multiply and divide instructions, complemented by a floating-point unit (FPU). It includes several standard peripherals, such as timers/watchdog, interrupt controller, UARTs, parallel I/Os and a memory controller, allowing to use 8, 16 and 32 bit PROM, SRAM or memory mapped I/O. With on-chip separate instruction and data caches, almost one instruction per clock cycle can be reached in some applications. A 33-MHz 32-bit PCI master/target interface and a PCI arbiter allow operating the device in a plug-in card (for SW development on PC etc.), or to consider using it as a PCI master controller in an on-board system. Advanced SEU fault tolerance is in- troduced by design, using triple modular redundancy (TMR) flip-flops for all registers and EDAC protection for all memories. The device will be manufactured in a radia- tion hard Atmel 0.25 um technology, targeting 100 MHz processor clock frequency. The non fault-tolerant LEON processor VHDL model is available as free source code, and the SPARC architecture is a well-known industry standard. Therefore, know-how, software tools and operating systems are widely available.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Szadkowski, Zbigniew
We present the new approach to a filtering of radio frequency interferences (RFI) in the Auger Engineering Radio Array (AERA) which study the electromagnetic part of the Extensive Air Showers. The radio stations can observe radio signals caused by coherent emissions due to geomagnetic radiation and charge excess processes. AERA observes frequency band from 30 to 80 MHz. This range is highly contaminated by human-made RFI. In order to improve the signal to noise ratio RFI filters are used in AERA to suppress this contamination. The first kind of filter used by AERA was the Median one, based on themore » Fast Fourier Transform (FFT) technique. The second one, which is currently in use, is the infinite impulse response (IIR) notch filter. The proposed new filter is a finite impulse response (FIR) filter based on a linear prediction (LP). A periodic contamination hidden in a registered signal (digitized in the ADC) can be extracted and next subtracted to make signal cleaner. The FIR filter requires a calculation of n=32, 64 or even 128 coefficients (dependent on a required speed or accuracy) by solving of n linear equations with coefficients built from the covariance Toeplitz matrix. This matrix can be solved by the Levinson recursion, which is much faster than the Gauss procedure. The filter has been already tested in the real AERA radio stations on Argentinean pampas with a very successful results. The linear equations were solved either in the virtual soft-core NIOSR processor (implemented in the FPGA chip as a net of logic elements) or in the external Voipac PXA270M ARM processor. The NIOS processor is relatively slow (50 MHz internal clock), calculations performed in an external processor consume a significant amount of time for data exchange between the FPGA and the processor. Test showed a very good efficiency of the RFI suppression for stationary (long-term) contaminations. However, we observed a short-time contaminations, which could not be suppressed either by the IIR-notch filter or by the FIR filter based on the linear predictions. For the LP FIR filter the refreshment time of the filter coefficients was to long and filter did not keep up with the changes of a contamination structure, mainly due to a long calculation time in a slow processors. We propose to use the Cyclone V SE chip with embedded micro-controller operating with 925 MHz internal clock to significantly reduce a refreshment time of the FIR coefficients. The lab results are promising. (authors)« less
System on a Chip (SoC) Overview
NASA Technical Reports Server (NTRS)
LaBel, Kenneth A.
2010-01-01
System-on-a-chip or system on chip (SoC or SOC) refers to integrating all components of a computer or other electronic system into a single integrated circuit (chip). It may contain digital, analog, mixed-signal, and often radio-frequency functions all on a single chip substrate. Complexity drives it all: Radiation tolerance and testability are challenges for fault isolation, propagation, and validation. Bigger single silicon die than flown before and technology is scaling below 90nm (new qual methods). Packages have changed and are bigger and more difficult to inspect, test, and understand. Add in embedded passives. Material interfaces are more complex (underfills, processing). New rules for board layouts. Mechanical and thermal designs, etc.
NASA Astrophysics Data System (ADS)
Parks, Joshua W.
Optofluidics, born of the desire to create a system containing microfluidic environments with integrated optical elements, has seen dramatic increases in popularity over the last 10 years. In particular, the application of this technology towards chip based molecular sensors has undergone significant development. The most sensitive of these biosensors interface liquid- and solid-core antiresonant reflecting optical waveguides (ARROWs). These sensor chips are created using conventional silicon microfabrication. As such, ARROW technology has previously been unable to utilize state-of-the-art microfluidic developments because the technology used--soft polydimethyl siloxane (PDMS) micromolded chips--is unamenable to the silicon microfabrication workflows implemented in the creation of ARROW detection chips. The original goal of this thesis was to employ hybrid integration, or the connection of independently designed and fabricated optofluidic and microfluidic chips, to create enhanced biosensors with the capability of processing and detecting biological samples on a single hybrid system. After successful demonstration of this paradigm, this work expanded into a new direction--direct integration of sensing and detection technologies on a new platform with dynamic, multi-dimensional photonic re-configurability. This thesis reports a number of firsts, including: • 1,000 fold optical transmission enhancement of ARROW optofluidic detection chips through thermal annealing, • Detection of single nucleic acids on a silicon-based ARROW chip, • Hybrid optofluidic integration of ARROW detection chips and passive PDMS microfluidic chips, • Hybrid optofluidic integration of ARROW detection chips and actively controllable PDMS microfluidic chips with integrated microvalves, • On-chip concentration and detection of clinical Ebola nucleic acids, • Multimode interference (MMI) waveguide based wavelength division multiplexing for detection of single influenza virions, • All PDMS platform created from monolithically integrated solid- and liquid-core waveguides with single particle detection efficiency and directly integrated microvalves, featuring: ∘ Tunable/tailorable PDMS MMI waveguides, ∘ Lightvalves (optical switch/fluidic microvalve) with the ability to dynamically control light and fluid flow simultaneously, ∘ Lightvalve trap architecture with the ability to physically trap, detect, and analyze single biomolecules.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1994-01-01
In a computer having a large number of single-instruction multiple data (SIMD) processors, each of the SIMD processors has two sets of three individual processor elements controlled by a master control unit and interconnected among a plurality of register file units where data is stored. The register files input and output data in synchronism with a minor cycle clock under control of two slave control units controlling the register file units connected to respective ones of the two sets of processor elements. Depending upon which ones of the register file units are enabled to store or transmit data during a particular minor clock cycle, the processor elements within an SIMD processor are connected in rings or in pipeline arrays, and may exchange data with the internal bus or with neighboring SIMD processors through interface units controlled by respective ones of the two slave control units.
(abstract) A High Throughput 3-D Inner Product Processor
NASA Technical Reports Server (NTRS)
Daud, Tuan
1996-01-01
A particularily challenging image processing application is the real time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine grain parallism and, when implemented in hardware, offer orders of magnitude speed up. However, neural networks implemented on a VLSI chip are planer architectures capable of efficient processing of linear vector signals rather than 2-D images. Therefore, for processing of images, a 3-D stack of neural-net ICs receiving planar inputs and consuming minimal power are required. Details of the circuits with chip architectures will be described with need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing will be illustrated.
Design and implementation of H.264 based embedded video coding technology
NASA Astrophysics Data System (ADS)
Mao, Jian; Liu, Jinming; Zhang, Jiemin
2016-03-01
In this paper, an embedded system for remote online video monitoring was designed and developed to capture and record the real-time circumstances in elevator. For the purpose of improving the efficiency of video acquisition and processing, the system selected Samsung S5PV210 chip as the core processor which Integrated graphics processing unit. And the video was encoded with H.264 format for storage and transmission efficiently. Based on S5PV210 chip, the hardware video coding technology was researched, which was more efficient than software coding. After running test, it had been proved that the hardware video coding technology could obviously reduce the cost of system and obtain the more smooth video display. It can be widely applied for the security supervision [1].
WATERLOPP V2/64: A highly parallel machine for numerical computation
NASA Astrophysics Data System (ADS)
Ostlund, Neil S.
1985-07-01
Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.
Multiprocessing on supercomputers for computational aerodynamics
NASA Technical Reports Server (NTRS)
Yarrow, Maurice; Mehta, Unmeel B.
1990-01-01
Very little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPs or more) in computational aerodynamics to significantly improve turnaround time. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, the improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) through multi-tasking is applied via a strategy which requires relatively minor modifications to an existing code for a single processor. Essentially, this approach maps the available memory to multiple processors, exploiting the C-FORTRAN-Unix interface. The existing single processor code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor. As a demonstration of this approach, a Multiple Processor Multiple Grid (MPMG) code is developed. It is capable of using nine processors, and can be easily extended to a larger number of processors. This code solves the three-dimensional, Reynolds averaged, thin-layer and slender-layer Navier-Stokes equations with an implicit, approximately factored and diagonalized method. The solver is applied to generic oblique-wing aircraft problem on a four processor Cray-2 computer. A tricubic interpolation scheme is developed to increase the accuracy of coupling of overlapped grids. For the oblique-wing aircraft problem, a speedup of two in elapsed (turnaround) time is observed in a saturated time-sharing environment.
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
NASA Astrophysics Data System (ADS)
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad
2015-05-01
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Developing software to use parallel processing effectively. Final report, June-December 1987
DOE Office of Scientific and Technical Information (OSTI.GOV)
Center, J.
1988-10-01
This report describes the difficulties involved in writing efficient parallel programs and describes the hardware and software support currently available for generating software that utilizes processing effectively. Historically, the processing rate of single-processor computers has increased by one order of magnitude every five years. However, this pace is slowing since electronic circuitry is coming up against physical barriers. Unfortunately, the complexity of engineering and research problems continues to require ever more processing power (far in excess of the maximum estimated 3 Gflops achievable by single-processor computers). For this reason, parallel-processing architectures are receiving considerable interest, since they offer high performancemore » more cheaply than a single-processor supercomputer, such as the Cray.« less
Experimental single-chip color HDTV image acquisition system with 8M-pixel CMOS image sensor
NASA Astrophysics Data System (ADS)
Shimamoto, Hiroshi; Yamashita, Takayuki; Funatsu, Ryohei; Mitani, Kohji; Nojiri, Yuji
2006-02-01
We have developed an experimental single-chip color HDTV image acquisition system using 8M-pixel CMOS image sensor. The sensor has 3840 × 2160 effective pixels and is progressively scanned at 60 frames per second. We describe the color filter array and interpolation method to improve image quality with a high-pixel-count single-chip sensor. We also describe an experimental image acquisition system we used to measured spatial frequency characteristics in the horizontal direction. The results indicate good prospects for achieving a high quality single chip HDTV camera that reduces pseudo signals and maintains high spatial frequency characteristics within the frequency band for HDTV.
DIABCARD a smart card for patients with chronic diseases.
Engelbrecht, R; Hildebrand, C
1997-01-01
Within the European Union-sponsored project DIABCARD, the core of a chip-card-based medical information system for patients with chronic diseases, exemplified on diabetes mellitus, has been developed. The long-term goal of the project is to improve the medical record and the quality of care for patients with chronic diseases. The basic idea is to have a portable electronic medical record on a smart card. This will improve the communication between the different healthcare personnel and between different institutions and, at the same time, promote shared care. The DIABCARD chip-card-based medical information system will offer controlled access to the necessary and up-to-date patient record to everyone involved in the patient's treatment, and it will help reduce the constantly rising healthcare expenditure. The system first was implemented in a small version. The system architecture contains hardware, software, and orgware. It considers especially the memory of the chip card, the processor, the data structure, security functions, the operating system on the chip card, the interface between the chip card and the application, and various application areas. The DIABCARD dataset was defined via an information model, which describes the different communication processes, via acknowledged diabetes datasets and medical scenarios. It includes, among others, emergency data, data for quality assurance, and data for blood glucose self-monitoring. The first prototype has been developed, and a pilot was run for 3 months.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Performance of the Cell processor for biomolecular simulations
NASA Astrophysics Data System (ADS)
De Fabritiis, G.
2007-06-01
The new Cell processor represents a turning point for computing intensive applications. Here, I show that for molecular dynamics it is possible to reach an impressive sustained performance in excess of 30 Gflops with a peak of 45 Gflops for the non-bonded force calculations, over one order of magnitude faster than a single core standard processor.
Design and Implementation of a CMOS Chip for a Prolog
1988-03-01
generation scheme . We use the P -circuit [9] with pre-conditioning and post- conditioning 12,3] circuits to generate the carry. The implementation of...system generates vertical microcode for a general purpose processor, the NCR 9300 sys- S tem, from W- code [7]. Three significant pieces of software are...calculation block generating the pro- pagate ( P ) and generate (G) signals needed for carry calculation, and a sum block supplying the final result. The top
A new ATLAS muon CSC readout system with system on chip technology on ATCA platform
Claus, R.
2015-10-23
The ATLAS muon Cathode Strip Chamber (CSC) back-end readout system has been upgraded during the LHC 2013-2015 shutdown to be able to handle the higher Level-1 trigger rate of 100 kHz and the higher occupancy at Run 2 luminosity. The readout design is based on the Reconfiguration Cluster Element (RCE) concept for high bandwidth generic DAQ implemented on the ATCA platform. The RCE design is based on the new System on Chip Xilinx Zynq series with a processor-centric architecture with ARM processor embedded in FPGA fabric and high speed I/O resources together with auxiliary memories to form a versatile DAQmore » building block that can host applications tapping into both software and firmware resources. The Cluster on Board (COB) ATCA carrier hosts RCE mezzanines and an embedded Fulcrum network switch to form an online DAQ processing cluster. More compact firmware solutions on the Zynq for G-link, S-link and TTC allowed the full system of 320 G-links from the 32 chambers to be processed by 6 COBs in one ATCA shelf through software waveform feature extraction to output 32 S-links. Furthermore, the full system was installed in Sept. 2014. We will present the RCE/COB design concept, the firmware and software processing architecture, and the experience from the intense commissioning towards LHC Run 2.« less
A new ATLAS muon CSC readout system with system on chip technology on ATCA platform
NASA Astrophysics Data System (ADS)
Claus, R.; ATLAS Collaboration
2016-07-01
The ATLAS muon Cathode Strip Chamber (CSC) back-end readout system has been upgraded during the LHC 2013-2015 shutdown to be able to handle the higher Level-1 trigger rate of 100 kHz and the higher occupancy at Run 2 luminosity. The readout design is based on the Reconfiguration Cluster Element (RCE) concept for high bandwidth generic DAQ implemented on the ATCA platform. The RCE design is based on the new System on Chip Xilinx Zynq series with a processor-centric architecture with ARM processor embedded in FPGA fabric and high speed I/O resources together with auxiliary memories to form a versatile DAQ building block that can host applications tapping into both software and firmware resources. The Cluster on Board (COB) ATCA carrier hosts RCE mezzanines and an embedded Fulcrum network switch to form an online DAQ processing cluster. More compact firmware solutions on the Zynq for G-link, S-link and TTC allowed the full system of 320 G-links from the 32 chambers to be processed by 6 COBs in one ATCA shelf through software waveform feature extraction to output 32 S-links. The full system was installed in Sept. 2014. We will present the RCE/COB design concept, the firmware and software processing architecture, and the experience from the intense commissioning towards LHC Run 2.
A new ATLAS muon CSC readout system with system on chip technology on ATCA platform
NASA Astrophysics Data System (ADS)
Bartoldus, R.; Claus, R.; Garelli, N.; Herbst, R. T.; Huffer, M.; Iakovidis, G.; Iordanidou, K.; Kwan, K.; Kocian, M.; Lankford, A. J.; Moschovakos, P.; Nelson, A.; Ntekas, K.; Ruckman, L.; Russell, J.; Schernau, M.; Schlenker, S.; Su, D.; Valderanis, C.; Wittgen, M.; Yildiz, S. C.
2016-01-01
The ATLAS muon Cathode Strip Chamber (CSC) backend readout system has been upgraded during the LHC 2013-2015 shutdown to be able to handle the higher Level-1 trigger rate of 100 kHz and the higher occupancy at Run-2 luminosity. The readout design is based on the Reconfigurable Cluster Element (RCE) concept for high bandwidth generic DAQ implemented on the Advanced Telecommunication Computing Architecture (ATCA) platform. The RCE design is based on the new System on Chip XILINX ZYNQ series with a processor-centric architecture with ARM processor embedded in FPGA fabric and high speed I/O resources. Together with auxiliary memories, all these components form a versatile DAQ building block that can host applications tapping into both software and firmware resources. The Cluster on Board (COB) ATCA carrier hosts RCE mezzanines and an embedded Fulcrum network switch to form an online DAQ processing cluster. More compact firmware solutions on the ZYNQ for high speed input and output fiberoptic links and TTC allowed the full system of 320 input links from the 32 chambers to be processed by 6 COBs in one ATCA shelf. The full system was installed in September 2014. We will present the RCE/COB design concept, the firmware and software processing architecture, and the experience from the intense commissioning for LHC Run 2.
A new ATLAS muon CSC readout system with system on chip technology on ATCA platform
Bartoldus, R.; Claus, R.; Garelli, N.; ...
2016-01-25
The ATLAS muon Cathode Strip Chamber (CSC) backend readout system has been upgraded during the LHC 2013-2015 shutdown to be able to handle the higher Level-1 trigger rate of 100 kHz and the higher occupancy at Run-2 luminosity. The readout design is based on the Reconfigurable Cluster Element (RCE) concept for high bandwidth generic DAQ implemented on the Advanced Telecommunication Computing Architecture (ATCA) platform. The RCE design is based on the new System on Chip XILINX ZYNQ series with a processor-centric architecture with ARM processor embedded in FPGA fabric and high speed I/O resources. Together with auxiliary memories, all ofmore » these components form a versatile DAQ building block that can host applications tapping into both software and firmware resources. The Cluster on Board (COB) ATCA carrier hosts RCE mezzanines and an embedded Fulcrum network switch to form an online DAQ processing cluster. More compact firmware solutions on the ZYNQ for high speed input and output fiberoptic links and TTC allowed the full system of 320 input links from the 32 chambers to be processed by 6 COBs in one ATCA shelf. The full system was installed in September 2014. In conclusion, we will present the RCE/COB design concept, the firmware and software processing architecture, and the experience from the intense commissioning for LHC Run 2.« less
Kim, Gyungock; Park, Hyundai; Joo, Jiho; Jang, Ki-Seok; Kwack, Myung-Joon; Kim, Sanghoon; Gyoo Kim, In; Hyuk Oh, Jin; Ae Kim, Sun; Park, Jaegyu; Kim, Sanggi
2015-01-01
When silicon photonic integrated circuits (PICs), defined for transmitting and receiving optical data, are successfully monolithic-integrated into major silicon electronic chips as chip-level optical I/Os (inputs/outputs), it will bring innovative changes in data computing and communications. Here, we propose new photonic integration scheme, a single-chip optical transceiver based on a monolithic-integrated vertical photonic I/O device set including light source on bulk-silicon. This scheme can solve the major issues which impede practical implementation of silicon-based chip-level optical interconnects. We demonstrated a prototype of a single-chip photonic transceiver with monolithic-integrated vertical-illumination type Ge-on-Si photodetectors and VCSELs-on-Si on the same bulk-silicon substrate operating up to 50 Gb/s and 20 Gb/s, respectively. The prototype realized 20 Gb/s low-power chip-level optical interconnects for λ ~ 850 nm between fabricated chips. This approach can have a significant impact on practical electronic-photonic integration in high performance computers (HPC), cpu-memory interface, hybrid memory cube, and LAN, SAN, data center and network applications. PMID:26061463
Instruction-level performance modeling and characterization of multimedia applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luo, Y.; Cameron, K.W.
1999-06-01
One of the challenges for characterizing and modeling realistic multimedia applications is the lack of access to source codes. On-chip performance counters effectively resolve this problem by monitoring run-time behaviors at the instruction-level. This paper presents a novel technique of characterizing and modeling workloads at the instruction level for realistic multimedia applications using hardware performance counters. A variety of instruction counts are collected from some multimedia applications, such as RealPlayer, GSM Vocoder, MPEG encoder/decoder, and speech synthesizer. These instruction counts can be used to form a set of abstract characteristic parameters directly related to a processor`s architectural features. Based onmore » microprocessor architectural constraints and these calculated abstract parameters, the architectural performance bottleneck for a specific application can be estimated. Meanwhile, the bottleneck estimation can provide suggestions about viable architectural/functional improvement for certain workloads. The biggest advantage of this new characterization technique is a better understanding of processor utilization efficiency and architectural bottleneck for each application. This technique also provides predictive insight of future architectural enhancements and their affect on current codes. In this paper the authors also attempt to model architectural effect on processor utilization without memory influence. They derive formulas for calculating CPI{sub 0}, CPI without memory effect, and they quantify utilization of architectural parameters. These equations are architecturally diagnostic and predictive in nature. Results provide promise in code characterization, and empirical/analytical modeling.« less
ROSE::FTTransform - A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lidman, J; Quinlan, D; Liao, C
2012-03-26
Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the minimum 0.5 volts for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present anmore » open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).« less
Peng, Ran; Li, Dongqing
2016-10-07
The ability to create reproducible and inexpensive nanofluidic chips is essential to the fundamental research and applications of nanofluidics. This paper presents a novel and cost-effective method for fabricating a single nanochannel or multiple nanochannels in PDMS chips with controllable channel size and spacing. Single nanocracks or nanocrack arrays, positioned by artificial defects, are first generated on a polystyrene surface with controllable size and spacing by a solvent-induced method. Two sets of optimal working parameters are developed to replicate the nanocracks onto the polymer layers to form the nanochannel molds. The nanochannel molds are used to make the bi-layer PDMS microchannel-nanochannel chips by simple soft lithography. An alignment system is developed for bonding the nanofluidic chips under an optical microscope. Using this method, high quality PDMS nanofluidic chips with a single nanochannel or multiple nanochannels of sub-100 nm width and height and centimeter length can be obtained with high repeatability.
Networked Workstations and Parallel Processing Utilizing Functional Languages
1993-03-01
program . This frees the programmer to concentrate on what the program is to do, not how the program is...traditional ’von Neumann’ architecture uses a timer based (e.g., the program counter), sequentially pro- grammed, single processor approach to problem...traditional ’von Neumann’ architecture uses a timer based (e.g., the program counter), sequentially programmed , single processor approach to
Kim, Jinho; Cho, Hyungseok; Han, Song-I; Han, Ki-Ho
2016-05-03
This paper introduces a single-cell isolation technology for circulating tumor cells (CTCs) using a microfluidic device (the "SIM-Chip"). The SIM-Chip comprises a lateral magnetophoretic microseparator and a microdispenser as a two-step cascade platform. First, CTCs were enriched from whole blood by the lateral magnetophoretic microseparator based on immunomagnetic nanobeads. Next, the enriched CTCs were electrically identified by single-cell impedance cytometer and isolated as single cells using the microshooter. Using 200 μL of whole blood spiked with 50 MCF7 breast cancer cells, the analysis demonstrated that the single-cell isolation efficiency of the SIM-Chip was 82.4%, and the purity of the isolated MCF7 cells with respect to WBCs was 92.45%. The data also showed that the WBC depletion rate of the SIM-Chip was 2.5 × 10(5) (5.4-log). The recovery rates were around 99.78% for spiked MCF7 cells ranging in number from 10 to 90. The isolated single MCF7 cells were intact and could be used for subsequent downstream genetic assays, such as RT-PCR. Single-cell culture evaluation of the proliferation of MCF7 cells isolated by the SIM-Chip showed that 84.1% of cells at least doubled in 5 days. Consequently, the SIM-Chip could be used for single-cell isolation of rare target cells from whole blood with high purity and recovery without cell damage.
Smart Power Supply for Battery-Powered Systems
NASA Technical Reports Server (NTRS)
Krasowski, Michael J.; Greer, Lawrence; Prokop, Norman F.; Flatico, Joseph M.
2010-01-01
A power supply for battery-powered systems has been designed with an embedded controller that is capable of monitoring and maintaining batteries, charging hardware, while maintaining output power. The power supply is primarily designed for rovers and other remote science and engineering vehicles, but it can be used in any battery alone, or battery and charging source applications. The supply can function autonomously, or can be connected to a host processor through a serial communications link. It can be programmed a priori or on the fly to return current and voltage readings to a host. It has two output power busses: a constant 24-V direct current nominal bus, and a programmable bus for output from approximately 24 up to approximately 50 V. The programmable bus voltage level, and its output power limit, can be changed on the fly as well. The power supply also offers options to reduce the programmable bus to 24 V when the set power limit is reached, limiting output power in the case of a system fault detected in the system. The smart power supply is based on an embedded 8051-type single-chip microcontroller. This choice was made in that a credible progression to flight (radiation hard, high reliability) can be assumed as many 8051 processors or gate arrays capable of accepting 8051-type core presently exist and will continue to do so for some time. To solve the problem of centralized control, this innovation moves an embedded microcontroller to the power supply and assigns it the task of overseeing the operation and charging of the power supply assets. This embedded processor is connected to the application central processor via a serial data link such that the central processor can request updates of various parameters within the supply, such as battery current, bus voltage, remaining power in battery estimations, etc. This supply has a direct connection to the battery bus for common (quiescent) power application. Because components from multiple vendors may have differing power needs, this supply also has a secondary power bus, which can be programmed a priori or on-the-fly to boost the primary battery voltage level from 24 to 50 V to accommodate various loads as they are brought on line. Through voltage and current monitoring, the device can also shield the charging source from overloads, keep it within safe operating modes, and can meter available power to the application and maintain safe operations.
Shared performance monitor in a multiprocessor system
Chiu, George; Gara, Alan G.; Salapura, Valentina
2012-07-24
A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
NASA Astrophysics Data System (ADS)
Guo, Baoshan; Lei, Cheng; Ito, Takuro; Yaxiaer, Yalikun; Kobayashi, Hirofumi; Jiang, Yiyue; Tanaka, Yo; Ozeki, Yasuyuki; Goda, Keisuke
2017-02-01
The development of reliable, sustainable, and economical sources of alternative fuels is an important, but challenging goal for the world. As an alternative to liquid fossil fuels, microalgal biofuel is expected to play a key role in reducing the detrimental effects of global warming since microalgae absorb atmospheric CO2 via photosynthesis. Unfortunately, conventional analytical methods only provide population-averaged lipid contents and fail to characterize a diverse population of microalgal cells with single-cell resolution in a noninvasive and interference-free manner. Here we demonstrate high-throughput label-free single-cell screening of lipid-producing microalgal cells with optofluidic time-stretch quantitative phase microscopy. In particular, we use Euglena gracilis - an attractive microalgal species that produces wax esters (suitable for biodiesel and aviation fuel after refinement) within lipid droplets. Our optofluidic time-stretch quantitative phase microscope is based on an integration of a hydrodynamic-focusing microfluidic chip, an optical time-stretch phase-contrast microscope, and a digital image processor equipped with machine learning. As a result, it provides both the opacity and phase contents of every single cell at a high throughput of 10,000 cells/s. We characterize heterogeneous populations of E. gracilis cells under two different culture conditions to evaluate their lipid production efficiency. Our method holds promise as an effective analytical tool for microalgaebased biofuel production.
The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System
NASA Technical Reports Server (NTRS)
Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)
2002-01-01
We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (SP of NPB) on a single processor as the observed/peak performance ratio. Then we estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs). Then we perform the same measurements for a chache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4. These counters allow us to analyze the obtained performance results.
Performance of a plasma fluid code on the Intel parallel computers
NASA Technical Reports Server (NTRS)
Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.
CMOS Image Sensor with a Built-in Lane Detector.
Hsiao, Pei-Yung; Cheng, Hsien-Chein; Huang, Shih-Shinh; Fu, Li-Chen
2009-01-01
This work develops a new current-mode mixed signal Complementary Metal-Oxide-Semiconductor (CMOS) imager, which can capture images and simultaneously produce vehicle lane maps. The adopted lane detection algorithm, which was modified to be compatible with hardware requirements, can achieve a high recognition rate of up to approximately 96% under various weather conditions. Instead of a Personal Computer (PC) based system or embedded platform system equipped with expensive high performance chip of Reduced Instruction Set Computer (RISC) or Digital Signal Processor (DSP), the proposed imager, without extra Analog to Digital Converter (ADC) circuits to transform signals, is a compact, lower cost key-component chip. It is also an innovative component device that can be integrated into intelligent automotive lane departure systems. The chip size is 2,191.4 × 2,389.8 μm, and the package uses 40 pin Dual-In-Package (DIP). The pixel cell size is 18.45 × 21.8 μm and the core size of photodiode is 12.45 × 9.6 μm; the resulting fill factor is 29.7%.
A Low Power SOC Architecture for the V2.0+EDR Bluetooth Using a Unified Verification Platform
NASA Astrophysics Data System (ADS)
Kim, Jeonghun; Kim, Suki; Baek, Kwang-Hyun
This paper presents a low-power System on Chip (SOC) architecture for the v2.0+EDR (Enhanced Data Rate) Bluetooth and its applications. Our design includes a link controller, modem, RF transceiver, Sub-Band Codec (SBC), Expanded Instruction Set Computer (ESIC) processor, and peripherals. To decrease power consumption of the proposed SOC, we reduce data transfer using a dual-port memory, including a power management unit, and a clock gated approach. We also address some of issues and benefits of reusable and unified environment on a centralized data structure and SOC verification platform. This includes flexibility in meeting the final requirements using technology-independent tools wherever possible in various processes and for projects. The other aims of this work are to minimize design efforts by avoiding the same work done twice by different people and to reuse the similar environment and platform for different projects. This chip occupies a die size of 30mm2 in 0.18µm CMOS, and the worst-case current of the total chip is 54mA.
On-board landmark navigation and attitude reference parallel processor system
NASA Technical Reports Server (NTRS)
Gilbert, L. E.; Mahajan, D. T.
1978-01-01
An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.
Demonstration program for Omega receiver prototype microcomputer data processing
NASA Technical Reports Server (NTRS)
Lilley, R. W.
1976-01-01
The JOLT (TM) commercial microcomputer, based on the MOS Technology 6502 processor chip, for use in Omega navigation system is evaluated. A computer program was prepared in hand-assembled code to demonstrate receiver operation. The processor provides binary processing with interrupts enabled, a carriage return is given to initialize the teleprinter, and a jump is performed to enter the program loop to wait for an interrupt. The program loop operates continuously testing the interrupt flag. The interrupt routine reads the receiver status word and determines whether the current time-slot is the A slot. If so, the interrupt flag, which is also the data index pointer, is reset to zero. The status word is stored in the status buffer. If the time-slot is not A, the interrupt flag/pointer is incremented by one to index the phase and status to the proper buffer words for later use by the print routine.
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications
NASA Technical Reports Server (NTRS)
Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer
1997-01-01
A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi
Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...
2015-05-22
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluatemore » the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).« less
Multicore Challenges and Benefits for High Performance Scientific Computing
Nielsen, Ida M. B.; Janssen, Curtis L.
2008-01-01
Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexitymore » of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.« less
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
Laser doppler blood flow imaging using a CMOS imaging sensor with on-chip signal processing.
He, Diwei; Nguyen, Hoang C; Hayes-Gill, Barrie R; Zhu, Yiqun; Crowe, John A; Gill, Cally; Clough, Geraldine F; Morgan, Stephen P
2013-09-18
The first fully integrated 2D CMOS imaging sensor with on-chip signal processing for applications in laser Doppler blood flow (LDBF) imaging has been designed and tested. To obtain a space efficient design over 64 × 64 pixels means that standard processing electronics used off-chip cannot be implemented. Therefore the analog signal processing at each pixel is a tailored design for LDBF signals with balanced optimization for signal-to-noise ratio and silicon area. This custom made sensor offers key advantages over conventional sensors, viz. the analog signal processing at the pixel level carries out signal normalization; the AC amplification in combination with an anti-aliasing filter allows analog-to-digital conversion with a low number of bits; low resource implementation of the digital processor enables on-chip processing and the data bottleneck that exists between the detector and processing electronics has been overcome. The sensor demonstrates good agreement with simulation at each design stage. The measured optical performance of the sensor is demonstrated using modulated light signals and in vivo blood flow experiments. Images showing blood flow changes with arterial occlusion and an inflammatory response to a histamine skin-prick demonstrate that the sensor array is capable of detecting blood flow signals from tissue.
Intelligent operations of the data acquisition system of the ATLAS experiment at LHC
NASA Astrophysics Data System (ADS)
Anders, G.; Avolio, G.; Lehmann Miotto, G.; Magnoni, L.
2015-05-01
The ATLAS experiment at the Large Hadron Collider at CERN relies on a complex and highly distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data obtained at unprecedented energy and rates. The Run Control (RC) system is the component steering the data acquisition by starting and stopping processes and by carrying all data-taking elements through well-defined states in a coherent way. Taking into account all the lessons learnt during LHC's Run 1, the RC has been completely re-designed and re-implemented during the LHC Long Shutdown 1 (LS1) phase. As a result of the new design, the RC is assisted by the Central Hint and Information Processor (CHIP) service that can be truly considered its “brain”. CHIP is an intelligent system able to supervise the ATLAS data taking, take operational decisions and handle abnormal conditions. In this paper, the design, implementation and performances of the RC/CHIP system will be described. A particular emphasis will be put on the way the RC and CHIP cooperate and on the huge benefits brought by the Complex Event Processing engine. Additionally, some error recovery scenarios will be analysed for which the intervention of human experts is now rendered unnecessary.
International Conference on Stiff Computation Held at Park City, Utah on April 12, 13 and 14, 1982.
1983-05-31
algorithm should be designed which can analyse a system description and find out for the user ~to which class of problems his system belongs... Dove...processors designed to implement aspecific solution process. yrne: IEE floating point chip design " used by INE and others is an example (Xahan)...the...hardware speciaList has designed his computer such that the paraL#L features can be addressed convenientLy and !! ’) efficientLy, and 4;) the software
2000-10-01
available from rooksj~,rl.af.mil [4] J. Lyke and G. Forman "Microengineering Aerospace Systems" H . Helvajian editor, The Aerospace Press 1999, Chapter 8...e h I O iinterface chip, and Synchronous Dynamic Random 1K-byte. The only consequence is that after the FIFO is Access Memory (SDRAM). Each interface...shown in figure 4a, that will be used for the 1/O interconnects in place of the perimeter bond pads used in the MCM3A. The 6’ h layer is used to
Command/response protocols and concurrent software
NASA Technical Reports Server (NTRS)
Bynum, W. L.
1987-01-01
A version of the program to control the parallel jaw gripper is documented. The parallel jaw end-effector hardware and the Intel 8031 processor that is used to control the end-effector are briefly described. A general overview of the controller program is given and a complete description of the program's structure and design are contained. There are three appendices: a memory map of the on-chip RAM, a cross-reference listing of the self-scheduling routines, and a summary of the top-level and monitor commands.
μOrgano: A Lego®-Like Plug & Play System for Modular Multi-Organ-Chips.
Loskill, Peter; Marcus, Sivan G; Mathur, Anurag; Reese, Willie Mae; Healy, Kevin E
2015-01-01
Human organ-on-a-chip systems for drug screening have evolved as feasible alternatives to animal models, which are unreliable, expensive, and at times erroneous. While chips featuring single organs can be of great use for both pharmaceutical testing and basic organ-level studies, the huge potential of the organ-on-a-chip technology is revealed by connecting multiple organs on one chip to create a single integrated system for sophisticated fundamental biological studies and devising therapies for disease. Furthermore, since most organ-on-a-chip systems require special protocols with organ-specific media for the differentiation and maturation of the tissues, multi-organ systems will need to be temporally customizable and flexible in terms of the time point of connection of the individual organ units. We present a customizable Lego®-like plug & play system, μOrgano, which enables initial individual culture of single organ-on-a-chip systems and subsequent connection to create integrated multi-organ microphysiological systems. As a proof of concept, the μOrgano system was used to connect multiple heart chips in series with excellent cell viability and spontaneously physiological beat rates.
μOrgano: A Lego®-Like Plug & Play System for Modular Multi-Organ-Chips
Loskill, Peter; Marcus, Sivan G.; Mathur, Anurag; Reese, Willie Mae; Healy, Kevin E.
2015-01-01
Human organ-on-a-chip systems for drug screening have evolved as feasible alternatives to animal models, which are unreliable, expensive, and at times erroneous. While chips featuring single organs can be of great use for both pharmaceutical testing and basic organ-level studies, the huge potential of the organ-on-a-chip technology is revealed by connecting multiple organs on one chip to create a single integrated system for sophisticated fundamental biological studies and devising therapies for disease. Furthermore, since most organ-on-a-chip systems require special protocols with organ-specific media for the differentiation and maturation of the tissues, multi-organ systems will need to be temporally customizable and flexible in terms of the time point of connection of the individual organ units. We present a customizable Lego®-like plug & play system, μOrgano, which enables initial individual culture of single organ-on-a-chip systems and subsequent connection to create integrated multi-organ microphysiological systems. As a proof of concept, the μOrgano system was used to connect multiple heart chips in series with excellent cell viability and spontaneously physiological beat rates. PMID:26440672
Programmable Remapper with Single Flow Architecture
NASA Technical Reports Server (NTRS)
Fisher, Timothy E. (Inventor)
1993-01-01
An apparatus for image processing comprising a camera for receiving an original visual image and transforming the original visual image into an analog image, a first converter for transforming the analog image of the camera to a digital image, a processor having a single flow architecture for receiving the digital image and producing, with a single algorithm, an output image, a second converter for transforming the digital image of the processor to an analog image, and a viewer for receiving the analog image, transforming the analog image into a transformed visual image for observing the transformations applied to the original visual image. The processor comprises one or more subprocessors for the parallel reception of a digital image for producing an output matrix of the transformed visual image. More particularly, the processor comprises a plurality of subprocessors for receiving in parallel and transforming the digital image for producing a matrix of the transformed visual image, and an output interface means for receiving the respective portions of the transformed visual image from the respective subprocessor for producing an output matrix of the transformed visual image.
Holo-Chidi video concentrator card
NASA Astrophysics Data System (ADS)
Nwodoh, Thomas A.; Prabhakar, Aditya; Benton, Stephen A.
2001-12-01
The Holo-Chidi Video Concentrator Card is a frame buffer for the Holo-Chidi holographic video processing system. Holo- Chidi is designed at the MIT Media Laboratory for real-time computation of computer generated holograms and the subsequent display of the holograms at video frame rates. The Holo-Chidi system is made of two sets of cards - the set of Processor cards and the set of Video Concentrator Cards (VCCs). The Processor cards are used for hologram computation, data archival/retrieval from a host system, and for higher-level control of the VCCs. The VCC formats computed holographic data from multiple hologram computing Processor cards, converting the digital data to analog form to feed the acousto-optic-modulators of the Media lab's Mark-II holographic display system. The Video Concentrator card is made of: a High-Speed I/O (HSIO) interface whence data is transferred from the hologram computing Processor cards, a set of FIFOs and video RAM used as buffer for data for the hololines being displayed, a one-chip integrated microprocessor and peripheral combination that handles communication with other VCCs and furnishes the card with a USB port, a co-processor which controls display data formatting, and D-to-A converters that convert digital fringes to analog form. The co-processor is implemented with an SRAM-based FPGA with over 500,000 gates and controls all the signals needed to format the data from the multiple Processor cards into the format required by Mark-II. A VCC has three HSIO ports through which up to 500 Megabytes of computed holographic data can flow from the Processor Cards to the VCC per second. A Holo-Chidi system with three VCCs has enough frame buffering capacity to hold up to thirty two 36Megabyte hologram frames at a time. Pre-computed holograms may also be loaded into the VCC from a host computer through the low- speed USB port. Both the microprocessor and the co- processor in the VCC can access the main system memory used to store control programs and data for the VCC. The Card also generates the control signals used by the scanning mirrors of Mark-II. In this paper we discuss the design of the VCC and its implementation in the Holo-Chidi system.
On-chip detection of non-classical light by scalable integration of single-photon detectors
Najafi, Faraz; Mower, Jacob; Harris, Nicholas C.; Bellei, Francesco; Dane, Andrew; Lee, Catherine; Hu, Xiaolong; Kharel, Prashanta; Marsili, Francesco; Assefa, Solomon; Berggren, Karl K.; Englund, Dirk
2015-01-01
Photonic-integrated circuits have emerged as a scalable platform for complex quantum systems. A central goal is to integrate single-photon detectors to reduce optical losses, latency and wiring complexity associated with off-chip detectors. Superconducting nanowire single-photon detectors (SNSPDs) are particularly attractive because of high detection efficiency, sub-50-ps jitter and nanosecond-scale reset time. However, while single detectors have been incorporated into individual waveguides, the system detection efficiency of multiple SNSPDs in one photonic circuit—required for scalable quantum photonic circuits—has been limited to <0.2%. Here we introduce a micrometer-scale flip-chip process that enables scalable integration of SNSPDs on a range of photonic circuits. Ten low-jitter detectors are integrated on one circuit with 100% device yield. With an average system detection efficiency beyond 10%, and estimated on-chip detection efficiency of 14–52% for four detectors operated simultaneously, we demonstrate, to the best of our knowledge, the first on-chip photon correlation measurements of non-classical light. PMID:25575346
Sequence information signal processor
Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.
1999-01-01
An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
Single-chip microcomputer application in high-altitude balloon orientation system
NASA Technical Reports Server (NTRS)
Lim, T. S.; Ehrmann, C. H.; Allison, S. R.
1980-01-01
This paper describes the application of a single-chip microcomputer in a high-altitude balloon instrumentation system. The system, consisting of a magnetometer, a stepping motor, a microcomputer and a gray code shaft encoder, is used to provide an orientation reference to point a scientific instrument at an object in space. The single-chip microcomputer, Intel's 8748, consisting of a CPU, program memory, data memory and I/O ports, is used to control the orientation of the system.
NASA Astrophysics Data System (ADS)
Gunay, Omer; Ozsarac, Ismail; Kamisli, Fatih
2017-05-01
Video recording is an essential property of new generation military imaging systems. Playback of the stored video on the same device is also desirable as it provides several operational benefits to end users. Two very important constraints for many military imaging systems, especially for hand-held devices and thermal weapon sights, are power consumption and size. To meet these constraints, it is essential to perform most of the processing applied to the video signal, such as preprocessing, compression, storing, decoding, playback and other system functions on a single programmable chip, such as FPGA, DSP, GPU or ASIC. In this work, H.264/AVC (Advanced Video Coding) compatible video compression, storage, decoding and playback blocks are efficiently designed and implemented on FPGA platforms using FPGA fabric and Altera NIOS II soft processor. Many subblocks that are used in video encoding are also used during video decoding in order to save FPGA resources and power. Computationally complex blocks are designed using FPGA fabric, while blocks such as SD card write/read, H.264 syntax decoding and CAVLC decoding are done using NIOS processor to benefit from software flexibility. In addition, to keep power consumption low, the system was designed to require limited external memory access. The design was tested using 640x480 25 fps thermal camera on CYCLONE V FPGA, which is the ALTERA's lowest power FPGA family, and consumes lower than 40% of CYCLONE V 5CEFA7 FPGA resources on average.
Development of the SEASIS instrument for SEDSAT
NASA Technical Reports Server (NTRS)
Maier, Mark W.
1996-01-01
Two SEASIS experiment objectives are key: take images that allow three axis attitude determination and take multi-spectral images of the earth. During the tether mission it is also desirable to capture images for the recoiling tether from the endmass perspective (which has never been observed). SEASIS must store all its imagery taken during the tether mission until the earth downlink can be established. SEASIS determines attitude with a panoramic camera and performs earth observation with a telephoto lens camera. Camera video is digitized, compressed, and stored in solid state memory. These objectives are addressed through the following architectural choices: (1) A camera system using a Panoramic Annular Lens (PAL). This lens has a 360 deg. azimuthal field of view by a +45 degree vertical field measured from a plan normal to the lens boresight axis. It has been shown in Mr. Mark Steadham's UAH M.S. thesis that his camera can determine three axis attitude anytime the earth and one other recognizable celestial object (for example, the sun) is in the field of view. This will be essentially all the time during tether deployment. (2) A second camera system using telephoto lens and filter wheel. The camera is a black and white standard video camera. The filters are chosen to cover the visible spectral bands of remote sensing interest. (3) A processor and mass memory arrangement linked to the cameras. Video signals from the cameras are digitized, compressed in the processor, and stored in a large static RAM bank. The processor is a multi-chip module consisting of a T800 Transputer and three Zoran floating point Digital Signal Processors. This processor module was supplied under ARPA contract by the Space Computer Corporation to demonstrate its use in space.
Ultralow power trapping and fluorescence detection of single particles on an optofluidic chip.
Kühn, S; Phillips, B S; Lunt, E J; Hawkins, A R; Schmidt, H
2010-01-21
The development of on-chip methods to manipulate particles is receiving rapidly increasing attention. All-optical traps offer numerous advantages, but are plagued by large required power levels on the order of hundreds of milliwatts and the inability to act exclusively on individual particles. Here, we demonstrate a fully integrated electro-optical trap for single particles with optical excitation power levels that are five orders of magnitude lower than in conventional optical force traps. The trap is based on spatio-temporal light modulation that is implemented using networks of antiresonant reflecting optical waveguides. We demonstrate the combination of on-chip trapping and fluorescence detection of single microorganisms by studying the photobleaching dynamics of stained DNA in E. coli bacteria. The favorable size scaling facilitates the trapping of single nanoparticles on integrated optofluidic chips.
Tracking Clouds with low cost GNSS chips aided by the Arduino platform
NASA Astrophysics Data System (ADS)
Hameed, Saji; Realini, Eugenio; Ishida, Shinya
2016-04-01
The Global Navigation Satellite System (GNSS) is a constellation of satellites that is used to provide geo-positioning services. Besides this application, the GNSS system is important for a wide range of scientific and civilian applications. For example, GNSS systems are routinely used in civilian applications such as surveying and scientific applications such as the study of crustal deformation. Another important scientific application of GNSS system is in meteorological research. Here it is mainly used to determine the total water vapour content of the troposphere, hereafter Precipitable Water Vapor (PWV). However, both GNSS receivers and software have prohibitively high price due to a variety of reasons. To overcome this somewhat artificial barrier we are exploring the use of low-cost GNSS receivers along with open source GNSS software for scientific research, in particular for GNSS meteorology research. To achieve this aim, we have developed a custom Arduino compatible data logging board that is able to operate together with a specific low-cost single frequency GNSS receiver chip from NVS Technologies AG. We have also developed an open-source software bundle that includes a new Arduino core for the Atmel324p chip, which is the main processor used in our custom logger. We have also developed software code that enables data collection, logging and parsing of the GNSS data stream. Additionally we have comprehensively evaluated the low power characteristics of the GNSS receiver and logger boards. Currently we are exploring the use of several openly source or free to use for research software to map GNSS delays to PWV. These include the open source goGPS (http://www.gogps-project.org/) and gLAB (http://gage.upc.edu/gLAB) and the openly available GAMIT software from Massachusetts Institute of Technology (MIT). We note that all the firmware and software developed as part of this project is available on an open source license.
USDA-ARS?s Scientific Manuscript database
The periodic need to restock reagent pools for genotyping chips provides an opportunity to increase the number of single-nucleotide polymorphisms (SNP) on a chip at no increase in cost. A high-density chip with >140,000 SNP has been developed by GeneSeek Inc. (Lincoln, NE) to increase accuracy of ge...
A microprocessor based high speed packet switch for satellite communications
NASA Technical Reports Server (NTRS)
Arozullah, M.; Crist, S. C.
1980-01-01
The architectures of a single processor, a three processor, and a multiple processor system are described. The hardware circuits, and software routines required for implementing the three and multiple processor designs are presented. A bit-slice microprocessor was designed and microprogrammed. Maximum throughput was calculated for all three designs. Queue theoretic models for these three designs were developed and utilized to obtain analytical expressions for the average waiting times, overall average response times and average queue sizes. From these expressions, graphs were obtained showing the effect on the system performance of a number of design parameters.
A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy
NASA Astrophysics Data System (ADS)
Veiga, Alejandro; Grunfeld, Christian
2016-02-01
The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.
Helicase dependent OnChip-amplification and its use in multiplex pathogen detection.
Andresen, Dennie; von Nickisch-Rosenegk, Markus; Bier, Frank F
2009-05-01
The need for fast, specific and sensitive multiparametric detection methods is an ever growing demand in molecular diagnostics. Here we report on a newly developed method, the helicase dependent OnChip amplification (OnChip-HDA). This approach integrates the analysis and detection in one single reaction thus leading to time and cost savings in multiparametric analysis. HDA is an isothermal amplification method that is not depending on thermocycling as known from PCR due to the helicases' ability to unwind DNA double-strands. We have combined the HDA with microarray based detection, making it suitable for multiplex detection. As an example we used the OnChip HDA in single and multiplex amplifications for the detection of the two pathogens N. gonorrhoeae and S. aureus directly on surface bound primers. We have successfully shown the OnChip-HDA and applied it for single- and duplex-detection of the pathogens N. gonorrhoeae and S. aureus. We have developed a new method, the OnChip-HDA for the multiplex detection of pathogens. Its simplicity in reaction setup and potential for miniaturization and multiparametric analysis is advantageous for the integration in miniaturized Lab on Chip systems, e.g. needed in point of care diagnostics.
Modeling Large Scale Circuits Using Massively Parallel Descrete-Event Simulation
2013-06-01
exascale levels of performance, the smallest elements of a single processor can greatly affect the entire computer system (e.g. its power consumption...grow to exascale levels of performance, the smallest elements of a single processor can greatly affect the entire computer system (e.g. its power...Warp Speed 10.0. 2.0 INTRODUCTION As supercomputer systems approach exascale , the core count will exceed 1024 and number of transistors used in
Integrable microwave filter based on a photonic crystal delay line.
Sancho, Juan; Bourderionnet, Jerome; Lloret, Juan; Combrié, Sylvain; Gasulla, Ivana; Xavier, Stephane; Sales, Salvador; Colman, Pierre; Lehoucq, Gaelle; Dolfi, Daniel; Capmany, José; De Rossi, Alfredo
2012-01-01
The availability of a tunable delay line with a chip-size footprint is a crucial step towards the full implementation of integrated microwave photonic signal processors. Achieving a large and tunable group delay on a millimetre-sized chip is not trivial. Slow light concepts are an appropriate solution, if propagation losses are kept acceptable. Here we use a low-loss 1.5 mm-long photonic crystal waveguide to demonstrate both notch and band-pass microwave filters that can be tuned over the 0-50-GHz spectral band. The waveguide is capable of generating a controllable delay with limited signal attenuation (total insertion loss below 10 dB when the delay is below 70 ps) and degradation. Owing to the very small footprint of the delay line, a fully integrated device is feasible, also featuring more complex and elaborate filter functions.
NASA Astrophysics Data System (ADS)
Iwato, Hirofumi; Sakanushi, Keishi; Takeuchi, Yoshinori; Imai, Masaharu
To measure the detrusor pressure for diagnosing lower urinary tract symptoms, we designed a small-area and low-power System on a Chip (SoC). The SoC should be small and low power because it is encapsulated in tiny air-tight capsules which are simultaneously inserted in the urinary bladder and rectum for several days. Since the SoC is also required to be programmable, we designed an Application Specific Instruction set Processor (ASIP) for pressure measurement and wireless communication, and implemented almost required functions on the ASIP. The SoC was fabricated using a 0.18µm CMOS mixed-signal process and the chip size is 2.5×2.5mm2. Evaluation results show that the power consumption of the SoC is 93.5µW, and that it can operate the capsule for seven days with a tiny battery.
Power-Aware Compiler Controllable Chip Multiprocessor
NASA Astrophysics Data System (ADS)
Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori
A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.
[A heart function measuring and analyzing instrument based on single-chip microcomputer].
Rong, Z; Liang, H; Wang, S
1999-05-01
An Introduction a measuring and analyzing instrument, based on the single-chip microcomputer, which provides sample gathering, processing, controlling, adjusting, keyboard and printing. All informations are provided and displayed in Chinese.
Simulink/PARS Integration Support
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vacaliuc, B.; Nakhaee, N.
2013-12-18
The state of the art for signal processor hardware has far out-paced the development tools for placing applications on that hardware. In addition, signal processors are available in a variety of architectures, each uniquely capable of handling specific types of signal processing efficiently. With these processors becoming smaller and demanding less power, it has become possible to group multiple processors, a heterogeneous set of processors, into single systems. Different portions of the desired problem set can be assigned to different processor types as appropriate. As software development tools do not keep pace with these processors, especially when multiple processors ofmore » different types are used, a method is needed to enable software code portability among multiple processors and multiple types of processors along with their respective software environments. Sundance DSP, Inc. has developed a software toolkit called “PARS”, whose objective is to provide a framework that uses suites of tools provided by different vendors, along with modeling tools and a real time operating system, to build an application that spans different processor types. The software language used to express the behavior of the system is a very high level modeling language, “Simulink”, a MathWorks product. ORNL has used this toolkit to effectively implement several deliverables. This CRADA describes this collaboration between ORNL and Sundance DSP, Inc.« less
European Science Notes Information Bulletin Reports on Current European/ Middle Eastern Science
1988-08-01
problems, and infrastructure and in- terfacing requirements. Development of Finite Element Software for Transputer-Based Parallel Processors ...Introduction will it be possible to harness these processors together to work on a common problem. The feasibility study at the UK’s Kent University for One of...the many problems in harnessing the power development of a distributed supercomputer is being of a large number of processors on a single problem is
Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas
2015-01-01
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
From Genes to Protein Mechanics on a Chip
Milles, Lukas F.; Verdorfer, Tobias; Pippig, Diana A.; Nash, Michael A.; Gaub, Hermann E.
2014-01-01
Single-molecule force spectroscopy enables mechanical testing of individual proteins, however low experimental throughput limits the ability to screen constructs in parallel. We describe a microfluidic platform for on-chip protein expression and measurement of single-molecule mechanical properties. We constructed microarrays of proteins covalently attached to a chip surface, and found that a single cohesin-modified cantilever that bound to the terminal dockerin-tag of each protein remained stable over thousands of pulling cycles. The ability to synthesize and mechanically probe protein libraries presents new opportunities for high-throughput mechanical phenotyping. PMID:25194847
Kaneda, Shohei; Ono, Koichi; Fukuba, Tatsuhiro; Nojima, Takahiko; Yamamoto, Takatoki; Fujii, Teruo
2011-01-01
In this paper, a rapid and simple method to determine the optimal temperature conditions for denaturant electrophoresis using a temperature-controlled on-chip capillary electrophoresis (CE) device is presented. Since on-chip CE operations including sample loading, injection and separation are carried out just by switching the electric field, we can repeat consecutive run-to-run CE operations on a single on-chip CE device by programming the voltage sequences. By utilizing the high-speed separation and the repeatability of the on-chip CE, a series of electrophoretic operations with different running temperatures can be implemented. Using separations of reaction products of single-stranded DNA (ssDNA) with a peptide nucleic acid (PNA) oligomer, the effectiveness of the presented method to determine the optimal temperature conditions required to discriminate a single-base substitution (SBS) between two different ssDNAs is demonstrated. It is shown that a single run for one temperature condition can be executed within 4 min, and the optimal temperature to discriminate the SBS could be successfully found using the present method. PMID:21845077
On-chip Magnetic Separation and Cell Encapsulation in Droplets
NASA Astrophysics Data System (ADS)
Chen, A.; Byvank, T.; Bharde, A.; Miller, B. L.; Chalmers, J. J.; Sooryakumar, R.; Chang, W.-J.; Bashir, R.
2012-02-01
The demand for high-throughput single cell assays is gaining importance because of the heterogeneity of many cell suspensions, even after significant initial sorting. These suspensions may display cell-to-cell variability at the gene expression level that could impact single cell functional genomics, cancer, stem-cell research and drug screening. The on-chip monitoring of individual cells in an isolated environment could prevent cross-contamination, provide high recovery yield and ability to study biological traits at a single cell level These advantages of on-chip biological experiments contrast to conventional methods, which require bulk samples that provide only averaged information on cell metabolism. We report on a device that integrates microfluidic technology with a magnetic tweezers array to combine the functionality of separation and encapsulation of objects such as immunomagnetically labeled cells or magnetic beads into pico-liter droplets on the same chip. The ability to control the separation throughput that is independent of the hydrodynamic droplet generation rate allows the encapsulation efficiency to be optimized. The device can potentially be integrated with on-chip labeling and/or bio-detection to become a powerful single-cell analysis device.
Pernice, W.H.P.; Schuck, C.; Minaeva, O.; Li, M.; Goltsman, G.N.; Sergienko, A.V.; Tang, H.X.
2012-01-01
Ultrafast, high-efficiency single-photon detectors are among the most sought-after elements in modern quantum optics and quantum communication. However, imperfect modal matching and finite photon absorption rates have usually limited their maximum attainable detection efficiency. Here we demonstrate superconducting nanowire detectors atop nanophotonic waveguides, which enable a drastic increase of the absorption length for incoming photons. This allows us to achieve high on-chip single-photon detection efficiency up to 91% at telecom wavelengths, repeatable across several fabricated chips. We also observe remarkably low dark count rates without significant compromise of the on-chip detection efficiency. The detectors are fully embedded in scalable silicon photonic circuits and provide ultrashort timing jitter of 18 ps. Exploiting this high temporal resolution, we demonstrate ballistic photon transport in silicon ring resonators. Our direct implementation of a high-performance single-photon detector on chip overcomes a major barrier in integrated quantum photonics. PMID:23271658
Research on single-chip microcomputer controlled rotating magnetic field mineralization model
NASA Astrophysics Data System (ADS)
Li, Yang; Qi, Yulin; Yang, Junxiao; Li, Na
2017-08-01
As one of the method of selecting ore, the magnetic separation method has the advantages of stable operation, simple process flow, high beneficiation efficiency and no chemical environment pollution. But the existing magnetic separator are more mechanical, the operation is not flexible, and can not change the magnetic field parameters according to the precision of the ore needed. Based on the existing magnetic separator is mechanical, the rotating magnetic field can be used for single chip microcomputer control as the research object, design and trial a rotating magnetic field processing prototype, and through the single-chip PWM pulse output to control the rotation of the magnetic field strength and rotating magnetic field speed. This method of using pure software to generate PWM pulse to control rotary magnetic field beneficiation, with higher flexibility, accuracy and lower cost, can give full play to the performance of single-chip.
Wu, Xue; Sengupta, Kaushik
2018-03-19
This paper demonstrates a methodology to miniaturize THz spectroscopes into a single silicon chip by eliminating traditional solid-state architectural components such as complex tunable THz and optical sources, nonlinear mixing and amplifiers. The proposed method achieves this by extracting incident THz spectral signatures from the surface of an on-chip antenna itself. The information is sensed through the spectrally-sensitive 2D distribution of the impressed current surface under the THz incident field. By converting the antenna from a single-port to a massively multi-port architecture with integrated electronics and deep subwavelength sensing, THz spectral estimation is converted into a linear estimation problem. We employ rigorous regression techniques and analysis to demonstrate a single silicon chip system operating at room temperature across 0.04-0.99 THz with 10 MHz accuracy in spectrum estimation of THz tones across the entire spectrum.
A VLSI chip set for real time vector quantization of image sequences
NASA Technical Reports Server (NTRS)
Baker, Richard L.
1989-01-01
The architecture and implementation of a VLSI chip set that vector quantizes (VQ) image sequences in real time is described. The chip set forms a programmable Single-Instruction, Multiple-Data (SIMD) machine which can implement various vector quantization encoding structures. Its VQ codebook may contain unlimited number of codevectors, N, having dimension up to K = 64. Under a weighted least squared error criterion, the engine locates at video rates the best code vector in full-searched or large tree searched VQ codebooks. The ability to manipulate tree structured codebooks, coupled with parallelism and pipelining, permits searches in as short as O (log N) cycles. A full codebook search results in O(N) performance, compared to O(KN) for a Single-Instruction, Single-Data (SISD) machine. With this VLSI chip set, an entire video code can be built on a single board that permits realtime experimentation with very large codebooks.
On-chip single photon filtering and multiplexing in hybrid quantum photonic circuits.
Elshaari, Ali W; Zadeh, Iman Esmaeil; Fognini, Andreas; Reimer, Michael E; Dalacu, Dan; Poole, Philip J; Zwiller, Val; Jöns, Klaus D
2017-08-30
Quantum light plays a pivotal role in modern science and future photonic applications. Since the advent of integrated quantum nanophotonics different material platforms based on III-V nanostructures-, colour centers-, and nonlinear waveguides as on-chip light sources have been investigated. Each platform has unique advantages and limitations; however, all implementations face major challenges with filtering of individual quantum states, scalable integration, deterministic multiplexing of selected quantum emitters, and on-chip excitation suppression. Here we overcome all of these challenges with a hybrid and scalable approach, where single III-V quantum emitters are positioned and deterministically integrated in a complementary metal-oxide-semiconductor-compatible photonic circuit. We demonstrate reconfigurable on-chip single-photon filtering and wavelength division multiplexing with a foot print one million times smaller than similar table-top approaches, while offering excitation suppression of more than 95 dB and efficient routing of single photons over a bandwidth of 40 nm. Our work marks an important step to harvest quantum optical technologies' full potential.Combining different integration platforms on the same chip is currently one of the main challenges for quantum technologies. Here, Elshaari et al. show III-V Quantum Dots embedded in nanowires operating in a CMOS compatible circuit, with controlled on-chip filtering and tunable routing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rengstl, U.; Schwartz, M.; Herzog, T.
2015-07-13
We present an on-chip beamsplitter operating on a single-photon level by means of a quasi-resonantly driven InGaAs/GaAs quantum dot. The single photons are guided by rib waveguides and split into two arms by an evanescent field coupler. Although the waveguides themselves support the fundamental TE and TM modes, the measured degree of polarization (∼90%) reveals the main excitation and propagation of the TE mode. We observe the preserved single-photon nature of a quasi-resonantly excited quantum dot by performing a cross-correlation measurement on the two output arms of the beamsplitter. Additionally, the same quantum dot is investigated under resonant excitation, wheremore » the same splitting ratio is observed. An autocorrelation measurement with an off-chip beamsplitter on a single output arm reveal the single-photon nature after evanescent coupling inside the on-chip splitter. Due to their robustness, adjustable splitting ratio, and their easy implementation, rib waveguide beamsplitters with embedded quantum dots provide a promising step towards fully integrated quantum circuits.« less
Zhuang, Leimeng; Khan, Muhammad Rezaul; Beeker, Willem; Leinse, Arne; Heideman, René; Roeloffzen, Chris
2012-11-19
We propose and demonstrate a novel wideband microwave photonic fractional Hilbert transformer implemented using a ring resonator-based optical all-pass filter. The full programmability of the ring resonator allows variable and arbitrary fractional order of the Hilbert transformer. The performance analysis in both frequency and time domain validates that the proposed implementation provides a good approximation to an ideal fractional Hilbert transformer. This is also experimentally verified by an electrical S21 response characterization performed on a waveguide realization of a ring resonator. The waveguide-based structure allows the proposed Hilbert transformer to be integrated together with other building blocks on a photonic integrated circuit to create various system-level functionalities for on-chip microwave photonic signal processors. As an example, a circuit consisting of a splitter and a ring resonator has been realized which can perform on-chip phase control of microwave signals generated by means of optical heterodyning, and simultaneous generation of in-phase and quadrature microwave signals for a wide frequency range. For these functionalities, this simple and on-chip solution is considered to be practical, particularly when operating together with a dual-frequency laser. To our best knowledge, this is the first-time on-chip demonstration where ring resonators are employed to perform phase control functionalities for optical generation of microwave signals by means of optical heterodyning.
EGR distribution and fluctuation probe based on CO2 measurements
Parks, II, James E.; Partridge, Jr., William P.; Yoo, Ji Hyung
2015-06-30
A diagnostic system having a laser, an EGR probe, a detector and a processor. The laser may be a swept-.lamda. laser having a sweep range including a significant CO.sub.2 feature and substantially zero absorption regions. The sweep range may extend from about 2.708 .mu.m to about 2.7085 .mu.m. The processor may determine CO.sub.2 concentration as a function of the detector output signal. The processor may normalize the output signal as a function of the zero absorption regions. The system may include a plurality of EGR probes receiving light from a single laser. The system may include a separate detector for each probe. Alternatively, the system may combine the light returning from the different probes into a composite beam that is measured by a single detector. A unique modulation characteristic may be introduced into each light beam before combination so that the processor can discriminate between them in the composite beam.
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA
NASA Astrophysics Data System (ADS)
Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei
2013-03-01
With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.
NASA Astrophysics Data System (ADS)
Weber, Walter H.; Mair, H. Douglas; Jansen, Dion
2003-03-01
A suite of basic signal processors has been developed. These basic building blocks can be cascaded together to form more complex processors without the need for programming. The data structures between each of the processors are handled automatically. This allows a processor built for one purpose to be applied to any type of data such as images, waveform arrays and single values. The processors are part of Winspect Data Acquisition software. The new processors are fast enough to work on A-scan signals live while scanning. Their primary use is to extract features, reduce noise or to calculate material properties. The cascaded processors work equally well on live A-scan displays, live gated data or as a post-processing engine on saved data. Researchers are able to call their own MATLAB or C-code from anywhere within the processor structure. A built-in formula node processor that uses a simple algebraic editor may make external user programs unnecessary. This paper also discusses the problems associated with ad hoc software development and how graphical programming languages can tie up researchers writing software rather than designing experiments.
The artificial satellite observation chronograph controlled by single chip microcomputer.
NASA Astrophysics Data System (ADS)
Pan, Guangrong; Tan, Jufan; Ding, Yuanjun
1991-06-01
The instrument specifications, hardware structure, software design, and other characteristics of the chronograph mounting on a theodolite used for artificial satellite observation are presented. The instrument is a real time control system with a single chip microcomputer.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
2017-02-15
Maunz2 Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone...information processors have been demonstrated experimentally using superconducting circuits1–3, electrons in semiconductors4–6, trapped atoms and...qubit quantum information processor has been realized14, and single- qubit gates have demonstrated randomized benchmarking (RB) infidelities as low as 10
Cedar-a large scale multiprocessor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.; Kuck, D.; Lawrie, D.
1983-01-01
This paper presents an overview of Cedar, a large scale multiprocessor being designed at the University of Illinois. This machine is designed to accommodate several thousand high performance processors which are capable of working together on a single job, or they can be partitioned into groups of processors where each group of one or more processors can work on separate jobs. Various aspects of the machine are described including the control methodology, communication network, optimizing compiler and plans for construction. 13 references.
NASA Technical Reports Server (NTRS)
Chang, Chen J. (Inventor); Liaghati, Jr., Amir L. (Inventor); Liaghati, Mahsa L. (Inventor)
2018-01-01
Methods and apparatus are provided for telemetry processing using a telemetry processor. The telemetry processor can include a plurality of communications interfaces, a computer processor, and data storage. The telemetry processor can buffer sensor data by: receiving a frame of sensor data using a first communications interface and clock data using a second communications interface, receiving an end of frame signal using a third communications interface, and storing the received frame of sensor data in the data storage. After buffering the sensor data, the telemetry processor can generate an encapsulated data packet including a single encapsulated data packet header, the buffered sensor data, and identifiers identifying telemetry devices that provided the sensor data. A format of the encapsulated data packet can comply with a Consultative Committee for Space Data Systems (CCSDS) standard. The telemetry processor can send the encapsulated data packet using a fourth and a fifth communications interfaces.
Image processing for a tactile/vision substitution system using digital CNN.
Lin, Chien-Nan; Yu, Sung-Nien; Hu, Jin-Cheng
2006-01-01
In view of the parallel processing and easy implementation properties of CNN, we propose to use digital CNN as the image processor of a tactile/vision substitution system (TVSS). The digital CNN processor is used to execute the wavelet down-sampling filtering and the half-toning operations, aiming to extract important features from the images. A template combination method is used to embed the two image processing functions into a single CNN processor. The digital CNN processor is implemented on an intellectual property (IP) and is implemented on a XILINX VIRTEX II 2000 FPGA board. Experiments are designated to test the capability of the CNN processor in the recognition of characters and human subjects in different environments. The experiments demonstrates impressive results, which proves the proposed digital CNN processor a powerful component in the design of efficient tactile/vision substitution systems for the visually impaired people.
1992-11-01
November 1992 1992 INTERNATIONAL AEROSPACE AND GROUND CONFERENCE 6. Perfrming Orgnis.aten Code ON LIGHTNING AND STATIC ELECTRICITY - ADDENDUM 111...October 6-8 1992 Program and the Federal Aviation Administration 14. Sponsoring Agency Code Technical Center ACD-230 15. Supplementary Metes The NICG...area]. The program runs well on an IBM PC or compatible 386 with a math co-processor 387 chip and a VGA monitor. For this study, streamers were added
1981-10-01
The congre- gation was just enthralled by this. One lady in the balcony got the call that very morning, jumped over the balcony, and shouted ...now we have satellite photographs and those are so good that when you fully enhance them by computer you can actually tell how high the waves are out...massively parallel processor being built for NASA . It consists of an arrLy of 120 by 128 of these chips. There are 16,384 computers and, of course, one
NEW EPICS/RTEMS IOC BASED ON ALTERA SOC AT JEFFERSON LAB
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Jianxun; Seaton, Chad; Allison, Trent L.
A new EPICS/RTEMS IOC based on the Altera System-on-Chip (SoC) FPGA is being designed at Jefferson Lab. The Altera SoC FPGA integrates a dual ARM Cortex-A9 Hard Processor System (HPS) consisting of processor, peripherals and memory interfaces tied seamlessly with the FPGA fabric using a high-bandwidth interconnect backbone. The embedded Altera SoC IOC has features of remote network boot via U-Boot from SD card or QSPI Flash, 1Gig Ethernet, 1GB DDR3 SDRAM on HPS, UART serial ports, and ISA bus interface. RTEMS for the ARM processor BSP were built with CEXP shell, which will dynamically load the EPICS applications atmore » runtime. U-Boot is the primary bootloader to remotely load the kernel image into local memory from a DHCP/TFTP server over Ethernet, and automatically run RTEMS and EPICS. The first design of the SoC IOC will be compatible with Jefferson Lab’s current PC104 IOCs, which have been running in CEBAF 10 years. The next design would be mounting in a chassis and connected to a daughter card via standard HSMC connectors. This standard SoC IOC will become the next generation of low-level IOC for the accelerator controls at Jefferson Lab.« less
Burst-mode optical label processor with ultralow power consumption.
Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo
2016-04-04
A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting and interfacing the ultrafast packet-label to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion for the label bits on a bit-by-bit basis by using an optoelectronic converter that is operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem; 1) based on a novel operation mechanism for providing amplification with bit-level selectivity, an optical trigger pulse generator, that consumes power for a very short duration upon packet arrival, is proposed and experimentally demonstrated, 2) the energy of optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal while employing an enhanced conversion scheme entitled the discharge-or-hold scheme, 3) the necessary optical trigger energy is further cut down by half by coupling the triggers through the chip's backside, whereas a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.
Six-port optical switch for cluster-mesh photonic network-on-chip
NASA Astrophysics Data System (ADS)
Jia, Hao; Zhou, Ting; Zhao, Yunchou; Xia, Yuhao; Dai, Jincheng; Zhang, Lei; Ding, Jianfeng; Fu, Xin; Yang, Lin
2018-05-01
Photonic network-on-chip for high-performance multi-core processors has attracted substantial interest in recent years as it offers a systematic method to meet the demand of large bandwidth, low latency and low power dissipation. In this paper we demonstrate a non-blocking six-port optical switch for cluster-mesh photonic network-on-chip. The architecture is constructed by substituting three optical switching units of typical Spanke-Benes network to optical waveguide crossings. Compared with Spanke-Benes network, the number of optical switching units is reduced by 20%, while the connectivity of routing path is maintained. By this way the footprint and power consumption can be reduced at the expense of sacrificing the network latency performance in some cases. The device is realized by 12 thermally tuned silicon Mach-Zehnder optical switching units. Its theoretical spectral responses are evaluated by establishing a numerical model. The experimental spectral responses are also characterized, which indicates that the optical signal-to-noise ratios of the optical switch are larger than 13.5 dB in the wavelength range from 1525 nm to 1565 nm. Data transmission experiment with the data rate of 32 Gbps is implemented for each optical link.
Bioinspired architecture approach for a one-billion transistor smart CMOS camera chip
NASA Astrophysics Data System (ADS)
Fey, Dietmar; Komann, Marcus
2007-05-01
In the paper we present a massively parallel VLSI architecture for future smart CMOS camera chips with up to one billion transistors. To exploit efficiently the potential offered by future micro- or nanoelectronic devices traditional on central structures oriented parallel architectures based on MIMD or SIMD approaches will fail. They require too long and too many global interconnects for the distribution of code or the access to common memory. On the other hand nature developed self-organising and emergent principles to manage successfully complex structures based on lots of interacting simple elements. Therefore we developed a new as Marching Pixels denoted emergent computing paradigm based on a mixture of bio-inspired computing models like cellular automaton and artificial ants. In the paper we present different Marching Pixels algorithms and the corresponding VLSI array architecture. A detailed synthesis result for a 0.18 μm CMOS process shows that a 256×256 pixel image is processed in less than 10 ms assuming a moderate 100 MHz clock rate for the processor array. Future higher integration densities and a 3D chip stacking technology will allow the integration and processing of Mega pixels within the same time since our architecture is fully scalable.
Controlled and tunable polymer particles' production using a single microfluidic device
NASA Astrophysics Data System (ADS)
Amoyav, Benzion; Benny, Ofra
2018-04-01
Microfluidics technology offers a new platform to control liquids under flow in small volumes. The advantage of using small-scale reactions for droplet generation along with the capacity to control the preparation parameters, making microfluidic chips an attractive technology for optimizing encapsulation formulations. However, one of the drawback in this methodology is the ability to obtain a wide range of droplet sizes, from sub-micron to microns using a single chip design. In fact, typically, droplet chips are used for micron-dimension particles, while nanoparticles' synthesis requires complex chips design (i.e., microreactors and staggered herringbone micromixer). Here, we introduce the development of a highly tunable and controlled encapsulation technique, using two polymer compositions, for generating particles ranging from microns to nano-size using the same simple single microfluidic chip design. Poly(lactic-co-glycolic acid) (PLGA 50:50) or PLGA/polyethylene glycol polymeric particles were prepared with focused-flow chip, yielding monodisperse particle batches. We show that by varying flow rate, solvent, surfactant and polymer composition, we were able to optimize particles' size and decrease polydispersity index, using simple chip designs with no further related adjustments or costs. Utilizing this platform, which offers tight tuning of particle properties, could offer an important tool for formulation development and can potentially pave the way towards a better precision nanomedicine.
The development of the time-keeping clock with TS-1 single chip microcomputer.
NASA Astrophysics Data System (ADS)
Zhou, Jiguang; Li, Yongan
The authors have developed a time-keeping clock with Intel 8751 single chip microcomputer that has been successfully used in time-keeping station. The hard-soft ware design and performance of the clock are introduced.
Next Generation Instrumentation Bus Test Plan for Fibre Channel
1999-09-30
sample for port testing. The heart of the Fibre Xpress network cards is the tachyon chip from Hewlett Packard. The tachyon chip is basically a single...be to test the protocols. 1.5.1.1 Tachyon chip The user’s manual for the Tachyon controller chip identifies the following FC-AL specification
Single chip camera device having double sampling operation
NASA Technical Reports Server (NTRS)
Fossum, Eric R. (Inventor); Nixon, Robert (Inventor)
2002-01-01
A single chip camera device is formed on a single substrate including an image acquisition portion for control portion and the timing circuit formed on the substrate. The timing circuit also controls the photoreceptors in a double sampling mode in which are reset level is first read and then after an integration time a charged level is read.
Adjustment of multi-CCD-chip-color-camera heads
NASA Astrophysics Data System (ADS)
Guyenot, Volker; Tittelbach, Guenther; Palme, Martin
1999-09-01
The principle of beam-splitter-multi-chip cameras consists in splitting an image into differential multiple images of different spectral ranges and in distributing these onto separate black and white CCD-sensors. The resulting electrical signals from the chips are recombined to produce a high quality color picture on the monitor. Because this principle guarantees higher resolution and sensitivity in comparison to conventional single-chip camera heads, the greater effort is acceptable. Furthermore, multi-chip cameras obtain the compete spectral information for each individual object point while single-chip system must rely on interpolation. In a joint project, Fraunhofer IOF and STRACON GmbH and in future COBRA electronic GmbH develop methods for designing the optics and dichroitic mirror system of such prism color beam splitter devices. Additionally, techniques and equipment for the alignment and assembly of color beam splitter-multi-CCD-devices on the basis of gluing with UV-curable adhesives have been developed, too.
Design and Analysis of Scheduling Policies for Real-Time Computer Systems
1992-01-01
C. M. Krishna, "The Impact of Workload on the Reliability of Real-Time Processor Triads," to appear in Micro . Rel. [17] J.F. Kurose, "Performance... Processor Triads", to appear in Micro . Rel. "* J.F. Kurose. "Performance Analysis of Minimum Laxity Scheduling in Discrete Time Queue- ing Systems", to...exponentially distributed service times and deadlines. A similar model was developed for the ED policy for a single processor system under identical
Backend Control Processor for a Multi-Processor Relational Database Computer System.
1984-12-01
SCHOOL OF ENGI. UNCRSIFID MPONTIFF DEC 84 AFXT/GCS/ENG/84D-22 F/O 9/2 L ommhhhhmhhml mhhhommhhhhhm i-2 8 -- U0. 11111= Q. 2 111.8IIII- 1111111..6...THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology Air University In Partial Fulfillment of the...development of a Backend Multi-Processor Relational Database Computer System. This thesis addresses a single component of this system, the Backend Control
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions
2014-05-01
processor developed by IBM and other companies , incorpo- rates the verb—POWER5— processor as the Power Processor Element (PPE), one of the early general...deliver an power efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel.27 The vast growth of digital video content has been a
Single bus star connected reluctance drive and method
Fahimi, Babak; Shamsi, Pourya
2016-05-10
A system and methods for operating a switched reluctance machine includes a controller, an inverter connected to the controller and to the switched reluctance machine, a hysteresis control connected to the controller and to the inverter, a set of sensors connected to the switched reluctance machine and to the controller, the switched reluctance machine further including a set of phases the controller further comprising a processor and a memory connected to the processor, wherein the processor programmed to execute a control process and a generation process.
Multipurpose silicon photonics signal processor core.
Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José
2017-09-21
Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm.Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.
NASA Astrophysics Data System (ADS)
Langlois, Serge; Fouquet, Olivier; Gouy, Yann; Riant, David
2014-08-01
On-Board Computers (OBC) are more and more using integrated systems on-chip (SOC) that embed processors running from 50MHz up to several hundreds of MHz, and around which are plugged some dedicated communication controllers together with other Input/Output channels.For ground testing and On-Board SoftWare (OBSW) validation purpose, a representative simulation of these systems, faster than real-time and with cycle-true timing of execution, is not achieved with current purely software simulators.Since a few years some hybrid solutions where put in place ([1], [2]), including hardware in the loop so as to add accuracy and performance in the computer software simulation.This paper presents the results of the works engaged by Thales Alenia Space (TAS-F) at the end of 2010, that led to a validated HW simulator of the UT699 by mid- 2012 and that is now qualified and fully used in operational contexts.
Media processors using a new microsystem architecture designed for the Internet era
NASA Astrophysics Data System (ADS)
Wyland, David C.
1999-12-01
The demands of digital image processing, communications and multimedia applications are growing more rapidly than traditional design methods can fulfill them. Previously, only custom hardware designs could provide the performance required to meet the demands of these applications. However, hardware design has reached a crisis point. Hardware design can no longer deliver a product with the required performance and cost in a reasonable time for a reasonable risk. Software based designs running on conventional processors can deliver working designs in a reasonable time and with low risk but cannot meet the performance requirements. What is needed is a media processing approach that combines very high performance, a simple programming model, complete programmability, short time to market and scalability. The Universal Micro System (UMS) is a solution to these problems. The UMS is a completely programmable (including I/O) system on a chip that combines hardware performance with the fast time to market, low cost and low risk of software designs.
NASA Astrophysics Data System (ADS)
Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan
2010-07-01
This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer
NASA Technical Reports Server (NTRS)
Jones, Mark Howard
1987-01-01
A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
A Data Acquisition System Using Single-Chip Microcomputer
NASA Astrophysics Data System (ADS)
Yonyjiang, Dai; Jingkuan, Gao; Lin, Wan; Mingjia, Pi; Jingda, Nan
1989-12-01
A data acquisition system by single-chip microcomputer was designed. It is suitable to the future devlopment of the miniature tidar signal processing epuipment . The characteristics of frequecy response, SNR, D* and NEP of FM-CW CO2 coherent tidar were discussed.
Research in Computer Simulation of Integrated Circuits.
1983-07-31
mactore ftor eval -al-rto implementad am a single chip ae those s Lca we beoi ~~g 7he PT!2 software datatow macl*-re ihas nodes ’cr prr., incrastgly... chip grows, these tools are becoming increasingly importan The FTL2 system described in this paper is an interactive system for specifying concurrent...implemented on a single chip grows, theselools are becom- / r/ - --/ ing increasingly important. The FTL2 system described in this paper is an interactive
Design and implementation of projects with Xilinx Zynq FPGA: a practical case
NASA Astrophysics Data System (ADS)
Travaglini, R.; D'Antone, I.; Meneghini, S.; Rignanese, L.; Zuffa, M.
The main advantage when using FPGAs with embedded processors is the availability of additional several high-performance resources in the same physical device. Moreover, the FPGA programmability allows for connect custom peripherals. Xilinx have designed a programmable device named Zynq-7000 (simply called Zynq in the following), which integrates programmable logic (identical to the other Xilinx "serie 7" devices) with a System on Chip (SOC) based on two embedded ARM processors. Since both parts are deeply connected, the designers benefit from performance of hardware SOC and flexibility of programmability as well. In this paper a design developed by the Electronic Design Department at the Bologna Division of INFN will be presented as a practical case of project based on Zynq device. It is developed by using a commercial board called ZedBoard hosting a FMC mezzanine with a 12-bit 500 MS/s ADC. The Zynq FPGA on the ZedBoard receives digital outputs from the ADC and send them to the acquisition PC, after proper formatting, through a Gigabit Ethernet link. The major focus of the paper will be about the methodology to develop a Zynq-based design with the Xilinx Vivado software, enlightening how to configure the SOC and connect it with the programmable logic. Firmware design techniques will be presented: in particular both VHDL and IP core based strategies will be discussed. Further, the procedure to develop software for the embedded processor will be presented. Finally, some debugging tools, like the embedded Logic Analyzer, will be shown. Advantages and disadvantages with respect to adopting FPGA without embedded processors will be discussed.
[The joint applications of DNA chips and single nucleotide polymorphisms in forensic science].
Bai, Peng; Tian, Li; Zhou, Xue-ping
2005-05-01
DNA chip technology, being a new high-technology, shows its vigorous life and rapid growth. Single Nucleotide Polymorphisms (SNPs) is the most common diversity in the human genome. It provides suitable genetic markers which play a key role in disease linkage study, pharmacogenomics, forensic medicine, population evolution and immigration study. Their advantage such as being analyzed with DNA chips technology, is predicted to play an important role in the field of forensic medicine, especially in paternity test and individual identification. This report mainly reviews the characteristics of DNA chip and SNPs, and their joint applications in the practice of forensic medicine.
NASA Tech Briefs, December 2011
NASA Technical Reports Server (NTRS)
2011-01-01
Topics covered include: 1) SNE Industrial Fieldbus Interface; 2) Composite Thermal Switch; 3) XMOS XC-2 Development Board for Mechanical Control and Data Collection; 4) Receiver Gain Modulation Circuit; 5) NEXUS Scalable and Distributed Next-Generation Avionics Bus for Space Missions; 6) Digital Interface Board to Control Phase and Amplitude of Four Channels; 7) CoNNeCT Baseband Processor Module; 8) Cryogenic 160-GHz MMIC Heterodyne Receiver Module; 9) Ka-Band, Multi-Gigabit-Per-Second Transceiver; 10) All-Solid-State 2.45-to-2.78-THz Source; 11) Onboard Interferometric SAR Processor for the Ka-Band Radar Interferometer (KaRIn); 12) Space Environments Testbed; 13) High-Performance 3D Articulated Robot Display; 14) Athena; 15) In Situ Surface Characterization; 16) Ndarts; 17) Cryo-Etched Black Silicon for Use as Optical Black; 18) Advanced CO2 Removal and Reduction System; 19) Correcting Thermal Deformations in an Active Composite Reflector; 20) Umbilical Deployment Device; 21) Space Mirror Alignment System; 22) Thermionic Power Cell To Harness Heat Energies for Geothermal Applications; 23) Graph Theory Roots of Spatial Operators for Kinematics and Dynamics; 24) Spacesuit Soft Upper Torso Sizing Systems; 25) Radiation Protection Using Single-Wall Carbon Nanotube Derivatives; 26) PMA-PhyloChip DNA Microarray to Elucidate Viable Microbial Community Structure; 27) Lidar Luminance Quantizer; 28) Distributed Capacitive Sensor for Sample Mass Measurement; 29) Base Flow Model Validation; 30) Minimum Landing Error Powered-Descent Guidance for Planetary Missions; 31) Framework for Integrating Science Data Processing Algorithms Into Process Control Systems; 32) Time Synchronization and Distribution Mechanisms for Space Networks; 33) Local Estimators for Spacecraft Formation Flying; 34) Software-Defined Radio for Space-to-Space Communications; 35) Reflective Occultation Mask for Evaluation of Occulter Designs for Planet Finding; and 36) Molecular Adsorber Coating
Development of Software for a Lidar-Altimeter Processor
NASA Technical Reports Server (NTRS)
Rosenberg, Jacob S.; Trujillo, Carlos
2005-01-01
A report describes the development of software for a digital processor that operates in conjunction with a finite-impulse-response (FIR) chip in a spaceborne lidar altimeter. Processing is started by a laser-fire interrupt signal that is repeated at intervals of 25 ms. For the purpose of discriminating between returns from the ground and returns from such things as trees, buildings, and clouds, the software is required to scan digitized lidar-return data in reverse of the acquisition sequence in order to distinguish the last return pulse from within a commanded ground-return range window. The digitized waveform information within this range window is filtered through 6 matched filters, in the hardware electronics, in order to maximize the probability of finding echoes from sloped or rough terrain and minimize the probability of selecting cloud returns. From the data falling past the end of the range window, there is obtained a noise baseline that is used to calculate a threshold value for each filter. The data from each filter is analyzed by a complex weighting scheme and the filter with the greatest weight is selected. A region around the peak of the ground return pulse associated with the selected filter is placed in telemetry, as well as information on its location, height, and other characteristics. The software requires many uplinked parameters as input. Included in the report is a discussion of major software-development problems posed by the design of the FIR chip and the need for the software to complete its process within 20 ms to fit within the overall 25-ms cycle.
Layout finishing of a 28nm, 3 billions transistors, multi-core processor
NASA Astrophysics Data System (ADS)
Morey-Chaisemartin, Philippe; Beisser, Eric
2013-06-01
Designing a fully new 256 cores processor is a great challenge for a fabless startup. In addition to all architecture, functionalities and timing issues, the layout by itself is a bottleneck due to all the process constraints of a 28nm technology. As developers of advanced layout finishing solutions, we were involved in the design flow of this huge chip with its 3 billions transistors. We had to face the issue of dummy patterns instantiation with respect to design constraints. All the design rules to generate the "dummies" are clearly defined in the Design Rule Manual, and some automatic procedures are provided by the foundry itself, but these routines don't take care of the designer requests. Such a chip, embeds both digital parts and analog modules for clock and power management. These two different type of designs have each their own set of constraints. In both cases, the insertion of dummies should not introduce unexpected variations leading to malfunctions. For example, on digital parts were signal race conditions are critical on long wires or bus, introduction of uncontrolled parasitic along these nets are highly critical. For analog devices such as high frequency and high sensitivity comparators, the exact symmetry of the two parts of a current mirror generator should be guaranteed. Thanks to the easily customizable features of our dummies insertion tool, we were able to configure it in order to meet all the designer requirements as well as the process constraints. This paper will present all these advanced key features as well as the layout tricks used to fulfill all requirements.
Two-step single slope/SAR ADC with error correction for CMOS image sensor.
Tang, Fang; Bermak, Amine; Amira, Abbes; Amor Benammar, Mohieddine; He, Debiao; Zhao, Xiaojin
2014-01-01
Conventional two-step ADC for CMOS image sensor requires full resolution noise performance in the first stage single slope ADC, leading to high power consumption and large chip area. This paper presents an 11-bit two-step single slope/successive approximation register (SAR) ADC scheme for CMOS image sensor applications. The first stage single slope ADC generates a 3-bit data and 1 redundant bit. The redundant bit is combined with the following 8-bit SAR ADC output code using a proposed error correction algorithm. Instead of requiring full resolution noise performance, the first stage single slope circuit of the proposed ADC can tolerate up to 3.125% quantization noise. With the proposed error correction mechanism, the power consumption and chip area of the single slope ADC are significantly reduced. The prototype ADC is fabricated using 0.18 μ m CMOS technology. The chip area of the proposed ADC is 7 μ m × 500 μ m. The measurement results show that the energy efficiency figure-of-merit (FOM) of the proposed ADC core is only 125 pJ/sample under 1.4 V power supply and the chip area efficiency is 84 k μ m(2) · cycles/sample.
A High-Voltage SOI CMOS Exciter Chip for a Programmable Fluidic Processor System.
Current, K W; Yuk, K; McConaghy, C; Gascoyne, P R C; Schwartz, J A; Vykoukal, J V; Andrews, C
2007-06-01
A high-voltage (HV) integrated circuit has been demonstrated to transport fluidic droplet samples on programmable paths across the array of driving electrodes on its hydrophobically coated surface. This exciter chip is the engine for dielectrophoresis (DEP)-based micro-fluidic lab-on-a-chip systems, creating field excitations that inject and move fluidic droplets onto and about the manipulation surface. The architecture of this chip is expandable to arrays of N X N identical HV electrode driver circuits and electrodes. The exciter chip is programmable in several senses. The routes of multiple droplets may be set arbitrarily within the bounds of the electrode array. The electrode excitation waveform voltage amplitude, phase, and frequency may be adjusted based on the system configuration and the signal required to manipulate a particular fluid droplet composition. The voltage amplitude of the electrode excitation waveform can be set from the minimum logic level up to the maximum limit of the breakdown voltage of the fabrication technology. The frequency of the electrode excitation waveform can also be set independently of its voltage, up to a maximum depending upon the type of droplets that must be driven. The exciter chip can be coated and its oxide surface used as the droplet manipulation surface or it can be used with a top-mounted, enclosed fluidic chamber consisting of a variety of materials. The HV capability of the exciter chip allows the generated DEP forces to penetrate into the enclosed chamber region and an adjustable voltage amplitude can accommodate a variety of chamber floor thicknesses. This demonstration exciter chip has a 32 x 32 array of nominally 100 V electrode drivers that are individually programmable at each time point in the procedure to either of two phases: 0deg and 180deg with respect to the reference clock. For this demonstration chip, while operating the electrodes with a 100-V peak-to-peak periodic waveform, the maximum HV electrode waveform frequency is about 200 Hz; and standard 5-V CMOS logic data communication rate is variable up to 250 kHz. This HV demonstration chip is fabricated in a 130-V 1.0-mum SOI CMOS fabrication technology, dissipates a maximum of 1.87 W, and is about 10.4 mm x 8.2 mm.
Mechanically verified hardware implementing an 8-bit parallel IO Byzantine agreement processor
NASA Technical Reports Server (NTRS)
Moore, J. Strother
1992-01-01
Consider a network of four processors that use the Oral Messages (Byzantine Generals) Algorithm of Pease, Shostak, and Lamport to achieve agreement in the presence of faults. Bevier and Young have published a functional description of a single processor that, when interconnected appropriately with three identical others, implements this network under the assumption that the four processors step in synchrony. By formalizing the original Pease, et al work, Bevier and Young mechanically proved that such a network achieves fault tolerance. We develop, formalize, and discuss a hardware design that has been mechanically proven to implement their processor. In particular, we formally define mapping functions from the abstract state space of the Bevier-Young processor to a concrete state space of a hardware module and state a theorem that expresses the claim that the hardware correctly implements the processor. We briefly discuss the Brock-Hunt Formal Hardware Description Language which permits designs both to be proved correct with the Boyer-Moore theorem prover and to be expressed in a commercially supported hardware description language for additional electrical analysis and layout. We briefly describe our implementation.
Wang, Han; Chen, Beibei; He, Man; Hu, Bin
2017-05-02
Single cell analysis is a significant research field in recent years reflecting the heterogeneity of cells in a biological system. In this work, a facile droplet chip was fabricated and online combined with time-resolved inductively coupled plasma mass spectrometry (ICPMS) via a microflow nebulizer for the determination of zinc in single HepG2 cells. On the focusing geometric designed PDMS microfluidic chip, the aqueous cell suspension was ejected and divided by hexanol to generate droplets. The droplets encapsulated single cells remain intact during the transportation into ICP for subsequent detection. Under the optimized conditions, the frequency of droplet generation is 3-6 × 10 6 min -1 , and the injected cell number is 2500 min -1 , which can ensure the single cell encapsulation. ZnO nanoparticles (NPs) were used for the quantification of zinc in single cells, and the accuracy was validated by conventional acid digestion-ICPMS method. The ZnO NPs incubated HepG2 cells were analyzed as model samples, and the results exhibit the heterogeneity of HepG2 cells in the uptake/adsorption of ZnO NPs. The developed online droplet-chip-ICPMS analysis system achieves stable single cell encapsulation and has high throughput for single cell analysis. It has the potential in monitoring the content as well as distribution of trace elements/NPs at the single cell level.
Optimal partitioning of random programs across two processors
NASA Technical Reports Server (NTRS)
Nicol, D. M.
1986-01-01
The optimal partitioning of random distributed programs is discussed. It is concluded that the optimal partitioning of a homogeneous random program over a homogeneous distributed system either assigns all modules to a single processor, or distributes the modules as evenly as possible among all processors. The analysis rests heavily on the approximation which equates the expected maximum of a set of independent random variables with the set's maximum expectation. The results are strengthened by providing an approximation-free proof of this result for two processors under general conditions on the module execution time distribution. It is also shown that use of this approximation causes two of the previous central results to be false.
Multitasking for flows about multiple body configurations using the chimera grid scheme
NASA Technical Reports Server (NTRS)
Dougherty, F. C.; Morgan, R. L.
1987-01-01
The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two dimensional computations are run on parallel processors on a CRAY-X/MP 48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.
NASA Astrophysics Data System (ADS)
Liu, Fenglai; Kong, Jing
2018-07-01
Unique technical challenges and their solutions for implementing semi-numerical Hartree-Fock exchange on the Phil Processor are discussed, especially concerning the single- instruction-multiple-data type of processing and small cache size. Benchmark calculations on a series of buckyball molecules with various Gaussian basis sets on a Phi processor and a six-core CPU show that the Phi processor provides as much as 12 times of speedup with large basis sets compared with the conventional four-center electron repulsion integration approach performed on the CPU. The accuracy of the semi-numerical scheme is also evaluated and found to be comparable to that of the resolution-of-identity approach.
Advanced satellite communication system
NASA Technical Reports Server (NTRS)
Staples, Edward J.; Lie, Sen
1992-01-01
The objective of this research program was to develop an innovative advanced satellite receiver/demodulator utilizing surface acoustic wave (SAW) chirp transform processor and coherent BPSK demodulation. The algorithm of this SAW chirp Fourier transformer is of the Convolve - Multiply - Convolve (CMC) type, utilizing off-the-shelf reflective array compressor (RAC) chirp filters. This satellite receiver, if fully developed, was intended to be used as an on-board multichannel communications repeater. The Advanced Communications Receiver consists of four units: (1) CMC processor, (2) single sideband modulator, (3) demodulator, and (4) chirp waveform generator and individual channel processors. The input signal is composed of multiple user transmission frequencies operating independently from remotely located ground terminals. This signal is Fourier transformed by the CMC Processor into a unique time slot for each user frequency. The CMC processor is driven by a waveform generator through a single sideband (SSB) modulator. The output of the coherent demodulator is composed of positive and negative pulses, which are the envelopes of the chirp transform processor output. These pulses correspond to the data symbols. Following the demodulator, a logic circuit reconstructs the pulses into data, which are subsequently differentially decoded to form the transmitted data. The coherent demodulation and detection of BPSK signals derived from a CMC chirp transform processor were experimentally demonstrated and bit error rate (BER) testing was performed. To assess the feasibility of such advanced receiver, the results were compared with the theoretical analysis and plotted for an average BER as a function of signal-to-noise ratio. Another goal of this SBIR program was the development of a commercial product. The commercial product developed was an arbitrary waveform generator. The successful sales have begun with the delivery of the first arbitrary waveform generator.
Digital system for structural dynamics simulation
NASA Technical Reports Server (NTRS)
Krauter, A. I.; Lagace, L. J.; Wojnar, M. K.; Glor, C.
1982-01-01
State-of-the-art digital hardware and software for the simulation of complex structural dynamic interactions, such as those which occur in rotating structures (engine systems). System were incorporated in a designed to use an array of processors in which the computation for each physical subelement or functional subsystem would be assigned to a single specific processor in the simulator. These node processors are microprogrammed bit-slice microcomputers which function autonomously and can communicate with each other and a central control minicomputer over parallel digital lines. Inter-processor nearest neighbor communications busses pass the constants which represent physical constraints and boundary conditions. The node processors are connected to the six nearest neighbor node processors to simulate the actual physical interface of real substructures. Computer generated finite element mesh and force models can be developed with the aid of the central control minicomputer. The control computer also oversees the animation of a graphics display system, disk-based mass storage along with the individual processing elements.
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
European Seminar on Neural Computing
1988-08-31
elements can be fabricated on a single chip . Two specific oriented language (for example, SMALLTALK or cellular arrays, namely, the programmable systolic... chip POOL) the basic concepts are: objects are viewed as (Fisher, 1983) and the connection machine (Treleaven, active, they may contain state, and...flow computer the availability of 1. Programmable Systolic Chip . Programmable Sys- input operands triggers the execution of the instruction tolic Chips
Multibeam single frequency synthetic aperture radar processor for imaging separate range swaths
NASA Technical Reports Server (NTRS)
Jain, A. (Inventor)
1982-01-01
A single-frequency multibeam synthetic aperture radar for large swath imaging is disclosed. Each beam illuminates a separate ""footprint'' (i.e., range and azimuth interval). The distinct azimuth intervals for the separate beams produce a distinct Doppler frequency spectrum for each beam. After range correlation of raw data, an optical processor develops image data for the different beams by spatially separating the beams to place each beam of different Doppler frequency spectrum in a different location in the frequency plane as well as the imaging plane of the optical processor. Selection of a beam for imaging may be made in the frequency plane by adjusting the position of an aperture, or in the image plane by adjusting the position of a slit. The raw data may also be processed in digital form in an analogous manner.
Programmable lab-on-a-chip system for single cell analysis
NASA Astrophysics Data System (ADS)
Thalhammer, S.
2009-05-01
The collection, selection, amplification and detection of minimum genetic samples became a part of everyday life in medical and biological laboratories, to analyze DNA-fragments of pathogens, patient samples and traces on crime scenes. About a decade ago, a handful of researchers began discussing an intriguing idea. Could the equipment needed for everyday chemistry and biology procedures be shrunk to fit on a chip in the size of a fingernail? Miniature devices for, say, analysing DNA and proteins should be faster and cheaper than conventional versions. Lab-on-a-chip is an advanced technology that integrates a microfluidic system on a microscale chip device. The "laboratory" is created by means of channels, mixers, reservoirs, diffusion chambers, integrated electrodes, pumps, valves and more. With lab-ona- chip technology, complete laboratories on a square centimetre can be created. Here, a multifunctional programmable Lab-on-a-Chip driven by nanofluidics and controlled by surface acoustic waves (SAW) is presented. This system combines serial DNA-isolation-, amplification- and array-detection-process on a modified glass-platform. The fluid actuation is controlled via SAW by interdigital transducers implemented in the chemical modified chip surface. The chemical surface modification allows fluid handling in the sub-microliter range. Minute amount of sample material is extracted by laser-based microdissection out of e.g. histological sections at the single cell level. A few picogram of genetic material are isolated and transferred via a low-pressure transfer system (SPATS) onto the chip. Subsequently the genetic material inside single droplets, which behave like "virtual" beaker, is transported to the reaction and analysis centers on the chip surface via surface acoustic waves, mainly known as noise dumping filters in mobile phones. At these "biological reactors" the genetic material is processed, e.g. amplified via polymerase chain reaction methods, and genetically characterized.
NASA Astrophysics Data System (ADS)
Li, Yu; Li, Jiachen; Yu, Hongchen; Yu, Hai; Chen, Hongwei; Yang, Sigang; Chen, Minghua
2018-04-01
The explosive growth of data centers, cloud computing and various smart devices is limited by the current state of microelectronics, both in terms of speed and heat generation. Benefiting from the large bandwidth, promising low power consumption and passive calculation capability, experts believe that the integrated photonics-based signal processing and transmission technologies can break the bottleneck of microelectronics technology. In recent years, integrated photonics has become increasingly reliable and access to the advanced fabrication process has been offered by various foundries. In this paper, we review our recent works on the integrated optical signal processing system. We study three different kinds of on-chip signal processors and use these devices to build microsystems for the fields of microwave photonics, optical communications and spectrum sensing. The microwave photonics front receiver was demonstrated with a signal processing range of a full-band (L-band to W-band). A fully integrated microwave photonics transceiver without the on-chip laser was realized on silicon photonics covering the signal frequency of up 10 GHz. An all-optical orthogonal frequency division multiplexing (OFDM) de-multiplier was also demonstrated and used for an OFDM communication system with the rate of 64 Gbps. Finally, we show our work on the monolithic integrated spectrometer with a high resolution of about 20 pm at the central wavelength of 1550 nm. These proposed on-chip signal processing systems potential applications in the fields of radar, 5G wireless communication, wearable devices and optical access networks.
Missileborne Artificial Vision System (MAVIS)
NASA Technical Reports Server (NTRS)
Andes, David K.; Witham, James C.; Miles, Michael D.
1994-01-01
Several years ago when INTEL and China Lake designed the ETANN chip, analog VLSI appeared to be the only way to do high density neural computing. In the last five years, however, digital parallel processing chips capable of performing neural computation functions have evolved to the point of rough equality with analog chips in system level computational density. The Naval Air Warfare Center, China Lake, has developed a real time, hardware and software system designed to implement and evaluate biologically inspired retinal and cortical models. The hardware is based on the Adaptive Solutions Inc. massively parallel CNAPS system COHO boards. Each COHO board is a standard size 6U VME card featuring 256 fixed point, RISC processors running at 20 MHz in a SIMD configuration. Each COHO board has a companion board built to support a real time VSB interface to an imaging seeker, a NTSC camera, and to other COHO boards. The system is designed to have multiple SIMD machines each performing different corticomorphic functions. The system level software has been developed which allows a high level description of corticomorphic structures to be translated into the native microcode of the CNAPS chips. Corticomorphic structures are those neural structures with a form similar to that of the retina, the lateral geniculate nucleus, or the visual cortex. This real time hardware system is designed to be shrunk into a volume compatible with air launched tactical missiles. Initial versions of the software and hardware have been completed and are in the early stages of integration with a missile seeker.
Concurrent heterogeneous neural model simulation on real-time neuromimetic hardware.
Rast, Alexander; Galluppi, Francesco; Davies, Sergio; Plana, Luis; Patterson, Cameron; Sharp, Thomas; Lester, David; Furber, Steve
2011-11-01
Dedicated hardware is becoming increasingly essential to simulate emerging very-large-scale neural models. Equally, however, it needs to be able to support multiple models of the neural dynamics, possibly operating simultaneously within the same system. This may be necessary either to simulate large models with heterogeneous neural types, or to simplify simulation and analysis of detailed, complex models in a large simulation by isolating the new model to a small subpopulation of a larger overall network. The SpiNNaker neuromimetic chip is a dedicated neural processor able to support such heterogeneous simulations. Implementing these models on-chip uses an integrated library-based tool chain incorporating the emerging PyNN interface that allows a modeller to input a high-level description and use an automated process to generate an on-chip simulation. Simulations using both LIF and Izhikevich models demonstrate the ability of the SpiNNaker system to generate and simulate heterogeneous networks on-chip, while illustrating, through the network-scale effects of wavefront synchronisation and burst gating, methods that can provide effective behavioural abstractions for large-scale hardware modelling. SpiNNaker's asynchronous virtual architecture permits greater scope for model exploration, with scalable levels of functional and temporal abstraction, than conventional (or neuromorphic) computing platforms. The complete system illustrates a potential path to understanding the neural model of computation, by building (and breaking) neural models at various scales, connecting the blocks, then comparing them against the biology: computational cognitive neuroscience. Copyright © 2011 Elsevier Ltd. All rights reserved.
Scheduling time-critical graphics on multiple processors
NASA Technical Reports Server (NTRS)
Meyer, Tom W.; Hughes, John F.
1995-01-01
This paper describes an algorithm for the scheduling of time-critical rendering and computation tasks on single- and multiple-processor architectures, with minimal pipelining. It was developed to manage scientific visualization scenes consisting of hundreds of objects, each of which can be computed and displayed at thousands of possible resolution levels. The algorithm generates the time-critical schedule using progressive-refinement techniques; it always returns a feasible schedule and, when allowed to run to completion, produces a near-optimal schedule which takes advantage of almost the entire multiple-processor system.
GPU-based Parallel Application Design for Emerging Mobile Devices
NASA Astrophysics Data System (ADS)
Gupta, Kshitij
A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Implementation of pulse-coupled neural networks in a CNAPS environment.
Kinser, J M; Lindblad, T
1999-01-01
Pulse coupled neural networks (PCNN's) are biologically inspired algorithms very well suited for image/signal preprocessing. While several analog implementations are proposed we suggest a digital implementation in an existing environment, the connected network of adapted processors system (CNAPS). The reason for this is two fold. First, CNAPS is a commercially available chip which has been used for several neural-network implementations. Second, the PCNN is, in almost all applications, a very efficient component of a system requiring subsequent and additional processing. This may include gating, Fourier transforms, neural classifiers, data mining, etc, with or without feedback to the PCNN.
Advanced flight computer. Special study
NASA Technical Reports Server (NTRS)
Coo, Dennis
1995-01-01
This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.
GRAPE-4: A special-purpose computer for gravitational N-body problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Makino, Junichiro; Taiji, Makoto; Ebisuzaki, Toshikazu
1995-12-01
We describe GRAPE-4, a special-purpose computer for gravitational N-body simulations. In gravitational N-body simulations, almost all computing time is spent for the calculation of interaction between particles. GRAPE-4 is a specialized hardware to calculate the interaction between particles. It is used with a general-purpose host computer that performs all calculations other than the force calculation. With this architecture, it is relatively easy to realize a massively parallel system. In 1991, we developed the GRAPE-3 system with the peak speed equivalent to 14.4 Gflops. It consists of 48 custom pipelined processors. In 1992 we started the development of GRAPE-4. The GRAPE-4more » system will consist of 1920 custom pipeline chips. Each chip has the speed of 600 Mflops, when operated on 30 MHz clock. A prototype system with two custom LSIs has been completed July 1994, and the full system is now under manufacturing.« less
A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer
NASA Astrophysics Data System (ADS)
Xiao, Hao; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki; Nakase, Yuko; Kimura, Sadahiro
Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises a lot of facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large amount of computational load, which challenges the traditional uniprocessor architecture based implementation method to provide the required performance. However, the constrained cost and power budget, on the other hand, makes using commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, but takes 15% less area than the uniprocessor implementation.
Kim, Jungkyu; Jensen, Erik C; Stockton, Amanda M; Mathies, Richard A
2013-08-20
A fully integrated multilayer microfluidic chemical analyzer for automated sample processing and labeling, as well as analysis using capillary zone electrophoresis is developed and characterized. Using lifting gate microfluidic control valve technology, a microfluidic automaton consisting of a two-dimensional microvalve cellular array is fabricated with soft lithography in a format that enables facile integration with a microfluidic capillary electrophoresis device. The programmable sample processor performs precise mixing, metering, and routing operations that can be combined to achieve automation of complex and diverse assay protocols. Sample labeling protocols for amino acid, aldehyde/ketone and carboxylic acid analysis are performed automatically followed by automated transfer and analysis by the integrated microfluidic capillary electrophoresis chip. Equivalent performance to off-chip sample processing is demonstrated for each compound class; the automated analysis resulted in a limit of detection of ~16 nM for amino acids. Our microfluidic automaton provides a fully automated, portable microfluidic analysis system capable of autonomous analysis of diverse compound classes in challenging environments.
An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors
NASA Technical Reports Server (NTRS)
Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.
2015-01-01
This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.
Bhanot, Gyan [Princeton, NJ; Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2009-09-08
Class network routing is implemented in a network such as a computer network comprising a plurality of parallel compute processors at nodes thereof. Class network routing allows a compute processor to broadcast a message to a range (one or more) of other compute processors in the computer network, such as processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message is sufficient, which generally reduces the total number of messages in the network as well as the latency to do a broadcast. Class network routing is also applied to dense matrix inversion algorithms on distributed memory parallel supercomputers with hardware class function (multicast) capability. This is achieved by exploiting the fact that the communication patterns of dense matrix inversion can be served by hardware class functions, which results in faster execution times.
Using an ARM Processor to boost data acquisition rates
NASA Astrophysics Data System (ADS)
Brown, Anthony; Seaquest Collaboration
2015-10-01
It has been proposed, Fermilab E-1067, to use the SeaQuest (E906/E1039/1037) dimuon spectrometer to do a search for the dark photon and dark Higgs. The concept is that it would run in a parasitic mode with only minor upgrades to the spectrometer. There are various requirements for the upgrades but one of them is to increase the DAQ rates and one minimal cost approach to do this will be discussed. The currently running SeaQuest (E906) experiment has modest rate requirements of around 1 kHz. Since the dark particle search would involve recording particles originating in the first magnet used as a beam dump, the data rate will be higher than recording events just from the target. Thus the DAQ rate capability will need to be increased to around 10 kHz. There exists a possible very low cost solution as the Academica Sinica designed TDCs contains an ARM processor that was not needed to meet the original SeaQuest (E906 needs). Since the 120 GeV beam from the Main Injector is delivered in a 4 second spill, once per minute and the ARM processor on the TDC has two dual-ported memory chips, these could be used to store data during each spill and then read the data out in the time between spills.
On-chip low loss heralded source of pure single photons.
Spring, Justin B; Salter, Patrick S; Metcalf, Benjamin J; Humphreys, Peter C; Moore, Merritt; Thomas-Peter, Nicholas; Barbieri, Marco; Jin, Xian-Min; Langford, Nathan K; Kolthammer, W Steven; Booth, Martin J; Walmsley, Ian A
2013-06-03
A key obstacle to the experimental realization of many photonic quantum-enhanced technologies is the lack of low-loss sources of single photons in pure quantum states. We demonstrate a promising solution: generation of heralded single photons in a silica photonic chip by spontaneous four-wave mixing. A heralding efficiency of 40%, corresponding to a preparation efficiency of 80% accounting for detector performance, is achieved due to efficient coupling of the low-loss source to optical fibers. A single photon purity of 0.86 is measured from the source number statistics without narrow spectral filtering, and confirmed by direct measurement of the joint spectral intensity. We calculate that similar high-heralded-purity output can be obtained from visible to telecom spectral regions using this approach. On-chip silica sources can have immediate application in a wide range of single-photon quantum optics applications which employ silica photonics.
On-Chip Single-Plasmon Nanocircuit Driven by a Self-Assembled Quantum Dot.
Wu, Xiaofei; Jiang, Ping; Razinskas, Gary; Huo, Yongheng; Zhang, Hongyi; Kamp, Martin; Rastelli, Armando; Schmidt, Oliver G; Hecht, Bert; Lindfors, Klas; Lippitz, Markus
2017-07-12
Quantum photonics holds great promise for future technologies such as secure communication, quantum computation, quantum simulation, and quantum metrology. An outstanding challenge for quantum photonics is to develop scalable miniature circuits that integrate single-photon sources, linear optical components, and detectors on a chip. Plasmonic nanocircuits will play essential roles in such developments. However, for quantum plasmonic circuits, integration of stable, bright, and narrow-band single photon sources in the structure has so far not been reported. Here we present a plasmonic nanocircuit driven by a self-assembled GaAs quantum dot. Through a planar dielectric-plasmonic hybrid waveguide, the quantum dot efficiently excites narrow-band single plasmons that are guided in a two-wire transmission line until they are converted into single photons by an optical antenna. Our work demonstrates the feasibility of fully on-chip plasmonic nanocircuits for quantum optical applications.
Huys, Roeland; Braeken, Dries; Jans, Danny; Stassen, Andim; Collaert, Nadine; Wouters, Jan; Loo, Josine; Severi, Simone; Vleugels, Frank; Callewaert, Geert; Verstreken, Kris; Bartic, Carmen; Eberle, Wolfgang
2012-04-07
To cope with the growing needs in research towards the understanding of cellular function and network dynamics, advanced micro-electrode arrays (MEAs) based on integrated complementary metal oxide semiconductor (CMOS) circuits have been increasingly reported. Although such arrays contain a large number of sensors for recording and/or stimulation, the size of the electrodes on these chips are often larger than a typical mammalian cell. Therefore, true single-cell recording and stimulation remains challenging. Single-cell resolution can be obtained by decreasing the size of the electrodes, which inherently increases the characteristic impedance and noise. Here, we present an array of 16,384 active sensors monolithically integrated on chip, realized in 0.18 μm CMOS technology for recording and stimulation of individual cells. Successful recording of electrical activity of cardiac cells with the chip, validated with intracellular whole-cell patch clamp recordings are presented, illustrating single-cell readout capability. Further, by applying a single-electrode stimulation protocol, we could pace individual cardiac cells, demonstrating single-cell addressability. This novel electrode array could help pave the way towards solving complex interactions of mammalian cellular networks. This journal is © The Royal Society of Chemistry 2012
Novel Robotic Tools for Piping Inspection and Repair
2015-01-14
was selected due to its small size, and peripheral capability. The SoM measures 50mm x 44mm. The SoM processor is an ARM Cortex -A8 running at720MHz...designing an embedded computing system from scratch. The SoM is a single integrated module which contains the processor , RAM, power management, and
Dynamic overset grid communication on distributed memory parallel processors
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.
1993-01-01
A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.
Multichannel photonic Hilbert transformers based on complex modulated integrated Bragg gratings.
Cheng, Rui; Chrostowski, Lukas
2018-03-01
Multichannel photonic Hilbert transformers (MPHTs) are reported. The devices are based on single compact spiral integrated Bragg gratings on silicon with coupling coefficients precisely modulated by the phase of each grating period. MPHTs with up to nine wavelength channels and a single-channel bandwidth of up to ∼625 GHz are achieved. The potential of the devices for multichannel single-sideband signal generation is suggested. The work offers a new possibility of utilizing wavelength as an extra degree of freedom in designing radio-frequency photonic signal processors. Such multichannel processors are expected to possess improved capacities and a potential to greatly benefit current widespread wavelength division multiplexed systems.
WFC3/UVIS External CTE Monitor: Single-Chip CTE Measurements
NASA Astrophysics Data System (ADS)
Gosmeyer, C. M.; Baggett, S.
2016-12-01
We present the first results of single-chip measurements of charge transfer efficiency (CTE) in the UVIS channel of the Hubble Space Telescope Wide Field Camera 3 (HST/WFC3). This test was performed in Cycle 20 in two visits. In the first visit a field in the star cluster NGC 6583 was observed. In a second visit, the telescope returned to the field, but rotated by 180 degrees and with a shift in pointing that allowed the same stars to be imaged, near and far from the amplifiers, on the same chip of the two-chip UVIS field of-view. This dataset enables a measurement of CTE loss on each separate chip. The current CTE monitor measures CTE loss as an average of the two chips because it dithers by a chip-height to obtain observations of the same sources near and far from the amplifiers, instead of the more difficult to-schedule 180-degree rotation. We find that CTE loss is worse on Chip 1 than on Chip 2 across all cases for which we had data: short and long exposures and w! ith and without the pixel-based CTE correction. In the best case, for long exposures with the CTE correction applied, the max difference between the two chip's flux losses is 3%/2048 pixels. This case should apply for most science observations where the background is 12 e-/pixel. In the worst case of low-background short exposures, e.g. those without post-flash, the max difference between the two chips is 17% flux loss/2048 pixels. Uncertainties are <0.01% flux loss/2048 pixels. Because of the two chips' different CTE loss rates, we will consider adding this test as part of the routine yearly monitor and creating a chip-specific CTE correction software.
NASA Technical Reports Server (NTRS)
Zhou, Zhimin (Inventor); Pain, Bedabrata (Inventor)
1999-01-01
An analog-to-digital converter for on-chip focal-plane image sensor applications. The analog-to-digital converter utilizes a single charge integrating amplifier in a charge balancing architecture to implement successive approximation analog-to-digital conversion. This design requires minimal chip area and has high speed and low power dissipation for operation in the 2-10 bit range. The invention is particularly well suited to CMOS on-chip applications requiring many analog-to-digital converters, such as column-parallel focal-plane architectures.
NASA Astrophysics Data System (ADS)
Pitris, St.; Vagionas, Ch.; Kanellos, G. T.; Kisacik, R.; Tekin, T.; Broeke, R.; Pleros, N.
2016-03-01
At the dawning of the exaflop era, High Performance Computers are foreseen to exploit integrated all-optical elements, to overcome the speed limitations imposed by electronic counterparts. Drawing from the well-known Memory Wall limitation, imposing a performance gap between processor and memory speeds, research has focused on developing ultra-fast latching devices and all-optical memory elements capable of delivering buffering and switching functionalities at unprecedented bit-rates. Following the master-slave configuration of electronic Flip-Flops, coupled SOA-MZI based switches have been theoretically investigated to exceed 40 Gb/s operation, provided a short coupling waveguide. However, this flip-flop architecture has been only hybridly integrated with silica-on-silicon integration technology exhibiting a total footprint of 45x12 mm2 and intra-Flip-Flop coupling waveguide of 2.5cm, limited at 5 Gb/s operation. Monolithic integration offers the possibility to fabricate multiple active and passive photonic components on a single chip at a close proximity towards, bearing promises for fast all-optical memories. Here, we present for the first time a monolithically integrated all-optical SR Flip-Flop with coupled master-slave SOA-MZI switches. The photonic chip is integrated on a 6x2 mm2 die as a part of a multi-project wafer run using library based components of a generic InP platform, fiber-pigtailed and fully packaged on a temperature controlled ceramic submount module with electrical contacts. The intra Flip-Flop coupling waveguide is 5 mm long, reducing the total footprint by two orders of magnitude. Successful flip flop functionality is evaluated at 10 Gb/s with clear open eye diagram, achieving error free operation with a power penalty of 4dB.
Adapting Wave-front Algorithms to Efficiently Utilize Systems with Deep Communication Hierarchies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kerbyson, Darren J.; Lang, Michael; Pakin, Scott
2011-09-30
Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance especially in hybrid systems using accelerators. Processorcores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contains wavefront processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundarymore » data downstream and whose cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional steps in the parallel computation and higher use of on-chip communications. This tradeoff is explored using a performance model. An implementation using the Reverse-acceleration programming model on the petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.« less
Monitoring Temperature and Fan Speed Using Ganglia and Winbond Chips
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCaffrey, Cattie; /SLAC
2006-09-27
Effective monitoring is essential to keep a large group of machines, like the ones at Stanford Linear Accelerator Center (SLAC), up and running. SLAC currently uses Ganglia Monitoring System to observe about 2000 machines, analyzing metrics like CPU usage and I/O rate. However, metrics essential to machine hardware health, such as temperature and fan speed, are not being monitored. Many machines have a Winbond w83782d chip which monitors three temperatures, two of which come from dual CPUs, and returns the information when the sensor command is invoked. Ganglia also provides a feature, gmetric, that allows the users to monitor theirmore » own metrics and incorporate them into the monitoring system. The programming language Perl is chosen to implement a script that invokes the sensors command, extracts the temperature and fan speed information, and calls gmetric with the appropriate arguments. Two machines were used to test the script; the two CPUs on each machine run at about 65 Celsius, which is well within the operating temperature range (The maximum safe temperature range is 77-82 Celsius for the Pentium III processors being used). Installing the script on all machines with a Winbond w83782d chip allows the SLAC Scientific Computing and Computing Services group (SCCS) to better evaluate current cooling methods.« less
Single cell HaloChip assay on paper for point-of-care diagnosis.
Ma, Liyuan; Qiao, Yong; Jones, Ross; Singh, Narendra; Su, Ming
2016-11-01
This article describes a paper-based low cost single cell HaloChip assay that can be used to assess drug- and radiation-induced DNA damage at point-of-care. Printing ink on paper effectively blocks fluorescence of paper materials, provides high affinity to charged polyelectrolytes, and prevents penetration of water in paper. After exposure to drug or ionizing radiation, cells are patterned on paper to create discrete and ordered single cell arrays, embedded inside an agarose gel, lysed with alkaline solution to allow damaged DNA fragments to diffuse out of nucleus cores, and form diffusing halos in the gel matrix. After staining DNA with a fluorescent dye, characteristic halos formed around cells, and the level of DNA damage can be quantified by determining sizes of halos and nucleus with an image processing program based on MATLAB. With its low fabrication cost and easy operation, this HaloChip on paper platform will be attractive to rapidly and accurately determine DNA damage for point-of-care evaluation of drug efficacy and radiation condition. Graphical Abstract Single cell HaloChip on paper.
Guo, Baoshan; Lei, Cheng; Kobayashi, Hirofumi; Ito, Takuro; Yalikun, Yaxiaer; Jiang, Yiyue; Tanaka, Yo; Ozeki, Yasuyuki; Goda, Keisuke
2017-05-01
The development of reliable, sustainable, and economical sources of alternative fuels to petroleum is required to tackle the global energy crisis. One such alternative is microalgal biofuel, which is expected to play a key role in reducing the detrimental effects of global warming as microalgae absorb atmospheric CO 2 via photosynthesis. Unfortunately, conventional analytical methods only provide population-averaged lipid amounts and fail to characterize a diverse population of microalgal cells with single-cell resolution in a non-invasive and interference-free manner. Here high-throughput label-free single-cell screening of lipid-producing microalgal cells with optofluidic time-stretch quantitative phase microscopy was demonstrated. In particular, Euglena gracilis, an attractive microalgal species that produces wax esters (suitable for biodiesel and aviation fuel after refinement), within lipid droplets was investigated. The optofluidic time-stretch quantitative phase microscope is based on an integration of a hydrodynamic-focusing microfluidic chip, an optical time-stretch quantitative phase microscope, and a digital image processor equipped with machine learning. As a result, it provides both the opacity and phase maps of every single cell at a high throughput of 10,000 cells/s, enabling accurate cell classification without the need for fluorescent staining. Specifically, the dataset was used to characterize heterogeneous populations of E. gracilis cells under two different culture conditions (nitrogen-sufficient and nitrogen-deficient) and achieve the cell classification with an error rate of only 2.15%. The method holds promise as an effective analytical tool for microalgae-based biofuel production. © 2017 International Society for Advancement of Cytometry. © 2017 International Society for Advancement of Cytometry.
Chen, Zhiyuan; Law, Man-Kay; Mak, Pui-In; Martins, Rui P
2017-02-01
In this paper, an ultra-compact single-chip solar energy harvesting IC using on-chip solar cell for biomedical implant applications is presented. By employing an on-chip charge pump with parallel connected photodiodes, a 3.5 × efficiency improvement can be achieved when compared with the conventional stacked photodiode approach to boost the harvested voltage while preserving a single-chip solution. A photodiode-assisted dual startup circuit (PDSC) is also proposed to improve the area efficiency and increase the startup speed by 77%. By employing an auxiliary charge pump (AQP) using zero threshold voltage (ZVT) devices in parallel with the main charge pump, a low startup voltage of 0.25 V is obtained while minimizing the reversion loss. A 4 V in gate drive voltage is utilized to reduce the conduction loss. Systematic charge pump and solar cell area optimization is also introduced to improve the energy harvesting efficiency. The proposed system is implemented in a standard 0.18- [Formula: see text] CMOS technology and occupies an active area of 1.54 [Formula: see text]. Measurement results show that the on-chip charge pump can achieve a maximum efficiency of 67%. With an incident power of 1.22 [Formula: see text] from a halogen light source, the proposed energy harvesting IC can deliver an output power of 1.65 [Formula: see text] at 64% charge pump efficiency. The chip prototype is also verified using in-vitro experiment.
Mapping of H.264 decoding on a multiprocessor architecture
NASA Astrophysics Data System (ADS)
van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.
2003-05-01
Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result from this, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs, that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of e.g. a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system, executing the same software. Experimental results show that the data communication considerably reduces up to 65% directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krueger, Jens; Micikevicius, Paulius; Williams, Samuel
Reverse Time Migration (RTM) is one of the main approaches in the seismic processing industry for imaging the subsurface structure of the Earth. While RTM provides qualitative advantages over its predecessors, it has a high computational cost warranting implementation on HPC architectures. We focus on three progressively more complex kernels extracted from RTM: for isotropic (ISO), vertical transverse isotropic (VTI) and tilted transverse isotropic (TTI) media. In this work, we examine performance optimization of forward wave modeling, which describes the computational kernels used in RTM, on emerging multi- and manycore processors and introduce a novel common subexpression elimination optimization formore » TTI kernels. We compare attained performance and energy efficiency in both the single-node and distributed memory environments in order to satisfy industry’s demands for fidelity, performance, and energy efficiency. Moreover, we discuss the interplay between architecture (chip and system) and optimizations (both on-node computation) highlighting the importance of NUMA-aware approaches to MPI communication. Ultimately, our results show we can improve CPU energy efficiency by more than 10× on Magny Cours nodes while acceleration via multiple GPUs can surpass the energy-efficient Intel Sandy Bridge by as much as 3.6×.« less
Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2
NASA Technical Reports Server (NTRS)
Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.OX speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.