efficient vlsi architecture: Topics by Science.gov

Sample records for efficient vlsi architecture

An efficient interpolation filter VLSI architecture for HEVC standard

NASA Astrophysics Data System (ADS)

Zhou, Wei; Zhou, Xin; Lian, Xiaocong; Liu, Zhenyu; Liu, Xiaoxiang

2015-12-01

The next-generation video coding standard of High-Efficiency Video Coding (HEVC) is especially efficient for coding high-resolution video such as 8K-ultra-high-definition (UHD) video. Fractional motion estimation in HEVC presents a significant challenge in clock latency and area cost as it consumes more than 40 % of the total encoding time and thus results in high computational complexity. Aiming to support 8K-UHD video applications, this paper proposes an efficient interpolation filter VLSI architecture for HEVC. First, a new interpolation filter algorithm based on the 8-pixel interpolation unit is proposed. It can save 19.7 % processing time on average with acceptable coding quality degradation. Based on the proposed algorithm, an efficient interpolation filter VLSI architecture, composed of a reused data path of interpolation, an efficient memory organization, and a reconfigurable pipeline interpolation filter engine, is presented to reduce the hardware implementation area and achieve high throughput. The final VLSI implementation only requires 37.2k gates in a standard 90-nm CMOS technology at an operating frequency of 240 MHz. The proposed architecture can be reused for either half-pixel interpolation or quarter-pixel interpolation, which can reduce the area cost by about 131,040 bits of RAM. The processing latency of the proposed VLSI architecture can support real-time processing of 4:2:0 format 7680 × 4320@78fps video sequences.
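As background for the abstract above, the following is a minimal software sketch of one-dimensional HEVC luma fractional-sample interpolation using the standard 8-tap half-sample and quarter-sample coefficients. It is not the proposed 8-pixel-unit algorithm or its hardware data path; the function name `interpolate_row` and the single-stage rounding are illustrative assumptions.

```python
# Standard HEVC luma interpolation taps; each set sums to 64.
HALF_PEL = (-1, 4, -11, 40, 40, -11, 4, -1)
QUARTER_PEL = (-1, 4, -10, 58, 17, -5, 1, 0)

def interpolate_row(pixels, taps, bit_depth=8):
    """Filter a 1-D row of integer samples (hypothetical helper).

    Output i is the fractional sample lying between pixels[i + 3]
    and pixels[i + 4]. Single-stage rounding is assumed here.
    """
    max_val = (1 << bit_depth) - 1
    out = []
    for i in range(len(pixels) - 7):
        acc = sum(t * p for t, p in zip(taps, pixels[i:i + 8]))
        val = (acc + 32) >> 6              # round, then divide by 64
        out.append(min(max(val, 0), max_val))  # clip to the sample range
    return out
```

On a linear ramp such as 0, 10, ..., 70 the half-sample filter returns the midpoint 35, which is a quick sanity check that the taps are normalised.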
A subthreshold aVLSI implementation of the Izhikevich simple neuron model.

PubMed

Rangan, Venkat; Ghosh, Abhishek; Aparin, Vladimir; Cauwenberghs, Gert

2010-01-01

We present a circuit architecture for compact analog VLSI implementation of the Izhikevich neuron model, which efficiently describes a wide variety of neuron spiking and bursting dynamics using two state variables and four adjustable parameters. Log-domain circuit design utilizing MOS transistors in subthreshold results in high energy efficiency, with less than 1pJ of energy consumed per spike. We also discuss the effects of parameter variations on the dynamics of the equations, and present simulation results that replicate several types of neural dynamics. The low power operation and compact analog VLSI realization make the architecture suitable for human-machine interface applications in neural prostheses and implantable bioelectronics, as well as large-scale neural emulation tools for computational neuroscience.
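The two state variables and four parameters mentioned above can be illustrated in software. The sketch below integrates the published Izhikevich equations with forward Euler; it says nothing about the log-domain circuit itself, and the function name, step size, and input current are assumptions chosen for illustration.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0, current=10.0,
                        duration_ms=1000.0, dt=0.5):
    """Forward-Euler simulation of the Izhikevich model.

    State variables: v (membrane potential, mV) and u (recovery).
    Parameters: a, b, c, d. Returns the list of spike times in ms.
    """
    v = c
    u = b * v
    spikes = []
    for step in range(int(duration_ms / dt)):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:              # spike detected: reset both variables
            spikes.append(step * dt)
            v = c
            u += d
    return spikes
```

With the default (regular-spiking) parameters a constant input of 10 produces repeated spikes, while zero input leaves the neuron at rest; changing a, b, c, and d selects other firing patterns.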
A single chip VLSI Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Shao, H. M.; Truong, T. K.; Hsu, I. S.; Deutsch, L. J.; Reed, I. S.

1986-01-01

A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform decoding technique used in a previous design is replaced by a time domain algorithm. A new architecture that implements such an algorithm permits efficient pipeline processing with minimum circuitry. A systolic array is also developed to perform erasure corrections in the new design. A modified form of Euclid's algorithm is implemented by a new architecture that maintains the throughput rate with less circuitry. Such improvements result in both enhanced capability and a significant reduction in silicon area, therefore making it possible to build a pipeline (31,15)RS decoder on a single VLSI chip.
High-throughput sample adaptive offset hardware architecture for high-efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Wei; Yan, Chang; Zhang, Jingzhi; Zhou, Xin

2018-03-01

A high-throughput hardware architecture for a sample adaptive offset (SAO) filter in the high-efficiency video coding standard is presented. First, an implementation-friendly and simplified bitrate estimation method of rate-distortion cost calculation is proposed to reduce the computational complexity in the mode decision of SAO. Then, a high-throughput VLSI architecture for SAO is presented based on the proposed bitrate estimation method. Furthermore, a multiparallel VLSI architecture for in-loop filters, which integrates both the deblocking filter and the SAO filter, is proposed. Six parallel strategies are applied in the proposed in-loop filters architecture to improve the system throughput and filtering speed. Experimental results show that the proposed in-loop filters architecture can achieve up to 48% higher throughput in comparison with prior work. The proposed architecture can reach a high operating clock frequency of 297 MHz with a TSMC 65-nm library and meet the real-time requirement of the in-loop filters for 8 K × 4 K video format at 132 fps.
Large-Constraint-Length, Fast Viterbi Decoder

NASA Technical Reports Server (NTRS)

Collins, O.; Dolinar, S.; Hsu, In-Shek; Pollara, F.; Olson, E.; Statman, J.; Zimmerman, G.

1990-01-01

Scheme for efficient interconnection makes VLSI design feasible. Concept for fast Viterbi decoder provides for processing of convolutional codes of constraint length K up to 15 and rates of 1/2 to 1/6. Fully parallel (but bit-serial) architecture developed for decoder of K = 7 implemented in single dedicated VLSI circuit chip. Contains six major functional blocks. VLSI circuits perform branch metric computations, add-compare-select operations, and then store decisions in traceback memory. Traceback processor reads appropriate memory locations and puts out decoded bits. Used as building block for decoders of larger K.
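The branch-metric, add-compare-select, and traceback steps named above can be sketched in software. The example below is a hard-decision decoder for a small K = 3, rate-1/2 code with generators 7 and 5 (octal), chosen only to keep the trellis readable; it is not the K = 7 or K = 15 design, and all names are illustrative. The message is assumed to end with K - 1 zero bits so that traceback can start from state 0.

```python
K = 3                      # constraint length (assumed small for clarity)
G = (0b111, 0b101)         # generator polynomials, octal 7 and 5
N_STATES = 1 << (K - 1)

def parity(x):
    return bin(x).count("1") & 1

def branch_output(state, bit):
    """Encoder output pair when `bit` enters the register in `state`."""
    reg = (bit << (K - 1)) | state
    return tuple(parity(reg & g) for g in G)

def next_state(state, bit):
    return ((bit << (K - 1)) | state) >> 1

def encode(bits):
    state, out = 0, []
    for b in bits:
        out.extend(branch_output(state, b))
        state = next_state(state, b)
    return out

def viterbi_decode(received):
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    decisions = []                            # traceback memory
    for t in range(0, len(received), 2):
        sym = received[t:t + 2]
        new_metrics = [inf] * N_STATES
        choice = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                expected = branch_output(state, bit)
                bm = sum(e != r for e, r in zip(expected, sym))  # branch metric
                ns = next_state(state, bit)
                cand = metrics[state] + bm        # add
                if cand < new_metrics[ns]:        # compare
                    new_metrics[ns] = cand        # select
                    choice[ns] = (state, bit)
        metrics = new_metrics
        decisions.append(choice)
    state, bits = 0, []                       # terminated trellis ends in state 0
    for choice in reversed(decisions):
        state, bit = choice[state]
        bits.append(bit)
    return bits[::-1]
```

A usage example: encode a zero-terminated message, flip a couple of channel bits, and decode; the original message is recovered because the code's free distance is 5.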
On the VLSI design of a pipeline Reed-Solomon decoder using systolic arrays

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, H.M.; Reed, I.S.

A new VLSI design of a pipeline Reed-Solomon decoder is presented. The transform decoding technique used in a previous paper is replaced by a time domain algorithm through a detailed comparison of their VLSI implementations. A new architecture that implements the time domain algorithm permits efficient pipeline processing with reduced circuitry. Erasure correction capability is also incorporated with little additional complexity. By using a multiplexing technique, a new implementation of Euclid's algorithm maintains the throughput rate with less circuitry. Such improvements result in both enhanced capability and significant reduction in silicon area, therefore making it possible to build a pipeline Reed-Solomon decoder on a single VLSI chip.
On the VLSI design of a pipeline Reed-Solomon decoder using systolic arrays

NASA Technical Reports Server (NTRS)

Shao, H. M.; Deutsch, L. J.; Reed, I. S.

1987-01-01

A new very large scale integration (VLSI) design of a pipeline Reed-Solomon decoder is presented. The transform decoding technique used in a previous article is replaced by a time domain algorithm through a detailed comparison of their VLSI implementations. A new architecture that implements the time domain algorithm permits efficient pipeline processing with reduced circuitry. Erasure correction capability is also incorporated with little additional complexity. By using a multiplexing technique, a new implementation of Euclid's algorithm maintains the throughput rate with less circuitry. Such improvements result in both enhanced capability and significant reduction in silicon area.
On the VLSI design of a pipeline Reed-Solomon decoder using systolic arrays

NASA Technical Reports Server (NTRS)

Shao, Howard M.; Reed, Irving S.

1988-01-01

A new very large scale integration (VLSI) design of a pipeline Reed-Solomon decoder is presented. The transform decoding technique used in a previous article is replaced by a time domain algorithm through a detailed comparison of their VLSI implementations. A new architecture that implements the time domain algorithm permits efficient pipeline processing with reduced circuitry. Erasure correction capability is also incorporated with little additional complexity. By using a multiplexing technique, a new implementation of Euclid's algorithm maintains the throughput rate with less circuitry. Such improvements result in both enhanced capability and significant reduction in silicon area.
The 1991 3rd NASA Symposium on VLSI Design

NASA Technical Reports Server (NTRS)

Maki, Gary K.

1991-01-01

Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2.
A single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.

1986-01-01

A description of a single VLSI chip for computing syndromes in the (255, 223) Reed-Solomon decoder is presented. The architecture that leads to this single VLSI chip design makes use of the dual basis multiplication algorithm. The same architecture can be applied to design VLSI chips to compute various kinds of number theoretic transforms.
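To make the syndrome computation concrete, here is a small software sketch that evaluates the received polynomial at successive powers of a primitive element of GF(2^8) using Horner's rule. It uses ordinary log/antilog table multiplication rather than the dual basis multiplier of the chip, and the field polynomial 0x11D and the choice of roots alpha^1 to alpha^32 are assumptions made for illustration, not the parameters of the design described above.

```python
PRIM_POLY = 0x11D          # x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)

# Antilog (EXP) and log (LOG) tables for GF(2^8).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    """Multiply two field elements via the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def syndromes(received, n_syndromes=32):
    """Return S_j = r(alpha^j) for j = 1..n_syndromes.

    received[0] is the highest-degree coefficient of r(x).
    """
    result = []
    for j in range(1, n_syndromes + 1):
        alpha_j = EXP[j]
        s = 0
        for symbol in received:
            s = gf_mul(s, alpha_j) ^ symbol   # Horner step
        result.append(s)
    return result
```

For an error-free all-zero word every syndrome is zero; a single error of value e at degree p gives S_j = e * alpha^(j*p), which is the property a decoder exploits to locate errors.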
An Efficient VLSI Architecture of the Enhanced Three Step Search Algorithm

NASA Astrophysics Data System (ADS)

Biswas, Baishik; Mukherjee, Rohan; Saha, Priyabrata; Chakrabarti, Indrajit

2016-09-01

The intense computational complexity of any video codec is largely due to the motion estimation unit. The Enhanced Three Step Search is a popular technique that can be adopted for fast motion estimation. This paper proposes a novel VLSI architecture for the implementation of the Enhanced Three Step Search Technique. A new addressing mechanism has been introduced which enhances the speed of operation and reduces the area requirements. The proposed architecture when implemented in Verilog HDL on Virtex-5 Technology and synthesized using Xilinx ISE Design Suite 14.1 achieves a critical path delay of 4.8 ns while the area comes out to be 2.9K gate equivalent. It can be incorporated in commercial devices like smart-phones, camcorders, video conferencing systems etc.
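For orientation, the sketch below implements the classic three-step search with a sum-of-absolute-differences (SAD) cost. It is the basic algorithm only, not the enhanced variant or the addressing mechanism of the paper; the function names, the 8 x 8 block size, and the initial step of 4 are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Classic three-step search; returns ((dx, dy), cost)."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best                      # centre for this step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # skip candidates whose block would leave the frame
                if not (0 <= bx + mx <= w - n and 0 <= by + my <= h - n):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2                         # 4 -> 2 -> 1
    return best, best_cost
```

The search evaluates at most 25 SAD computations per block instead of the 225 needed by an exhaustive search over the same range of +/-7 pixels, which is why it suits compact hardware.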
Application of a VLSI vector quantization processor to real-time speech coding

NASA Technical Reports Server (NTRS)

Davidson, G.; Gersho, A.

1986-01-01

Attention is given to a working vector quantization processor for speech coding that is based on a first-generation VLSI chip which efficiently performs the pattern-matching operation needed for the codebook search process (CPS). Using this chip, the CPS architecture has been successfully incorporated into a compact, single-board Vector PCM implementation operating at 7-18 kbits/sec. A real time Adaptive Vector Predictive Coder system using the CPS has also been implemented.
An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm

PubMed Central

Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En

2015-01-01

A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction. PMID:26287193
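The nonlinear energy operator used for spike detection above has a simple discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. The sketch below applies it to one channel and thresholds the result; the threshold rule (a multiple of the mean NEO output) and the scale factor are assumptions for illustration and are not taken from the paper.

```python
def neo(signal):
    """Discrete nonlinear energy operator; endpoints are left at zero."""
    psi = [0.0] * len(signal)
    for n in range(1, len(signal) - 1):
        psi[n] = signal[n] ** 2 - signal[n - 1] * signal[n + 1]
    return psi

def detect_spikes(signal, scale=8.0):
    """Return sample indices where the NEO output exceeds the threshold.

    The threshold is an assumed rule: scale times the mean NEO output.
    """
    psi = neo(signal)
    threshold = scale * sum(psi) / len(psi)
    return [n for n, p in enumerate(psi) if p > threshold]
```

Because the operator needs only two multiplications and one subtraction per sample, a single shared core can serve many channels, which matches the resource-sharing idea in the abstract.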
An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm.

PubMed

Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En

2015-08-13

A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction.
Efficient fuzzy C-means architecture for image segmentation.

PubMed

Li, Hui-Ya; Hwang, Wen-Jyi; Chang, Chia-Yen

2011-01-01

This paper presents a novel VLSI architecture for image segmentation. The architecture is based on the fuzzy c-means algorithm with spatial constraint for reducing the misclassification rate. In the architecture, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. In addition, an efficient pipelined circuit is used for the updating process to accelerate computation. Experimental results show that the proposed circuit is an effective alternative for real-time image segmentation with low area cost and low misclassification rate.
Recursive computer architecture for VLSI

DOE Office of Scientific and Technical Information (OSTI.GOV)

Treleaven, P.C.; Hopkins, R.P.

1982-01-01

A general-purpose computer architecture based on the concept of recursion and suitable for VLSI computer systems built from replicated (lego-like) computing elements is presented. The recursive computer architecture is defined by presenting a program organisation, a machine organisation and an experimental machine implementation oriented to VLSI. The experimental implementation is being restricted to simple, identical microcomputers each containing a memory, a processor and a communications capability. This future generation of lego-like computer systems is termed fifth generation computers by the Japanese. 30 references.
Parallel VLSI architecture emulation and the organization of APSA/MPP

NASA Technical Reports Server (NTRS)

Odonnell, John T.

1987-01-01

The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
A VLSI implementation for synthetic aperture radar image processing

NASA Technical Reports Server (NTRS)

Premkumar, A.; Purviance, J.

1990-01-01

A simple physical model for the Synthetic Aperture Radar (SAR) is presented. This model explains the one dimensional and two dimensional nature of the received SAR signal in the range and azimuth directions. A time domain correlator, its algorithm, and features are explained. The correlator is ideally suited for VLSI implementation. A real time SAR architecture using these correlators is proposed. In the proposed architecture, the received SAR data is processed using one dimensional correlators for determining the range while two dimensional correlators are used to determine the azimuth of a target. The architecture uses only three different types of custom VLSI chips and a small amount of memory.
The Area-Time Complexity of Sorting.

DTIC Science & Technology

1984-12-01

...suggests a classification of keys into short (k < log n), long (k > 2 log n), and of medium length. Optimal or near-optimal designs of VLSI sorters are... ARCHITECTURES; 5.1 Introduction; 5.2 Parallel Algorithms for Sorting; 5.3 Parallel Architectures; 6 OPTIMAL VLSI SORTERS FOR KEYS OF LENGTH k - logn
vPELS: An E-Learning Social Environment for VLSI Design with Content Security Using DRM

ERIC Educational Resources Information Center

Dewan, Jahangir; Chowdhury, Morshed; Batten, Lynn

2014-01-01

This article provides a proposal for a personal e-learning system (vPELS [where "v" stands for VLSI: very large scale integrated circuit]) architecture in the context of a social network environment for VLSI Design. The main objective of vPELS is to develop individual skills on a specific subject--say, VLSI--and share resources with peers.…

Bioinspired architecture approach for a one-billion transistor smart CMOS camera chip

NASA Astrophysics Data System (ADS)

Fey, Dietmar; Komann, Marcus

2007-05-01

In this paper we present a massively parallel VLSI architecture for future smart CMOS camera chips with up to one billion transistors. To exploit efficiently the potential offered by future micro- or nanoelectronic devices, traditional parallel architectures oriented toward central structures and based on MIMD or SIMD approaches will fail. They require too many and too long global interconnects for the distribution of code or the access to common memory. Nature, on the other hand, developed self-organising and emergent principles to successfully manage complex structures based on many interacting simple elements. We therefore developed a new emergent computing paradigm, denoted Marching Pixels, based on a mixture of bio-inspired computing models such as cellular automata and artificial ants. We present different Marching Pixels algorithms and the corresponding VLSI array architecture. A detailed synthesis result for a 0.18 μm CMOS process shows that a 256×256 pixel image is processed in less than 10 ms assuming a moderate 100 MHz clock rate for the processor array. Future higher integration densities and 3D chip stacking technology will allow the integration and processing of megapixel images within the same time, since our architecture is fully scalable.
The Fifth NASA Symposium on VLSI Design

NASA Technical Reports Server (NTRS)

1993-01-01

The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.
Parallel optimization algorithms and their implementation in VLSI design

NASA Technical Reports Server (NTRS)

Lee, G.; Feeley, J. J.

1991-01-01

Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.
A parallel VLSI architecture for a digital filter using a number theoretic transform

NASA Technical Reports Server (NTRS)

Truong, T. K.; Reed, I. S.; Yeh, C. S.; Shao, H. M.

1983-01-01

The advantages of a very large scale integration (VLSI) architecture for implementing a digital filter using Fermat number transforms (FNT) are the following: It requires no multiplication. Only additions and bit rotations are needed. It alleviates the usual dynamic range limitation for long sequence FNT's. It utilizes the FNT and inverse FNT circuits 100% of the time. The lengths of the input data and filter sequences can be arbitrary and different. It is regular, simple, and expandable, and as a consequence suitable for VLSI implementation.
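A small software sketch can show why a Fermat number transform avoids general multipliers: with modulus 257 = 2^8 + 1 and root 2, every twiddle factor is a power of two, so a hardware transform needs only shifts and additions. The modulus, length 16, and function names below are illustrative assumptions, and this plain-Python version uses ordinary modular arithmetic rather than the bit rotations of the actual architecture.

```python
P = 257        # Fermat number F_3 = 2^(2^3) + 1, which is prime
N = 16         # transform length: 2 has multiplicative order 16 modulo 257
ROOT = 2       # twiddle factors are powers of two

def fnt(x, root=ROOT):
    """Length-N number theoretic transform modulo P."""
    return [sum(x[n] * pow(root, n * k, P) for n in range(N)) % P
            for k in range(N)]

def ifnt(X):
    """Inverse transform; inverses come from Fermat's little theorem."""
    inv_root = pow(ROOT, P - 2, P)
    inv_n = pow(N, P - 2, P)
    return [(inv_n * v) % P for v in fnt(X, inv_root)]

def cyclic_convolve(a, b):
    """Cyclic convolution of two length-N sequences via the transform."""
    A, B = fnt(a), fnt(b)
    return ifnt([(u * v) % P for u, v in zip(A, B)])
```

The result is exact as long as every true convolution value lies in the range 0 to 256; this is the dynamic range limitation that the abstract says the architecture alleviates for long sequences.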
A new VLSI architecture for a single-chip-type Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Truong, T. K.

1989-01-01

A new very large scale integration (VLSI) architecture for implementing Reed-Solomon (RS) decoders that can correct both errors and erasures is described. This new architecture implements a Reed-Solomon decoder by using replication of a single VLSI chip. It is anticipated that this single chip type RS decoder approach will save substantial development and production costs. It is estimated that reduction in cost by a factor of four is possible with this new architecture. Furthermore, this Reed-Solomon decoder is programmable between 8 bit and 10 bit symbol sizes. Therefore, both an 8 bit Consultative Committee for Space Data Systems (CCSDS) RS decoder and a 10 bit decoder are obtained at the same time, and when concatenated with a (15,1/6) Viterbi decoder, provide an additional 2.1-dB coding gain.
Learning and optimization with cascaded VLSI neural network building-block chips

NASA Technical Reports Server (NTRS)

Duong, T.; Eberhardt, S. P.; Tran, M.; Daud, T.; Thakoor, A. P.

1992-01-01

To demonstrate the versatility of the building-block approach, two neural network applications were implemented on cascaded analog VLSI chips. Weights were implemented using 7-b multiplying digital-to-analog converter (MDAC) synapse circuits, with 31 x 32 and 32 x 32 synapses per chip. A novel learning algorithm compatible with analog VLSI was applied to the two-input parity problem. The algorithm combines dynamically evolving architecture with limited gradient-descent backpropagation for efficient and versatile supervised learning. To implement the learning algorithm in hardware, synapse circuits were paralleled for additional quantization levels. The hardware-in-the-loop learning system allocated 2-5 hidden neurons for parity problems. Also, a 7 x 7 assignment problem was mapped onto a cascaded 64-neuron fully connected feedback network. In 100 randomly selected problems, the network found optimal or good solutions in most cases, with settling times in the range of 7-100 microseconds.
Real-time FPGA architectures for computer vision

NASA Astrophysics Data System (ADS)

Arias-Estrada, Miguel; Torres-Huitzil, Cesar

2000-03-01

This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in an FPGA, but it can be implemented on a dedicated VLSI to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
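One common way to minimise image-memory accesses, sketched here in software, is to stream pixels in raster order through a shift register long enough to cover the mask rows, so that each pixel is fetched exactly once. This is an assumed illustration of that general idea, not the authors' module; the function name is hypothetical, the mask is assumed square, and the mask is applied without flipping (correlation form).

```python
from collections import deque

def stream_convolve(pixels, width, mask):
    """Slide a k x k mask over a raster-order pixel stream.

    Yields (row, col, value), where (row, col) is the top-left corner of
    each fully valid window. Each input pixel is read only once.
    """
    k = len(mask)
    # Shift register spanning k - 1 full image rows plus k pixels.
    window = deque(maxlen=(k - 1) * width + k)
    for idx, p in enumerate(pixels):
        window.append(p)
        y, x = divmod(idx, width)
        if len(window) == window.maxlen and x >= k - 1:
            acc = 0
            for j in range(k):
                for i in range(k):
                    acc += mask[j][i] * window[j * width + i]
            yield (y - k + 1, x - k + 1, acc)
```

For example, streaming a 4 x 4 image holding the values 0 to 15 through a 3 x 3 mask of ones produces the four window sums 45, 54, 81, and 90.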
A novel configurable VLSI architecture design of window-based image processing method

NASA Astrophysics Data System (ADS)

Zhao, Hui; Sang, Hongshi; Shen, Xubang

2018-03-01

Most window-based image processing architectures can implement only one specific kind of algorithm, such as 2D convolution, and therefore lack flexibility and breadth of application. In addition, improper handling of the image boundary can cause loss of accuracy or consume more logic resources. To address these problems, this paper proposes a new configurable VLSI architecture for window-based image processing operations that takes the image boundary into account. An efficient technique is explored to manage the image borders by overlapping and flushing phases at the end of each row and the end of each frame, which introduces no additional delay and reduces the overhead in real-time applications. Reuse of on-chip memory data is maximized in order to reduce the hardware complexity and external bandwidth requirements. Different scalar function and reduction function operations are performed in a pipeline, so the architecture can support a variety of window-based image processing applications. Compared with other reported structures, the new structure performs similarly to some and better than others. In particular, compared with the systolic array processor CWP, this structure is approximately 12.9% faster at the same frequency. The proposed parallel VLSI architecture was implemented with SIMC 0.18-μm CMOS technology, and the maximum clock frequency, power consumption, and area are 125 MHz, 57 mW, and 104.8K gates, respectively. Furthermore, the processing time is independent of the particular window-based algorithm mapped to the structure.
Architecture for VLSI design of Reed-Solomon encoders

NASA Technical Reports Server (NTRS)

Liu, K. Y.

1981-01-01

The logic structure of a universal VLSI chip called the symbol-slice Reed-Solomon (RS) encoder chip is discussed. An RS encoder can be constructed by cascading and properly interconnecting a group of such VLSI chips. As a design example, it is shown that a (255,223) RS encoder requiring around 40 discrete CMOS ICs may be replaced by an RS encoder consisting of four identical interconnected VLSI RS encoder chips. Besides the size advantage, the VLSI RS encoder also has the potential advantages of requiring less power and having a higher reliability.
VLSI implementation of RSA encryption system using ancient Indian Vedic mathematics

NASA Astrophysics Data System (ADS)

Thapliyal, Himanshu; Srinivas, M. B.

2005-06-01

This paper proposes the hardware implementation of RSA encryption/decryption algorithm using the algorithms of Ancient Indian Vedic Mathematics that have been modified to improve performance. The recently proposed hierarchical overlay multiplier architecture is used in the RSA circuitry for multiplication operation. The most significant aspect of the paper is the development of a division architecture based on Straight Division algorithm of Ancient Indian Vedic Mathematics and embedding it in RSA encryption/decryption circuitry for improved efficiency. The coding is done in Verilog HDL and the FPGA synthesis is done using Xilinx Spartan library. The results show that RSA circuitry implemented using Vedic division and multiplication is efficient in terms of area/speed compared to its implementation using conventional multiplication and division architectures.
VLSI design of a single chip Reed-Solomon encoder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Truong, T.K.; Deutsch, L.J.; Reed, I.S.

A design for a single chip implementation of a Reed-Solomon encoder is presented. The architecture that leads to this single VLSI chip design makes use of a bit serial finite field multiplication algorithm.
Design and Fabrication of High-Efficiency CMOS/CCD Imagers

NASA Technical Reports Server (NTRS)

Pain, Bedabrata

2007-01-01

An architecture for back-illuminated complementary metal oxide/semiconductor (CMOS) and charge-coupled-device (CCD) ultraviolet/visible/near-infrared-light image sensors, and a method of fabrication to implement the architecture, are undergoing development. The architecture and method are expected to enable realization of the full potential of back-illuminated CMOS/CCD imagers to perform with high efficiency, high sensitivity, excellent angular response, and in-pixel signal processing. The architecture and method are compatible with next-generation CMOS dielectric-forming and metallization techniques, and the process flow of the method is compatible with process flows typical of the manufacture of very-large-scale integrated (VLSI) circuits. The architecture and method overcome all obstacles that have hitherto prevented high-yield, low-cost fabrication of back-illuminated CMOS/CCD imagers by use of standard VLSI fabrication tools and techniques. It is not possible to discuss the obstacles in detail within the space available for this article. Briefly, the obstacles are posed by the problems of generating light-absorbing layers having desired uniform and accurate thicknesses, passivation of surfaces, forming structures for efficient collection of charge carriers, and wafer-scale thinning (in contradistinction to die-scale thinning). A basic element of the present architecture and method - the element that, more than any other, makes it possible to overcome the obstacles - is the use of an alternative starting material: Instead of starting with a conventional bulk-CMOS wafer that consists of a p-doped epitaxial silicon layer grown on a heavily-p-doped silicon substrate, one starts with a special silicon-on-insulator (SOI) wafer that consists of a thermal oxide buried between a lightly p- or n-doped, thick silicon layer and a device silicon layer of appropriate thickness and doping.
The thick silicon layer is used as a handle: that is, as a mechanical support for the device silicon layer during micro-fabrication.
The VLSI design of a single chip Reed-Solomon encoder

NASA Technical Reports Server (NTRS)

Truong, T. K.; Deutsch, L. J.; Reed, I. S.

1982-01-01

A design for a single chip implementation of a Reed-Solomon encoder is presented. The architecture that leads to this single VLSI chip design makes use of a bit serial finite field multiplication algorithm.
Periodic binary sequence generators: VLSI circuits considerations

NASA Technical Reports Server (NTRS)

Perlman, M.

1984-01-01

Feedback shift registers are efficient periodic binary sequence generators. Polynomials of degree r over a Galois field of characteristic 2 (GF(2)) characterize the behavior of shift registers with linear logic feedback. The algorithmic determination of the trinomial of lowest degree, when it exists, that contains a given irreducible polynomial over GF(2) as a factor is presented. This corresponds to embedding the behavior of an r-stage shift register with linear logic feedback into that of an n-stage shift register with a single two-input modulo 2 summer (i.e., Exclusive-OR gate) in its feedback. This leads to Very Large Scale Integrated (VLSI) circuit architecture of maximal regularity (i.e., identical cells) with intercell communications serialized to a maximal degree.
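The single-XOR feedback structure described above can be sketched in a few lines. The example below models an n-stage shift register whose feedback is one two-input XOR, corresponding to the trinomial x^n + x^tap + 1; it does not implement the paper's procedure for finding the lowest-degree trinomial multiple, and the function name and seed convention are assumptions.

```python
def lfsr_sequence(n_stages, tap, seed=1):
    """Generate one full period of an LFSR with a single XOR in its feedback.

    The recurrence is s[t + n] = s[t] XOR s[t + tap], i.e. the trinomial
    x^n + x^tap + 1. The seed must be nonzero for a useful sequence.
    """
    state = [(seed >> i) & 1 for i in range(n_stages)]  # state[0] is the output stage
    start = list(state)
    out = []
    while True:
        out.append(state[0])
        feedback = state[0] ^ state[tap]   # the single two-input XOR gate
        state = state[1:] + [feedback]     # shift by one stage
        if state == start:
            return out
```

With n_stages=4 and tap=1 (the primitive trinomial x^4 + x + 1) the register cycles through all 15 nonzero states, giving a maximal-length sequence that contains eight ones.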
Compact Interconnection Networks Based on Quantum Dots

NASA Technical Reports Server (NTRS)

Fijany, Amir; Toomarian, Nikzad; Modarress, Katayoon; Spotnitz, Matthew

2003-01-01

Architectures that would exploit the distinct characteristics of quantum-dot cellular automata (QCA) have been proposed for digital communication networks that connect advanced digital computing circuits. In comparison with networks of wires in conventional very-large-scale integrated (VLSI) circuitry, the networks according to the proposed architectures would be more compact. The proposed architectures would make it possible to implement complex interconnection schemes that are required for some advanced parallel-computing algorithms and that are difficult (and in many cases impractical) to implement in VLSI circuitry. The difficulty of implementation in VLSI and the major potential advantage afforded by QCA were described previously in Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), NASA Tech Briefs, Vol. 25, No. 10 (October 2001), page 42. To recapitulate: Wherever two wires in a conventional VLSI circuit cross each other and are required not to be in electrical contact with each other, there must be a layer of electrical insulation between them. This, in turn, makes it necessary to resort to a noncoplanar and possibly a multilayer design, which can be complex, expensive, and even impractical. As a result, much of the cost of designing VLSI circuits is associated with minimization of data routing and assignment of layers to minimize crossing of wires. Heretofore, these considerations have impeded the development of VLSI circuitry to implement complex, advanced interconnection schemes. On the other hand, with suitable design and under suitable operating conditions, QCA-based signal paths can be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes. 
The proposed architectures require two advances in QCA-based circuitry beyond basic QCA-based binary-signal wires described in the cited prior article. One of these advances would be the development of QCA-based wires capable of bidirectional transmission of signals. The other advance would be the development of QCA circuits capable of high-impedance state outputs. The high-impedance states would be utilized along with the 0- and 1-state outputs of QCA.
Novel Highly Parallel and Systolic Architectures Using Quantum Dot-Based Hardware

NASA Technical Reports Server (NTRS)

Fijany, Amir; Toomarian, Benny N.; Spotnitz, Matthew

1997-01-01

VLSI technology has made possible the integration of a massive number of components (processors, memory, etc.) into a single chip. In VLSI design, memory and processing power are relatively cheap and the main emphasis of the design is on reducing the overall interconnection complexity since data routing costs dominate the power, time, and area required to implement a computation. Communication is costly because wires occupy the most space on a circuit and can also degrade clock time. In fact, much of the complexity (and hence the cost) of VLSI design results from minimization of data routing. The main difficulty in VLSI routing is due to the fact that crossing of the lines carrying data, instruction, control, etc. is not possible in a plane. Thus, in order to meet this constraint, the VLSI design aims at keeping the architecture highly regular with local and short interconnection. As a result, while the high level of integration has opened the way for massively parallel computation, practical and full exploitation of such a capability in many applications of interest has been hindered by the constraints on interconnection pattern. More precisely, the use of only localized communication significantly simplifies the design of interconnection architecture but at the expense of a somewhat restricted class of applications. For example, there are currently commercially available products integrating hundreds of simple processor elements within a single chip. However, the lack of an adequate interconnection pattern among these processing elements makes them inefficient for exploiting a large degree of parallelism in many applications.
A Coherent VLSI Design Environment

DTIC Science & Technology

1987-03-31

experimentally on realistic problems. In the area of parallel algorithms and architectures, Prof. Leighton and Bruce Maggs are developing efficient...performance penalty. The flexibility is particularly important in an experimental machine. For example, we can redefine system messages such as 'SEND' or...Theorem -- What It Says, Why It's True, and Some of the Things It Predicts," Department of Computer Science, California Institute of Technology
A VLSI architecture for simplified arithmetic Fourier transform algorithm

NASA Technical Reports Server (NTRS)

Reed, Irving S.; Shih, Ming-Tang; Truong, T. K.; Hendon, E.; Tufts, D. W.

1992-01-01

The arithmetic Fourier transform (AFT) is a number-theoretic approach to Fourier analysis which has been shown to perform competitively with the classical FFT in terms of accuracy, complexity, and speed. Theorems developed in a previous paper for the AFT algorithm are used here to derive the original AFT algorithm which Bruns found in 1903. This is shown to yield an algorithm of less complexity and of improved performance over certain recent AFT algorithms. A VLSI architecture is suggested for this simplified AFT algorithm. This architecture uses a butterfly structure which reduces the number of additions by 25 percent of that used in the direct method.
VLSI architectures for computing multiplications and inverses in GF(2^m)

NASA Technical Reports Server (NTRS)

Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.

1985-01-01

Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
VLSI architectures for computing multiplications and inverses in GF(2^m)

NASA Technical Reports Server (NTRS)

Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.; Reed, I. S.

1983-01-01

Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.

VLSI architectures for computing multiplications and inverses in GF(2^m).

PubMed

Wang, C C; Truong, T K; Shao, H M; Deutsch, L J; Omura, J K; Reed, I S

1985-08-01

Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal basis representation used together with this multiplier, a pipeline architecture is developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable, and therefore, naturally suitable for VLSI implementation.
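As a rough illustration of the inversion idea in these records, the sketch below computes an inverse in GF(2^m) as a product of repeated squarings, using a^(-1) = a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1)). It is not the Massey-Omura design: it assumes a polynomial basis, the small field GF(2^4), and the reduction polynomial x^4 + x + 1, all chosen only to keep the example short. In a normal basis each squaring would reduce to a cyclic shift of the coordinate vector. The names gf_mul and gf_inv are hypothetical.

```python
# Assumed field for illustration: GF(2^4) with reduction polynomial x^4 + x + 1.
M = 4
POLY = 0b10011


def gf_mul(a, b, m=M, poly=POLY):
    """Multiply two field elements in a polynomial basis (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << m):         # degree reached m: reduce modulo poly
            a ^= poly
    return result


def gf_inv(a, m=M, poly=POLY):
    """Invert a nonzero element as a^2 * a^4 * ... * a^(2^(m-1))."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result = 1
    power = a
    for _ in range(m - 1):
        power = gf_mul(power, power, m, poly)    # next squaring: a^(2^i)
        result = gf_mul(result, power, m, poly)  # accumulate the product
    return result
```

The loop needs only m - 1 squarings and m - 1 multiplications, which is why a cheap squaring operation makes this form of inversion attractive for pipelining.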
A fast new algorithm for a robot neurocontroller using inverse QR decomposition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morris, A.S.; Khemaissia, S.

2000-01-01

A new adaptive neural network controller for robots is presented. The controller is based on direct adaptive techniques. Unlike many neural network controllers in the literature, inverse dynamical model evaluation is not required. A numerically robust, computationally efficient processing scheme for neural network weight estimation is described, namely, the inverse QR decomposition (INVQR). The inverse QR decomposition and a weighted recursive least-squares (WRLS) method for neural network weight estimation are derived using Cholesky factorization of the data matrix. The algorithm that performs the efficient INVQR of the underlying space-time data matrix may be implemented in parallel on a triangular array. Furthermore, its systolic architecture is well suited for VLSI implementation. Another important benefit of the INVQR decomposition is that it solves directly for the time-recursive least-squares filter vector, while avoiding the sequential back-substitution step required by the QR decomposition approaches.
Architecture for VLSI design of Reed-Solomon encoders

NASA Technical Reports Server (NTRS)

Liu, K. Y.

1982-01-01

A description is given of the logic structure of the universal VLSI symbol-slice Reed-Solomon (RS) encoder chip, from a group of which an RS encoder may be constructed through cascading and proper interconnection. As a design example, it is shown that an RS encoder presently requiring approximately 40 discrete CMOS ICs may be replaced by an RS encoder consisting of four identical, interconnected VLSI RS encoder chips, offering in addition to greater compactness both a lower power requirement and greater reliability.
Architecture for VLSI design of Reed-Solomon encoders

NASA Astrophysics Data System (ADS)

Liu, K. Y.

1982-02-01

A description is given of the logic structure of the universal VLSI symbol-slice Reed-Solomon (RS) encoder chip, from a group of which an RS encoder may be constructed through cascading and proper interconnection. As a design example, it is shown that an RS encoder presently requiring approximately 40 discrete CMOS ICs may be replaced by an RS encoder consisting of four identical, interconnected VLSI RS encoder chips, offering in addition to greater compactness both a lower power requirement and greater reliability.
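To make the encoding operation concrete, here is a minimal sketch of systematic Reed-Solomon encoding, in which the parity symbols are the remainder of the shifted message polynomial divided by the generator polynomial. It does not model the symbol-slice chip partitioning described in the record. The field GF(2^4) with polynomial x^4 + x + 1, the RS(15, 11) parameters, the generator roots alpha^1..alpha^4, and all function names are assumptions made for illustration.

```python
# Assumed field: GF(2^4), primitive polynomial x^4 + x + 1, alpha = x (value 2).
M, POLY = 4, 0b10011
ORDER = (1 << M) - 1

# Antilog (EXP) and log tables; EXP is doubled so index sums need no modulo.
EXP = [0] * (2 * ORDER)
LOG = [0] * (1 << M)
value = 1
for i in range(ORDER):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M):
        value ^= POLY
for i in range(ORDER, 2 * ORDER):
    EXP[i] = EXP[i - ORDER]


def mul(a, b):
    """Field multiplication via the log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(nparity):
    """g(x) = (x + alpha^1)...(x + alpha^nparity), highest degree first."""
    g = [1]
    for i in range(1, nparity + 1):
        root = EXP[i]
        shifted = g + [0]                    # g(x) * x
        for j, coef in enumerate(g):
            shifted[j + 1] ^= mul(coef, root)  # plus g(x) * root
        g = shifted
    return g


def rs_encode(message, nparity):
    """Append nparity parity symbols using LFSR-style polynomial division."""
    g = generator_poly(nparity)
    remainder = [0] * nparity
    for symbol in message:
        feedback = symbol ^ remainder[0]
        remainder = remainder[1:] + [0]
        for j in range(nparity):
            remainder[j] ^= mul(feedback, g[j + 1])
    return list(message) + remainder


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) with Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = mul(acc, x) ^ c
    return acc
```

A valid codeword evaluates to zero at every root of the generator polynomial, which gives a simple self-check for the encoder.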
ProperCAD: A portable object-oriented parallel environment for VLSI CAD

NASA Technical Reports Server (NTRS)

Ramkumar, Balkrishna; Banerjee, Prithviraj

1993-01-01

Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A Portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms using a general purpose platform for portable parallel programming called CARM are being developed and a C++ environment that is truly object-oriented and specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, a NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data for other applications that were developed are provided: namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.
Content-addressable read/write memories for image analysis

NASA Technical Reports Server (NTRS)

Snyder, W. E.; Savage, C. D.

1982-01-01

The commonly encountered image analysis problems of region labeling and clustering are found to be cases of a search-and-rename problem which can be solved in parallel by a system architecture that is inherently suitable for VLSI implementation. This architecture is a novel form of content-addressable memory (CAM) which provides parallel search and update functions, reducing the time to a constant per operation. It has been proposed in related investigations by Hall (1981) that, with VLSI, CAM-based structures with enhanced instruction sets for general purpose processing will be feasible.
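The search-and-rename view of region labeling can be sketched in software as follows. This is only an assumed illustration of the idea, not the architecture from the record: the function search_and_rename loops over every cell, whereas a content-addressable memory would match and update all cells holding the old label in one parallel step. The 4-connectivity rule and all names are assumptions.

```python
def label_regions(image):
    """Label 4-connected regions of nonzero pixels with a single raster scan."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1

    def search_and_rename(old, new):
        # Software stand-in for the CAM operation: find every cell holding
        # `old` and overwrite it with `new`.
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == old:
                    labels[r][c] = new

    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = up
                if left != up:
                    search_and_rename(left, up)   # two regions just merged
            elif up or left:
                labels[r][c] = up or left
            else:
                labels[r][c] = next_label         # start a new region
                next_label += 1
    return labels
```

For example, a U-shaped region starts out with two provisional labels for its two arms, and the rename step merges them when the scan reaches the pixel that joins the arms.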
Real time SAR processing

NASA Technical Reports Server (NTRS)

Premkumar, A. B.; Purviance, J. E.

1990-01-01

A simplified model for the SAR imaging problem is presented. The model is based on the geometry of the SAR system. Using this model an expression for the entire phase history of the received SAR signal is formulated. From the phase history, it is shown that the range and the azimuth coordinates for a point target image can be obtained by processing the phase information during the intrapulse and interpulse periods respectively. An architecture for a VLSI implementation for the SAR signal processor is presented which generates images in real time. The architecture uses a small number of chips, a new correlation processor, and an efficient azimuth correlation process.
Summary of workshop on the application of VLSI for robotic sensing

NASA Technical Reports Server (NTRS)

Brooks, T.; Wilcox, B.

1984-01-01

It was one of the objectives of the considered workshop to identify near, mid, and far-term applications of VLSI for robotic sensing and sensor data preprocessing. The workshop was also to indicate areas in which VLSI technology can provide immediate and future payoffs. A third objective is related to the promotion of dialog and collaborative efforts between research communities, industry, and government. The workshop was held on March 24-25, 1983. Conclusions and recommendations are discussed. Attention is given to the need for a pixel correction chip, an image sensor with 10,000 dynamic range, VLSI enhanced architectures, the need for a high-density serpentine memory, an LSI-tactile sensing program, an analog-signal preprocessor chip, a smart strain gage, a protective proximity envelope, a VLSI-proximity sensor program, a robot-net chip, and aspects of silicon micromechanics.
VLSI implementation of flexible architecture for decision tree classification in data mining

NASA Astrophysics Data System (ADS)

Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

2017-07-01

Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for Decision Tree Classification in data mining using the C4.5 algorithm.
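C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below shows that criterion for categorical attributes only; it says nothing about the hardware architecture in the record, and the data layout (a list of dicts plus a list of class labels) and function names are assumptions.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting `rows` on the categorical attribute `attr`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Entropy of the partition itself penalizes many-valued attributes.
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))
```

A tree builder would call best_attribute at each node, partition the rows by the chosen attribute, and recurse on each partition.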
Hybrid VLSI/QCA Architecture for Computing FFTs

NASA Technical Reports Server (NTRS)

Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon; Spotnitz, Matthew

2003-01-01

A data-processor architecture that would incorporate elements of both conventional very-large-scale integrated (VLSI) circuitry and quantum-dot cellular automata (QCA) has been proposed to enable the highly parallel and systolic computation of fast Fourier transforms (FFTs). The proposed circuit would complement the QCA-based circuits described in several prior NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; and Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35. The cited prior articles described the limitations of very-large-scale integrated (VLSI) circuitry and the major potential advantage afforded by QCA. To recapitulate: In a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCA-based signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes.
Highly efficient simulation environment for HDTV video decoder in VLSI design

NASA Astrophysics Data System (ADS)

Mao, Xun; Wang, Wei; Gong, Huimin; He, Yan L.; Lou, Jian; Yu, Lu; Yao, Qingdong; Pirsch, Peter

2002-01-01

With the increasing complexity of VLSI designs, especially the SoC (System on Chip) of an MPEG-2 video decoder with HDTV scalability, simulation and verification of the full design, even at a level as high as the behavioral level in HDL, often prove to be very slow and costly, and it is difficult to perform full verification until late in the design process. They therefore become the bottleneck of the HDTV video decoder design procedure and strongly affect its time-to-market. In this paper, the architecture of the Hardware/Software Interface of the HDTV video decoder is studied, and a Hardware-Software Mixed Simulation (HSMS) platform is proposed to check and correct errors in the early design stage, based on the MPEG-2 video decoding algorithm. The application of HSMS to the target system can be achieved by employing several introduced approaches. Those approaches speed up the simulation and verification task without decreasing performance.
(abstract) A High Throughput 3-D Inner Product Processor

NASA Technical Reports Server (NTRS)

Daud, Tuan

1996-01-01

A particularly challenging image processing application is real-time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine-grain parallelism and, when implemented in hardware, offer orders of magnitude speed-up. However, neural networks implemented on a VLSI chip are planar architectures capable of efficient processing of linear vector signals rather than 2-D images. Therefore, for processing of images, a 3-D stack of neural-net ICs receiving planar inputs and consuming minimal power is required. Details of the circuits and chip architectures will be described, along with the need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing will be illustrated.
The VLSI design of an error-trellis syndrome decoder for certain convolutional codes

NASA Technical Reports Server (NTRS)

Reed, I. S.; Jensen, J. M.; Hsu, I.-S.; Truong, T. K.

1986-01-01

A recursive algorithm using the error-trellis decoding technique is developed to decode convolutional codes (CCs). An example, illustrating the very large scale integration (VLSI) architecture of such a decoder, is given for a dual-K CC. It is demonstrated that such a decoder can be realized readily on a single chip with metal-nitride-oxide-semiconductor technology.
Systolic VLSI Reed-Solomon Decoder

NASA Technical Reports Server (NTRS)

Shao, H. M.; Truong, T. K.; Deutsch, L. J.; Yuen, J. H.

1986-01-01

Decoder for digital communications provides high-speed, pipelined Reed-Solomon (RS) error-correction decoding of data streams. Principal new feature of proposed decoder is modification of Euclid greatest-common-divisor algorithm to avoid need for time-consuming computations of inverse of certain Galois-field quantities. Decoder architecture suitable for implementation on very-large-scale integrated (VLSI) chips with negative-channel metal-oxide/silicon circuitry.
The VLSI design of error-trellis syndrome decoding for convolutional codes

NASA Technical Reports Server (NTRS)

Reed, I. S.; Jensen, J. M.; Truong, T. K.; Hsu, I. S.

1985-01-01

A recursive algorithm using the error-trellis decoding technique is developed to decode convolutional codes (CCs). An example, illustrating the very large scale integration (VLSI) architecture of such a decoder, is given for a dual-K CC. It is demonstrated that such a decoder can be realized readily on a single chip with metal-nitride-oxide-semiconductor technology.
A VLSI architecture for performing finite field arithmetic with reduced table look-up

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Truong, T. K.; Reed, I. S.

1986-01-01

A new table look-up method for finding the log and antilog of finite field elements has been developed by N. Glover. In his method, the log and antilog of a field element are found by the use of several smaller tables. The method is based on the use of the Chinese Remainder Theorem. The technique often results in a significant reduction in the memory requirements of the problem. A VLSI architecture is developed for a special case of this new algorithm to perform finite field arithmetic including multiplication, division, and the finding of an inverse element in the finite field.
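For context, the sketch below shows the conventional single-table log/antilog approach to finite field multiplication, division, and inversion, which is the baseline whose table size a multi-table scheme would reduce. It is not Glover's method and does not use the Chinese Remainder Theorem. The field GF(2^4), the primitive polynomial x^4 + x + 1, and the function names are assumptions made for illustration.

```python
# Assumed field: GF(2^4) with primitive polynomial x^4 + x + 1, generator alpha = 2.
M, POLY = 4, 0b10011
ORDER = (1 << M) - 1          # size of the multiplicative group (15)

ANTILOG = [0] * ORDER         # ANTILOG[i] = alpha^i
LOG = {}                      # LOG[alpha^i] = i
element = 1
for exponent in range(ORDER):
    ANTILOG[exponent] = element
    LOG[element] = exponent
    element <<= 1             # multiply by alpha
    if element & (1 << M):
        element ^= POLY       # reduce modulo the field polynomial


def gf_mul(a, b):
    """Multiply by adding logs modulo the group order."""
    if a == 0 or b == 0:
        return 0
    return ANTILOG[(LOG[a] + LOG[b]) % ORDER]


def gf_inv(a):
    """Invert by negating the log modulo the group order."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    return ANTILOG[(-LOG[a]) % ORDER]


def gf_div(a, b):
    """Divide by subtracting logs (multiply by the inverse)."""
    return gf_mul(a, gf_inv(b))
```

Each full table here has one entry per nonzero field element, so for a field such as GF(2^8) the tables hold 255 entries each; that memory cost is what motivates splitting the look-up into smaller tables.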
A Compact VLSI System for Bio-Inspired Visual Motion Estimation.

PubMed

Shi, Cong; Luo, Gang

2018-04-01

This paper proposes a bio-inspired visual motion estimation algorithm based on motion energy, along with its compact very-large-scale integration (VLSI) architecture using low-cost embedded systems. The algorithm mimics motion perception functions of retina, V1, and MT neurons in a primate visual system. It involves operations of ternary edge extraction, spatiotemporal filtering, motion energy extraction, and velocity integration. Moreover, we propose the concept of confidence map to indicate the reliability of estimation results on each probing location. Our algorithm involves only additions and multiplications during runtime, which is suitable for low-cost hardware implementation. The proposed VLSI architecture employs multiple (frame, pixel, and operation) levels of pipeline and massively parallel processing arrays to boost the system performance. The array unit circuits are optimized to minimize hardware resource consumption. We have prototyped the proposed architecture on a low-cost field-programmable gate array platform (Zynq 7020) running at 53-MHz clock frequency. It achieved 30-frame/s real-time performance for velocity estimation on 160 × 120 probing locations. A comprehensive evaluation experiment showed that the estimated velocity by our prototype has relatively small errors (average endpoint error < 0.5 pixel and angular error < 10°) for most motion cases.
NASA Space Engineering Research Center for VLSI systems design

NASA Technical Reports Server (NTRS)

1991-01-01

This annual review reports the center's activities and findings on very large scale integration (VLSI) systems design for 1990, including project status, financial support, publications, the NASA Space Engineering Research Center (SERC) Symposium on VLSI Design, research results, and outreach programs. Processor chips completed or under development are listed. Research results summarized include a design technique to harden complementary metal oxide semiconductors (CMOS) memory circuits against single event upset (SEU); improved circuit design procedures; and advances in computer aided design (CAD), communications, computer architectures, and reliability design. Also described is a high school teacher program that exposes teachers to the fundamentals of digital logic design.
Opto-VLSI-based photonic true-time delay architecture for broadband adaptive nulling in phased array antennas.

PubMed

Juswardy, Budi; Xiao, Feng; Alameh, Kamal

2009-03-16

This paper proposes a novel Opto-VLSI-based tunable true-time delay generation unit for adaptively steering the nulls of microwave phased array antennas. Arbitrary single or multiple true-time delays can simultaneously be synthesized for each antenna element by slicing an RF-modulated broadband optical source and routing specific sliced wavebands through an Opto-VLSI processor to a high-dispersion fiber. Experimental results are presented, which demonstrate the principle of the true-time delay unit through the generation of 5 arbitrary true-time delays of up to 2.5 ns each. (c) 2009 Optical Society of America
Associative Pattern Recognition In Analog VLSI Circuits

NASA Technical Reports Server (NTRS)

Tawel, Raoul

1995-01-01

Winner-take-all circuit selects best-match stored pattern. Prototype cascadable very-large-scale integrated (VLSI) circuit chips built and tested to demonstrate concept of electronic associative pattern recognition. Based on low-power, sub-threshold analog complementary metal oxide/semiconductor (CMOS) VLSI circuitry, each chip can store 128 sets (vectors) of 16 analog values (vector components), vectors representing known patterns as diverse as spectra, histograms, graphs, or brightnesses of pixels in images. Chips exploit parallel nature of vector quantization architecture to implement highly parallel processing in relatively simple computational cells. Through collective action, cells classify input pattern in fraction of microsecond while consuming power of few microwatts.
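The classification step described here amounts to vector quantization followed by a winner-take-all selection. A minimal software sketch is shown below; the squared Euclidean distance is an assumption (the record does not state which match measure the analog cells compute), and the function name is hypothetical.

```python
def best_match(stored_vectors, query):
    """Return the index of the stored vector closest to `query`.

    Each distance could be computed by an independent cell in parallel;
    the final min() plays the role of the winner-take-all stage.
    """
    def distance(vector):
        # Assumed match measure: squared Euclidean distance.
        return sum((s - q) ** 2 for s, q in zip(vector, query))

    return min(range(len(stored_vectors)),
               key=lambda i: distance(stored_vectors[i]))
```

With 128 stored vectors of 16 components each, as in the chip described, the software loop evaluates 128 distances in sequence, whereas the analog cells would evaluate them simultaneously.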

Discrete-Time Demodulator Architectures for Free-Space Broadband Optical Pulse-Position Modulation

NASA Technical Reports Server (NTRS)

Gray, A. A.; Lee, C.

2004-01-01

The objective of this work is to develop discrete-time demodulator architectures for broadband optical pulse-position modulation (PPM) that are capable of processing Nyquist or near-Nyquist data rates. These architectures are motivated by the numerous advantages of realizing communications demodulators in digital very large scale integrated (VLSI) circuits. The architectures are developed within a framework that encompasses a large body of work in optical communications, synchronization, and multirate discrete-time signal processing and are constrained by the limitations of the state of the art in digital hardware. This work attempts to create a bridge between theoretical communication algorithms and analysis for deep-space optical PPM and modern digital VLSI. The primary focus of this work is on the synthesis of discrete-time processing architectures for accomplishing the most fundamental functions required in PPM demodulators: post-detection filtering, synchronization, and decision processing. The architectures derived are capable of closely approximating the theoretical performance of the continuous-time algorithms from which they are derived. The work concludes with an outline of the development path that leads to hardware.
Vertically integrated photonic multichip module architecture for vision applications

NASA Astrophysics Data System (ADS)

Tanguay, Armand R., Jr.; Jenkins, B. Keith; von der Malsburg, Christoph; Mel, Bartlett; Holt, Gary; O'Brien, John D.; Biederman, Irving; Madhukar, Anupam; Nasiatka, Patrick; Huang, Yunsong

2000-05-01

The development of a truly smart camera, with inherent capability for low latency semi-autonomous object recognition, tracking, and optimal image capture, has remained an elusive goal notwithstanding tremendous advances in the processing power afforded by VLSI technologies. These features are essential for a number of emerging multimedia-based applications, including enhanced augmented reality systems. Recent advances in understanding of the mechanisms of biological vision systems, together with similar advances in hybrid electronic/photonic packaging technology, offer the possibility of artificial biologically-inspired vision systems with significantly different, yet complementary, strengths and weaknesses. We describe herein several system implementation architectures based on spatial and temporal integration techniques within a multilayered structure, as well as the corresponding hardware implementation of these architectures based on the hybrid vertical integration of multiple silicon VLSI vision chips by means of dense 3D photonic interconnections.
Motion-sensor fusion-based gesture recognition and its VLSI architecture design for mobile devices

NASA Astrophysics Data System (ADS)

Zhu, Wenping; Liu, Leibo; Yin, Shouyi; Hu, Siqi; Tang, Eugene Y.; Wei, Shaojun

2014-05-01

With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human-computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations to date are user-dependent and rely only on an accelerometer. In order to achieve competitive accuracy, users are required to hold the devices in a predefined manner during operation. In this paper, a high-accuracy human gesture recognition system is proposed based on multiple motion sensor fusion. Furthermore, to reduce the energy overhead resulting from frequent sensor sampling and data processing, a highly energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with the pure software implementation, approximately 45 times speed-up is achieved while operating at 20 MHz. The experiments show that the average accuracy for 10 gestures reaches 93.98% for the user-independent case and 96.14% for the user-dependent case when subjects hold the device randomly while completing the specified gestures. Although a few percent lower than the conventional best result, it still provides competitive accuracy acceptable for practical usage. Most importantly, the proposed system allows users to hold the device randomly while performing the predefined gestures, which substantially enhances the user experience.
High-performance multiprocessor architecture for a 3-D lattice gas model

NASA Technical Reports Server (NTRS)

Lee, F.; Flynn, M.; Morf, M.

1991-01-01

The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
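The two alternating phases named in this record, collision and propagation, can be illustrated with a much simpler model than the one ALGE targets. The sketch below uses the 2-D, four-direction HPP lattice gas as a stand-in, not Henon's 24-bit FCHC model, and does not reflect the symmetry-group decomposition or virtual-move techniques. The bit assignments, periodic boundaries, and function names are assumptions.

```python
# One bit per direction of travel in each cell (assumed encoding).
E, N, W, S = 1, 2, 4, 8


def collide(grid):
    """HPP rule: an exact head-on pair rotates by 90 degrees; all else passes."""
    swap = {E | W: N | S, N | S: E | W}
    return [[swap.get(cell, cell) for cell in row] for row in grid]


def propagate(grid):
    """Move every particle one cell along its direction (periodic edges)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cell = grid[r][c]
            if cell & E:
                new[r][(c + 1) % cols] |= E
            if cell & W:
                new[r][(c - 1) % cols] |= W
            if cell & N:
                new[(r - 1) % rows][c] |= N   # north means decreasing row
            if cell & S:
                new[(r + 1) % rows][c] |= S
    return new


def step(grid):
    """One time step: collision followed by propagation."""
    return propagate(collide(grid))


def count_particles(grid):
    """Total number of particles, which both phases conserve."""
    return sum(bin(cell).count("1") for row in grid for cell in row)
```

Both phases touch each cell independently of its neighbors' updates, which is the property that makes lattice gas models attractive for massively parallel hardware.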
Recovery Act - CAREER: Sustainable Silicon -- Energy-Efficient VLSI Interconnect for Extreme-Scale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chiang, Patrick

2014-01-31

The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thousands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
The design plan of a VLSI single chip (255, 223) Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Shao, H. M.; Deutsch, L. J.

1987-01-01

The very large-scale integration (VLSI) architecture of a single chip (255, 223) Reed-Solomon decoder for decoding both errors and erasures is described. A decoding failure detection capability is also included in this system so that the decoder will recognize a failure to decode instead of introducing additional errors. This could happen whenever the received word contains too many errors and erasures for the code to correct. The number of transistors needed to implement this decoder is estimated at about 75,000 if the delay for received message is not included. This is in contrast to the older transform decoding algorithm which needs about 100,000 transistors. However, the transform decoder is simpler in architecture than the time decoder. It is therefore possible to implement a single chip (255, 223) Reed-Solomon decoder with today's VLSI technology. An implementation strategy for the decoder system is presented. This represents the first step in a plan to take advantage of advanced coding techniques to realize a 2.0 dB coding gain for future space missions.
On VLSI Design of Rank-Order Filtering using DCRAM Architecture

PubMed Central

Lin, Meng-Chun; Dung, Lan-Rong

2009-01-01

This paper addresses the VLSI design of rank-order filtering (ROF) with a maskable memory for real-time speech and image processing applications. Based on a generic bit-sliced ROF algorithm, the proposed design uses a specially defined memory, called the dual-cell random-access memory (DCRAM), to realize the major operations of ROF: threshold decomposition and polarization. Using the memory-oriented architecture, the proposed ROF processor can benefit from high flexibility, low cost and high speed. The DCRAM can perform the bit-sliced read, partial write, and pipelined processing. The bit-sliced read and partial write are driven by maskable registers. With recursive execution of the bit-sliced read and partial write, the DCRAM can effectively realize ROF in terms of cost and speed. The proposed design has been implemented using TSMC 0.18 μm 1P6M technology. As shown in the result of physical implementation, the core size is 356.1 × 427.7 μm² and the VLSI implementation of ROF can operate at 256 MHz for a 1.8 V supply. PMID:19865599
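The two operations named in the abstract, threshold decomposition and polarization, can be illustrated in software. The following is a minimal sketch of a generic bit-sliced rank-order search, not the DCRAM hardware itself; the function name `rank_order_filter`, the 8-bit default word width, and the convention that rank 1 is the minimum are assumptions made for illustration.

```python
def rank_order_filter(samples, rank, bits=8):
    """Return the rank-th smallest value (rank=1 is the minimum).

    Software sketch of a generic bit-sliced ROF search. The names and the
    8-bit word width are illustrative assumptions, not taken from the paper.
    """
    if not 1 <= rank <= len(samples):
        raise ValueError("rank must be between 1 and len(samples)")
    words = list(samples)              # working copy, stands in for the memory
    result = 0
    for b in range(bits - 1, -1, -1):  # scan bit slices from MSB to LSB
        mask = 1 << b
        low = mask - 1                 # all bit positions below the current slice
        # Threshold decomposition: count the words whose current bit is 0.
        zeros = sum(1 for w in words if not (w & mask))
        if zeros >= rank:
            # The result bit is 0. Words with a 1 here are larger than the
            # result, so polarize their remaining bits to all ones.
            words = [(w | low) if (w & mask) else w for w in words]
        else:
            # The result bit is 1. Words with a 0 here are smaller than the
            # result, so polarize their remaining bits to all zeros.
            result |= mask
            words = [w if (w & mask) else (w & ~low) for w in words]
    return result


# Median of a 5-sample window (rank 3 of 5).
print(rank_order_filter([12, 200, 7, 99, 45], rank=3))  # prints 45
```

Because polarized words keep voting in every later slice, the target rank never has to be adjusted, which is what makes the per-slice step uniform and well suited to a read-then-partial-write memory cycle.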
High-speed architecture for the decoding of trellis-coded modulation

NASA Technical Reports Server (NTRS)

Osborne, William P.

1992-01-01

Since 1971, when the Viterbi Algorithm was introduced as the optimal method of decoding convolutional codes, improvements in circuit technology, especially VLSI, have steadily increased its speed and practicality. Trellis-Coded Modulation (TCM) combines convolutional coding with higher level modulation (non-binary source alphabet) to provide forward error correction and spectral efficiency. For binary codes, the current state-of-the-art is a 64-state Viterbi decoder on a single CMOS chip, operating at a data rate of 25 Mbps. Recently, there has been an interest in increasing the speed of the Viterbi Algorithm by improving the decoder architecture, or by reducing the algorithm itself. Designs employing new architectural techniques are now in existence; however, these techniques are currently applied to simpler binary codes, not to TCM. The purpose of this report is to discuss TCM architectural considerations in general, and to present the design, at the logic gate level, of a specific TCM decoder which applies these considerations to achieve high-speed decoding.
A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.

PubMed

Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang

2016-12-07

The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.
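The feature extraction step described above (split the spike at its peak, then use the area of each portion as a feature) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the peak is taken as the largest sample, the peak sample is counted in the first portion, and area is the sum of absolute sample values. The abstract does not specify these details, and the name `peak_area_features` is hypothetical.

```python
def peak_area_features(spike):
    """Split a spike at its peak and return (peak_index, area_before, area_after).

    Assumptions for illustration only: the peak is the largest sample, the
    peak sample belongs to the first portion, and area is the sum of |x|.
    """
    if not spike:
        raise ValueError("spike must contain at least one sample")
    peak_index = max(range(len(spike)), key=lambda i: spike[i])
    area_before = sum(abs(v) for v in spike[:peak_index + 1])
    area_after = sum(abs(v) for v in spike[peak_index + 1:])
    return peak_index, area_before, area_after


# One short spike waveform (arbitrary units).
print(peak_area_features([0, 1, 3, 7, 4, -2, -1, 0]))  # prints (3, 11, 7)
```

Both areas are running sums, so they can be accumulated in the same pass that tracks the peak, which is consistent with the abstract's claim that peak search and area computation proceed concurrently.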
A comparison of VLSI architectures for time and transform domain decoding of Reed-Solomon codes

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Truong, T. K.; Deutsch, L. J.; Satorius, E. H.; Reed, I. S.

1988-01-01

It is well known that the Euclidean algorithm or its equivalent, continued fractions, can be used to find the error locator polynomial needed to decode a Reed-Solomon (RS) code. It is shown that this algorithm can be used for both time and transform domain decoding by replacing its initial conditions with the Forney syndromes and the erasure locator polynomial. By this means both the errata locator polynomial and the errata evaluator polynomial can be obtained with the Euclidean algorithm. With these ideas, both time and transform domain Reed-Solomon decoders for correcting errors and erasures are simplified and compared. As a consequence, the architectures of Reed-Solomon decoders for correcting both errors and erasures can be made more modular, regular, simple, and naturally suitable for VLSI implementation.
Architectures for single-chip image computing

NASA Astrophysics Data System (ADS)

Gove, Robert J.

1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
Multiplier Architecture for Coding Circuits

NASA Technical Reports Server (NTRS)

Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.

1986-01-01

Multipliers based on new algorithm for Galois-field (GF) arithmetic regular and expandable. Pipeline structures used for computing both multiplications and inverses. Designs suitable for implementation in very-large-scale integrated (VLSI) circuits. This general type of inverter and multiplier architecture especially useful in performing finite-field arithmetic of Reed-Solomon error-correcting codes and of some cryptographic algorithms.
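For readers unfamiliar with Galois-field arithmetic, the following sketch shows what a GF(2^8) multiplication and inversion compute. It uses the textbook shift-and-add method and inversion by exponentiation, not the pipelined algorithm of the brief. The field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption chosen because it is common in Reed-Solomon practice; the source does not state which polynomial the authors used.

```python
def gf_mul(a, b, poly=0x11D, m=8):
    """Multiply a and b in GF(2^m) by shift-and-add.

    poly is the field polynomial including its x^m term. The default 0x11D
    is an illustrative assumption, not a value taken from the source.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a        # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1                # multiply a by x
        if a & (1 << m):
            a ^= poly          # reduce modulo the field polynomial
    return result


def gf_inv(a, poly=0x11D, m=8):
    """Invert a nonzero element using a^(2^m - 2) = a^(-1)."""
    if a == 0:
        raise ZeroDivisionError("zero has no multiplicative inverse")
    result, power, exponent = 1, a, (1 << m) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, power, poly, m)
        power = gf_mul(power, power, poly, m)   # repeated squaring
        exponent >>= 1
    return result


# Every nonzero element times its inverse gives 1.
print(all(gf_mul(a, gf_inv(a)) == 1 for a in range(1, 256)))  # prints True
```

The inverse is built from multiplications alone, which is one reason a single regular multiplier structure can serve both operations.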
Submicron Systems Architecture Project

DTIC Science & Technology

1981-11-01

This project is concerned with the architecture, design, and testing of VLSI Systems. The principal activities in this report period include: The Tree Machine; COPE, The Homogeneous Machine; Computational Arrays; Switch-Level Model for MOS Logic Design; Testing; Local Network and Designer Workstations; Self-timed Systems; Characterization of Deadlock Free Resource Contention; Concurrency Algebra; Language Design and Logic for Program Verification.
Design and implementation of interface units for high speed fiber optics local area networks and broadband integrated services digital networks

NASA Technical Reports Server (NTRS)

Tobagi, Fouad A.; Dalgic, Ismail; Pang, Joseph

1990-01-01

The design and implementation of interface units for high speed Fiber Optic Local Area Networks and Broadband Integrated Services Digital Networks are discussed. In recent years, a number of network adapters designed to support high speed communications have emerged. The approach to the design of a high speed network interface unit was to implement packet processing functions in hardware, using VLSI technology. The VLSI hardware implementation of a buffer management unit, which is required in such architectures, is described.
VLSI neuroprocessors

NASA Technical Reports Server (NTRS)

Kemeny, Sabrina E.

1994-01-01

Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
An Efficient Hardware Circuit for Spike Sorting Based on Competitive Learning Networks.

PubMed

Chen, Huan-Yuan; Chen, Chih-Chang; Hwang, Wen-Jyi

2017-09-28

This study aims to present an effective VLSI circuit for multi-channel spike sorting. The circuit supports the spike detection, feature extraction and classification operations. The detection circuit is implemented in accordance with the nonlinear energy operator algorithm. Both the peak detection and area computation operations are adopted for the realization of the hardware architecture for feature extraction. The resulting feature vectors are classified by a circuit for competitive learning (CL) neural networks. The CL circuit supports both online training and classification. In the proposed architecture, all the channels share the same detection, feature extraction, learning and classification circuits for a low area cost hardware implementation. The clock-gating technique is also employed for reducing the power dissipation. To evaluate the performance of the architecture, an application-specific integrated circuit (ASIC) implementation is presented. Experimental results demonstrate that the proposed circuit exhibits the advantages of a low chip area, a low power dissipation and a high classification success rate for spike sorting.
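The nonlinear energy operator used for detection is commonly defined as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below applies that definition and flags samples whose NEO output exceeds a scaled mean, a frequently used thresholding rule. The threshold rule, the default scale of 4.0, and the function names are assumptions made for illustration; the abstract does not give the circuit's threshold settings.

```python
def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for n = 1 .. len(x) - 2, so output index k maps to n = k + 1.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def detect_spikes(x, scale=4.0):
    """Return sample indices whose NEO output exceeds scale * mean(NEO).

    The scaled-mean threshold and the default scale are illustrative
    assumptions, not parameters from the paper.
    """
    psi = neo(x)
    if not psi:
        return []
    threshold = scale * sum(psi) / len(psi)
    return [k + 1 for k, value in enumerate(psi) if value > threshold]


# Low-amplitude background activity with one large spike at index 6.
signal = [0, 1, 0, -1, 0, 1, 9, 1, 0, -1, 0]
print(detect_spikes(signal))  # prints [6]
```

Each output needs only three neighbouring samples, two multiplications and one subtraction, which is why the operator maps onto a small shared circuit across channels.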
An Efficient Hardware Circuit for Spike Sorting Based on Competitive Learning Networks

PubMed Central

Chen, Huan-Yuan; Chen, Chih-Chang

2017-01-01

This study aims to present an effective VLSI circuit for multi-channel spike sorting. The circuit supports the spike detection, feature extraction and classification operations. The detection circuit is implemented in accordance with the nonlinear energy operator algorithm. Both the peak detection and area computation operations are adopted for the realization of the hardware architecture for feature extraction. The resulting feature vectors are classified by a circuit for competitive learning (CL) neural networks. The CL circuit supports both online training and classification. In the proposed architecture, all the channels share the same detection, feature extraction, learning and classification circuits for a low area cost hardware implementation. The clock-gating technique is also employed for reducing the power dissipation. To evaluate the performance of the architecture, an application-specific integrated circuit (ASIC) implementation is presented. Experimental results demonstrate that the proposed circuit exhibits the advantages of a low chip area, a low power dissipation and a high classification success rate for spike sorting. PMID:28956859
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

NASA Astrophysics Data System (ADS)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a pipelined circuit architecture implemented in an FPGA, a very large scale integrated (VLSI) circuit, which efficiently handles the real-time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this imaging system. Each processor undertakes its own system task and coordinates its work with the others. The system on programmable chip (SOPC) in the FPGA works steadily at a global clock frequency of 96 MHz. An adequate time allowance lets the FPGA perform the NUC image pre-processing algorithm with ease, which supports the subsequent image post-processing in the DSP. The paper also presents a hardware (HW) and software (SW) co-design in the FPGA. The resulting architecture yields a multiprocessor image processing system that meets the system's performance requirements.
VLSI Architectures and CAD

DTIC Science & Technology

1989-04-01

existing types of data compression methods amenable to our needs: Huffman, Arithmetic, BSTW, and Lempel-Ziv. The two algorithms with the most modest...APEX architecture. Recently we began investigating various data compression algorithms with characteristics amenable to hardware implementation...This work has so far yielded a variant of the Lempel-Ziv algorithm that adapts continuously to its input and is appropriate to a hardware implementation
VLSI chips for vision-based vehicle guidance

NASA Astrophysics Data System (ADS)

Masaki, Ichiro

1994-02-01

Sensor-based vehicle guidance systems are gathering rapidly increasing interest because of their potential for increasing safety, convenience, environmental friendliness, and traffic efficiency. Examples of applications include intelligent cruise control, lane following, collision warning, and collision avoidance. This paper reviews the research trends in vision-based vehicle guidance with an emphasis on VLSI chip implementations of the vision systems. As an example of VLSI chips for vision-based vehicle guidance, a stereo vision system is described in detail.

VLSI processors for signal detection in SETI

NASA Technical Reports Server (NTRS)

Duluk, J. F.; Linscott, I. R.; Peterson, A. M.; Burr, J.; Ekroot, B.; Twicken, J.

1989-01-01

The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.
VLSI processors for signal detection in SETI.

PubMed

Duluk, J F; Linscott, I R; Peterson, A M; Burr, J; Ekroot, B; Twicken, J

1989-01-01

The objective of the Search for Extraterrestrial Intelligence (SETI) is to locate an artificially created signal coming from a distant star. This is done in two steps: (1) spectral analysis of an incoming radio frequency band, and (2) pattern detection for narrow-band signals. Both steps are computationally expensive and require the development of specially designed computer architectures. To reduce the size and cost of the SETI signal detection machine, two custom VLSI chips are under development. The first chip, the SETI DSP Engine, is used in the spectrum analyzer and is specially designed to compute Discrete Fourier Transforms (DFTs). It is a high-speed arithmetic processor that has two adders, one multiplier-accumulator, and three four-port memories. The second chip is a new type of Content-Addressable Memory. It is the heart of an associative processor that is used for pattern detection. Both chips incorporate many innovative circuits and architectural features.
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal

NASA Astrophysics Data System (ADS)

Satheeskumaran, S.; Sabrigiriraj, M.

2016-06-01

Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) signals because they require few computations. However, they possess a high mean square error (MSE) in noisy environments. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. Dedicated digital signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and reduce power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problems.
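The delayed LMS update mentioned above differs from standard LMS only in that the weight correction uses an error and input vector that are several samples old, which is what allows the update path to be pipelined. The sketch below shows that update with a fixed step size; the variable step-size rule of the paper is not reproduced because the abstract does not specify it. The function name, parameters, and the system-identification demo are assumptions made for illustration.

```python
import random


def delayed_lms(x, d, num_taps, mu, delay):
    """Fixed step-size delayed LMS: w += mu * e[n - delay] * u[n - delay].

    x is the filter input, d the desired signal. Returns (weights, errors).
    With delay=0 this reduces to the standard LMS update. All names and
    parameters here are illustrative assumptions.
    """
    weights = [0.0] * num_taps
    pending = []   # (error, input vector) pairs waiting for their delayed update
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * s for w, s in zip(weights, u))
        e = d[n] - y
        errors.append(e)
        pending.append((e, u))
        if len(pending) > delay:
            old_e, old_u = pending.pop(0)
            weights = [w + mu * old_e * s for w, s in zip(weights, old_u)]
    return weights, errors


# Demo: identify a 2-tap system from noise-free data with a 2-sample delay.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n else 0.0) for n in range(len(x))]
weights, errors = delayed_lms(x, d, num_taps=2, mu=0.05, delay=2)
print([round(w, 3) for w in weights])
```

In a noise-cancellation setting the roles change: `x` would be a reference correlated with the artefact, `d` the contaminated ECG, and the error sequence the cleaned signal.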
Techniques for Computing the DFT Using the Residue Fermat Number Systems and VLSI

NASA Technical Reports Server (NTRS)

Truong, T. K.; Chang, J. J.; Hsu, I. S.; Pei, D. Y.; Reed, I. S.

1985-01-01

The integer complex multiplier and adder over the direct sum of two copies of a finite field is specialized to the direct sum of the rings of integers modulo Fermat numbers. Such multiplications and additions can be used in the implementation of a discrete Fourier transform (DFT) of a sequence of complex numbers. The advantage of the present approach is that the number of multiplications needed for the DFT can be reduced substantially over the previous approach. The architectural designs using this approach are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
A VLSI chip set for real time vector quantization of image sequences

NASA Technical Reports Server (NTRS)

Baker, Richard L.

1989-01-01

The architecture and implementation of a VLSI chip set that vector quantizes (VQ) image sequences in real time is described. The chip set forms a programmable Single-Instruction, Multiple-Data (SIMD) machine which can implement various vector quantization encoding structures. Its VQ codebook may contain an unlimited number of codevectors, N, having dimension up to K = 64. Under a weighted least squared error criterion, the engine locates at video rates the best code vector in full-searched or large tree searched VQ codebooks. The ability to manipulate tree structured codebooks, coupled with parallelism and pipelining, permits searches in as short as O(log N) cycles. A full codebook search results in O(N) performance, compared to O(KN) for a Single-Instruction, Single-Data (SISD) machine. With this VLSI chip set, an entire video code can be built on a single board that permits realtime experimentation with very large codebooks.
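The O(KN) cost quoted for a sequential machine comes directly from the full-search loop: N codevectors, each needing K multiply-accumulate steps. A minimal sequential sketch of a full search under a weighted squared error criterion is shown below; the function name and the uniform default weights are assumptions for illustration, and the chip set's actual weighting scheme is not given in the abstract.

```python
def nearest_codevector(vector, codebook, weights=None):
    """Full-search VQ: return (index, error) of the best-matching codevector.

    Error is the weighted squared error sum(w_k * (v_k - c_k)^2). The outer
    loop runs N times and the inner sum K times, giving O(KN) sequential work.
    Uniform weights are an illustrative default, not taken from the source.
    """
    if weights is None:
        weights = [1.0] * len(vector)
    best_index, best_error = -1, float("inf")
    for index, code in enumerate(codebook):
        error = sum(w * (v - c) ** 2 for w, v, c in zip(weights, vector, code))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


codebook = [[0, 0], [10, 10], [0, 10]]
print(nearest_codevector([1, 9], codebook))  # prints (2, 2.0)
```

A SIMD machine that evaluates all K dimensions of one codevector at once removes the inner loop, which is how the full search drops to O(N).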
Laboratory for Computer Science Progress Report 19, 1 July 1981-30 June 1982.

DTIC Science & Technology

1984-05-01

Multiprocessor Architectures 202 4. TRIX Operating System 209 5. VLSI Tools 212 SYSTEMATIC PROGRAM DEVELOPMENT, 221 1. Introduction 222 2. Specification...exploring distributed operating systems and the architecture of single-user powerful computers that are interconnected by communication networks. The...to now. In particular, we expect to experiment with languages, operating systems, and applications that establish the feasibility of distributed
Sequence invariant state machines

NASA Technical Reports Server (NTRS)

Whitaker, S.; Manjunath, S.

1990-01-01

A synthesis method and new VLSI architecture are introduced to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. A design method is proposed that utilizes BTS logic to implement regular and dense circuits. A given state sequence can be programmed with power supply connections or dynamically reallocated if stored in a register. Arbitrary flow table sequences can be modified or programmed to dynamically alter the function of the machine. This allows VLSI controllers to be designed with the programmability of a general purpose processor but with the compact size and performance of dedicated logic.
Sequence-invariant state machines

NASA Technical Reports Server (NTRS)

Whitaker, Sterling R.; Manjunath, Shamanna K.; Maki, Gary K.

1991-01-01

A synthesis method and an MOS VLSI architecture are presented to realize sequential circuits that have the ability to implement any state machine having N states and m inputs, regardless of the actual sequence specified in the flow table. The design method utilizes binary tree structured (BTS) logic to implement regular and dense circuits. The desired state sequence can be hardwired with power supply connections or can be dynamically reallocated if stored in a register. This allows programmable VLSI controllers to be designed with a compact size and performance approaching that of dedicated logic. Results of ICV implementations are reported and an example sequence-invariant state machine is contrasted with implementations based on traditional methods.
Flexible feature-space-construction architecture and its VLSI implementation for multi-scale object detection

NASA Astrophysics Data System (ADS)

Luo, Aiwen; An, Fengwei; Zhang, Xiangyu; Chen, Lei; Huang, Zunkai; Jürgen Mattausch, Hans

2018-04-01

Feature extraction techniques are a cornerstone of object detection in computer-vision-based applications. The detection performance of vision-based detection systems is often degraded by, e.g., changes in the illumination intensity of the light source, foreground-background contrast variations or automatic gain control from the camera. In order to avoid such degradation effects, we present a block-based L1-norm-circuit architecture which is configurable for different image-cell sizes, cell-based feature descriptors and image resolutions according to customization parameters from the circuit input. The incorporated flexibility in both the image resolution and the cell size for multi-scale image pyramids leads to lower computational complexity and power consumption. Additionally, an object-detection prototype for performance evaluation in 65 nm CMOS implements the proposed L1-norm circuit together with a histogram of oriented gradients (HOG) descriptor and a support vector machine (SVM) classifier. The proposed parallel architecture with high hardware efficiency enables real-time processing, high detection robustness, small chip-core area as well as low power consumption for multi-scale object detection.
A procedural method for the efficient implementation of full-custom VLSI designs

NASA Technical Reports Server (NTRS)

Belk, P.; Hickey, N.

1987-01-01

An imbedded language system for the layout of very large scale integration (VLSI) circuits is examined. It is shown that through the judicious use of this system, a large variety of circuits can be designed with circuit density and performance comparable to traditional full-custom design methods, but with design costs more comparable to semi-custom design methods. The high performance of this methodology is attributable to the flexibility of procedural descriptions of VLSI layouts and to a number of automatic and semi-automatic tools within the system.
Compact VLSI neural computer integrated with active pixel sensor for real-time ATR applications

NASA Astrophysics Data System (ADS)

Fang, Wai-Chi; Udomkesmalee, Gabriel; Alkalai, Leon

1997-04-01

A compact VLSI neural computer integrated with an active pixel sensor has been under development to mimic what is inherent in biological vision systems. This electronic eye-brain computer is targeted for real-time machine vision applications which require both high-bandwidth communication and high-performance computing for data sensing, synergy of multiple types of sensory information, feature extraction, target detection, target recognition, and control functions. The neural computer is based on a composite structure which combines Annealing Cellular Neural Network (ACNN) and Hierarchical Self-Organization Neural Network (HSONN). The ACNN architecture is a programmable and scalable multi-dimensional array of annealing neurons which are locally connected with their local neurons. Meanwhile, the HSONN adopts a hierarchical structure with nonlinear basis functions. The ACNN+HSONN neural computer is effectively designed to perform programmable functions for machine vision processing in all levels with its embedded host processor. It provides a two order-of-magnitude increase in computation power over the state-of-the-art microcomputer and DSP microelectronics. A compact current-mode VLSI design feasibility of the ACNN+HSONN neural computer is demonstrated by a 3D 16X8X9-cube neural processor chip design in a 2-micrometers CMOS technology. Integration of this neural computer as one slice of a 4'X4' multichip module into the 3D MCM based avionics architecture for NASA's New Millennium Program is also described.
Hardware architecture for projective model calculation and false match refining using random sample consensus algorithm

NASA Astrophysics Data System (ADS)

Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid

2016-11-01

The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m²) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Parallel processing for digital picture comparison

NASA Technical Reports Server (NTRS)

Cheng, H. D.; Kou, L. T.

1987-01-01

In picture processing an important problem is to identify two digital pictures of the same scene taken under different lighting conditions. This kind of problem can be found in remote sensing, satellite signal processing and the related areas. The identification can be done by transforming the gray levels so that the gray level histograms of the two pictures are closely matched. The transformation problem can be solved by using the packing method. Researchers propose a VLSI architecture consisting of m x n processing elements with extensive parallel and pipelining computation capabilities to speed up the transformation with the time complexity O(max(m,n)), where m and n are the numbers of the gray levels of the input picture and the reference picture respectively. With a uniprocessor and a dynamic programming algorithm, the time complexity would be O(m³ x n). The algorithm partition problem, as an important issue in VLSI design, is discussed. Verification of the proposed architecture is also given.
Bio-Inspired Microsystem for Robust Genetic Assay Recognition

PubMed Central

Lue, Jaw-Chyng; Fang, Wai-Chi

2008-01-01

A compact integrated system-on-chip (SoC) architecture solution for robust, real-time, and on-site genetic analysis has been proposed. This microsystem solution is noise-tolerable and suitable for analyzing the weak fluorescence patterns from a PCR prepared dual-labeled DNA microchip assay. In the architecture, a preceding VLSI differential logarithm microchip is designed for effectively computing the logarithm of the normalized input fluorescence signals. A posterior VLSI artificial neural network (ANN) processor chip is used for analyzing the processed signals from the differential logarithm stage. A single-channel logarithmic circuit was fabricated and characterized. A prototype ANN chip with unsupervised winner-take-all (WTA) function was designed, fabricated, and tested. An ANN learning algorithm using a novel sigmoid-logarithmic transfer function based on the supervised backpropagation (BP) algorithm is proposed for robustly recognizing low-intensity patterns. Our results show that the trained new ANN can recognize low-fluorescence patterns better than an ANN using the conventional sigmoid function. PMID:18566679
WARP: Weight Associative Rule Processor. A dedicated VLSI fuzzy logic megacell

NASA Technical Reports Server (NTRS)

Pagni, A.; Poluzzi, R.; Rizzotto, G. G.

1992-01-01

During the last five years Fuzzy Logic has gained enormous popularity in the academic and industrial worlds. The success of this new methodology has led the microelectronics industry to create a new class of machines, called Fuzzy Machines, to overcome the limitations of traditional computing systems when utilized as Fuzzy Systems. This paper gives an overview of the methods by which Fuzzy Logic data structures are represented in the machines (each with its own advantages and inefficiencies). Next, the paper introduces WARP (Weight Associative Rule Processor) which is a dedicated VLSI megacell allowing the realization of a fuzzy controller suitable for a wide range of applications. WARP represents an innovative approach to VLSI Fuzzy controllers by utilizing different types of data structures for characterizing the membership functions during the various stages of the Fuzzy processing. WARP's dedicated architecture has been designed in order to achieve high performance by exploiting the computational advantages offered by the different data representations.
VLSI Design of SVM-Based Seizure Detection System With On-Chip Learning Capability.

PubMed

Feng, Lichen; Li, Zunchao; Wang, Yuanfa

2018-02-01

A portable automatic seizure detection system is convenient for epilepsy patients to carry. In order to make the system on-chip trainable with high efficiency and attain high detection accuracy, this paper presents a very large scale integration (VLSI) design based on the nonlinear support vector machine (SVM). The proposed design mainly consists of a feature extraction (FE) module and an SVM module. The FE module performs the three-level Daubechies discrete wavelet transform to fit the physiological bands of the electroencephalogram (EEG) signal and extracts the time-frequency domain features reflecting the nonstationary signal properties. The SVM module integrates the modified sequential minimal optimization algorithm with the table-driven Gaussian kernel to enable efficient on-chip learning. The presented design is verified on an Altera Cyclone II field-programmable gate array and tested using two publicly available EEG datasets. Experimental results show that the designed VLSI system improves the detection accuracy and training efficiency.
Grain-size considerations for optoelectronic multistage interconnection networks.

PubMed

Krishnamoorthy, A V; Marchand, P J; Kiamilev, F E; Esener, S C

1992-09-10

This paper investigates, at the system level, the performance-cost trade-off between optical and electronic interconnects in an optoelectronic interconnection network. The specific system considered is a packet-switched, free-space optoelectronic shuffle-exchange multistage interconnection network (MIN). System bandwidth is used as the performance measure, while system area, system power, and system volume constitute the cost measures. A detailed design and analysis of a two-dimensional (2-D) optoelectronic shuffle-exchange routing network with variable grain size K is presented. The architecture permits the conventional 2 x 2 switches or grains to be generalized to larger K x K grain sizes by replacing optical interconnects with electronic wires without affecting the functionality of the system. Thus the system consists of log_K N optoelectronic stages interconnected with free-space K-shuffles. When K = N, the MIN consists of a single electronic stage with optical input-output. The system design uses an efficient 2-D VLSI layout and a single diffractive optical element between stages to provide the 2-D K-shuffle interconnection. Results indicate that there is an optimum range of grain sizes that provides the best performance per cost. For the specific VLSI/GaAs multiple quantum well technology and system architecture considered, grain sizes larger than 256 x 256 result in a reduced performance, while grain sizes smaller than 16 x 16 have a high cost. For a network with 4096 channels, the useful range of grain sizes corresponds to approximately 250-400 electronic transistors per optical input-output channel. The effect of varying certain technology parameters such as the number of hologram phase levels, the modulator driving voltage, the minimum detectable power, and VLSI minimum feature size on the optimum grain-size system is studied. For instance, results show that using four phase levels for the interconnection hologram is a good compromise for the cost functions mentioned above. As VLSI minimum feature sizes decrease, the optimum grain size increases, whereas, if optical interconnect performance in terms of the detector power or modulator driving voltage requirements improves, the optimum grain size may be reduced. Finally, several architectural modifications to the system, such as K x K contention-free switches and sorting networks, are investigated and optimized for grain size. Results indicate that system bandwidth can be increased, but at the price of reduced performance/cost. The optoelectronic MIN architectures considered thus provide a broad range of performance/cost alternatives and offer a superior performance over purely electronic MINs.
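The relationship between grain size K, channel count N, and the number of stages in the record above can be checked with a few lines of Python. This is only a sketch: the function names `stage_count` and `k_shuffle` are hypothetical, and the K-shuffle is modeled in its usual textbook form (a left rotation of the base-K digits of the channel index), which is an assumption rather than something the abstract spells out.

```python
def stage_count(n_channels: int, grain: int) -> int:
    """Return log_K(N): how many K x K stages connect N channels."""
    if grain < 2 or n_channels < grain:
        raise ValueError("need grain >= 2 and n_channels >= grain")
    stages, reach = 0, 1
    while reach < n_channels:
        reach *= grain          # each extra stage multiplies the reach by K
        stages += 1
    if reach != n_channels:
        raise ValueError("n_channels must be an exact power of grain")
    return stages


def k_shuffle(index: int, n_channels: int, grain: int) -> int:
    """Map a channel index through one K-shuffle.

    For N = K**n this rotates the base-K digits of `index` left by one,
    which is the standard definition of a perfect K-shuffle.
    """
    shifted = index * grain
    return shifted % n_channels + shifted // n_channels
```

For a 4096-channel network, `stage_count(4096, 2)` gives 12 stages, `stage_count(4096, 16)` gives 3, and `stage_count(4096, 4096)` gives the single electronic stage mentioned for K = N.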
Cascaded VLSI Chips Help Neural Network To Learn

NASA Technical Reports Server (NTRS)

Duong, Tuan A.; Daud, Taher; Thakoor, Anilkumar P.

1993-01-01

Cascading provides 12-bit resolution needed for learning. Using conventional silicon chip fabrication technology of VLSI, fully connected architecture consisting of 32 wide-range, variable gain, sigmoidal neurons along one diagonal and 7-bit resolution, electrically programmable, synaptic 32 x 31 weight matrix implemented on neuron-synapse chip. To increase weight nominally from 7 to 13 bits, synapses on chip individually cascaded with respective synapses on another 32 x 32 matrix chip with 7-bit resolution synapses only (without neurons). Cascade correlation algorithm varies number of layers effectively connected into network; adds hidden layers one at a time during learning process in such way as to optimize overall number of neurons and complexity and configuration of network.
A VLSI VAX chip set

NASA Astrophysics Data System (ADS)

Johnson, W. N.; Herrick, W. V.; Grundmann, W. J.

1984-10-01

For the first time, VLSI technology is used to compress the full functionality and comparable performance of the VAX 11/780 super-minicomputer into a 1.2 M transistor microprocessor chip set. There was no subsetting of the 304 instruction set and the 17 data types, nor reduction in hardware support for the 4 Gbyte virtual memory management architecture. The chipset supports an integral 8 kbyte memory cache, a 13.3 Mbyte/s system bus, and sophisticated multiprocessing. High performance is achieved through microcode optimizations afforded by the large control store, tightly coupled address and data caches, the use of internal and external 32 bit datapaths, the extensive application of both microlevel and macrolevel pipelining, and the use of specialized hardware assists.

A neural net based architecture for the segmentation of mixed gray-level and binary pictures

NASA Technical Reports Server (NTRS)

Tabatabai, Ali; Troudet, Terry P.

1991-01-01

A neural-net-based architecture is proposed to perform segmentation in real time for mixed gray-level and binary pictures. In this approach, the composite picture is divided into 16 x 16 pixel blocks, which are identified as character blocks or image blocks on the basis of a dichotomy measure computed by an adaptive 16 x 16 neural net. For compression purposes, each image block is further divided into 4 x 4 subblocks; a one-bit nonparametric quantizer is used to encode 16 x 16 character and 4 x 4 image blocks; and the binary map and quantizer levels are obtained through a neural net segmentor over each block. The efficiency of the neural segmentation in terms of computational speed, data compression, and quality of the compressed picture is demonstrated. The effect of weight quantization is also discussed. VLSI implementations of such adaptive neural nets in CMOS technology are described and simulated in real time for a maximum block size of 256 pixels.
Fuzzy control of magnetic bearings

NASA Technical Reports Server (NTRS)

Feeley, J. J.; Niederauer, G. M.; Ahlstrom, D. J.

1991-01-01

The use of an adaptive fuzzy control algorithm implemented on a VLSI chip for the control of a magnetic bearing was considered. The architecture of the adaptive fuzzy controller is similar to that of a neural network. The performance of the fuzzy controller is compared to that of a conventional controller by computer simulation.
Advanced flight computer. Special study

NASA Technical Reports Server (NTRS)

Coo, Dennis

1995-01-01

This report documents a special study to define a 32-bit radiation hardened, SEU tolerant flight computer architecture, and to investigate current or near-term technologies and development efforts that contribute to the Advanced Flight Computer (AFC) design and development. An AFC processing node architecture is defined. Each node may consist of a multi-chip processor as needed. The modular, building block approach uses VLSI technology and packaging methods that demonstrate a feasible AFC module in 1998 that meets that AFC goals. The defined architecture and approach demonstrate a clear low-risk, low-cost path to the 1998 production goal, with intermediate prototypes in 1996.
Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

NASA Astrophysics Data System (ADS)

Lee, J.; Kim, K.

A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations, bit-serial and parallel, are elaborated on. Performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator, and the number of transistors required.
Direct kinematics solution architectures for industrial robot manipulators: Bit-serial versus parallel

NASA Technical Reports Server (NTRS)

Lee, J.; Kim, K.

1991-01-01

A Very Large Scale Integration (VLSI) architecture for robot direct kinematic computation suitable for industrial robot manipulators was investigated. The Denavit-Hartenberg transformations are reviewed to exploit a proper processing element, namely an augmented CORDIC. Specifically, two distinct implementations, bit-serial and parallel, are elaborated on. Performance of each scheme is analyzed with respect to the time to compute one location of the end-effector of a 6-link manipulator, and the number of transistors required.
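To give a feel for the kind of processing element the record above refers to, the following sketch shows plain rotation-mode CORDIC, which rotates a vector using only add and scale-by-power-of-two steps. It is not the "augmented CORDIC" of the paper, whose details the abstract does not give. Floating-point multiplications by 2**-i stand in for the hardware shifts, and the names `cordic_rotate`, `ITERATIONS`, `ANGLES`, and `GAIN` are illustrative assumptions.

```python
import math

ITERATIONS = 24
# Elementary rotation angles atan(2^-i); a hardware unit would keep these in a small ROM.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
# Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); GAIN is the total stretch.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))


def cordic_rotate(x: float, y: float, theta: float) -> tuple:
    """Rotate (x, y) by theta radians; valid for |theta| up to about 1.74 rad."""
    z = theta                                   # residual angle still to rotate
    for i, angle in enumerate(ANGLES):
        d = 1.0 if z >= 0 else -1.0             # rotate toward zero residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angle
    return x / GAIN, y / GAIN                   # undo the accumulated stretch
```

Calling `cordic_rotate(1.0, 0.0, math.pi / 3)` returns values close to `(cos 60°, sin 60°)`. The sine/cosine pairs in each joint's Denavit-Hartenberg transformation are the sort of quantity such a rotation unit produces.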
Exploration and Evaluation of Nanometer Low-power Multi-core VLSI Computer Architectures

DTIC Science & Technology

2015-03-01

ICC, the Milkyway database was created using the command: milkyway -galaxy -nogui -tcl -log memory.log one.tcl As stated previously, it is...EDA tools. Typically, Synopsys® tools use Milkyway databases, whereas Cadence Design Systems® tools use Library Exchange Format (LEF) files. To help
Real time on-chip sequential adaptive principal component analysis for data feature extraction and image compression

NASA Technical Reports Server (NTRS)

Duong, T. A.

2004-01-01

In this paper, we present a new, simple, and optimized hardware architecture sequential learning technique for adaptive Principal Component Analysis (PCA) which will help optimize the hardware implementation in VLSI and overcome the difficulties of the traditional gradient descent in learning convergence and hardware implementation.
Prototype architecture for a VLSI level zero processing system. [Space Station Freedom]

NASA Technical Reports Server (NTRS)

Shi, Jianfei; Grebowsky, Gerald J.; Horner, Ward P.; Chesney, James R.

1989-01-01

The prototype architecture and implementation of a high-speed level zero processing (LZP) system are discussed. Due to the new processing algorithm and VLSI technology, the prototype LZP system features compact size, low cost, high processing throughput, and easy maintainability and increased reliability. Though extensive control functions have been done by hardware, the programmability of processing tasks makes it possible to adapt the system to different data formats and processing requirements. It is noted that the LZP system can handle up to 8 virtual channels and 24 sources with combined data volume of 15 Gbytes per orbit. For greater demands, multiple LZP systems can be configured in parallel, each called a processing channel and assigned a subset of virtual channels. The telemetry data stream will be steered into different processing channels in accordance with their virtual channel IDs. This super system can cope with a virtually unlimited number of virtual channels and sources. In the near future, it is expected that new disk farms with data rate exceeding 150 Mbps will be available from commercial vendors due to the advance in disk drive technology.
VLSI design of an RSA encryption/decryption chip using systolic array based architecture

NASA Astrophysics Data System (ADS)

Sun, Chi-Chia; Lin, Bor-Shing; Jan, Gene Eu; Lin, Jheng-Yi

2016-09-01

This article presents the VLSI design of a configurable RSA public key cryptosystem supporting 512-bit, 1024-bit and 2048-bit operation, based on the Montgomery algorithm, that achieves clock-cycle counts comparable to current relevant works but with a smaller die size. We use the binary method for the modular exponentiation and adopt the Montgomery algorithm for the modular multiplication to simplify computational complexity, which, together with the systolic array concept for circuit design, effectively lowers the die size. The main architecture of the chip consists of four functional blocks, namely input/output modules, registers module, arithmetic module and control module. We applied the concept of systolic array to design the RSA encryption/decryption chip by using the VHDL hardware description language and verified it using the TSMC/CIC 0.35 μm 1P4M technology. The die area of the 2048-bit RSA chip without the DFT is 3.9 × 3.9 mm² (4.58 × 4.58 mm² with DFT). Its average baud rate can reach 10.84 kbps under a 100 MHz clock.
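The combination named in the record above, the binary (square-and-multiply) method for exponentiation with Montgomery multiplication underneath, can be sketched in software as follows. This is a word-level model under stated assumptions, not the chip's bit-level systolic datapath. The function names are hypothetical, the modulus must be odd and smaller than 2**bits, and `pow(modulus, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(modulus: int, bits: int) -> tuple:
    """Precompute R = 2**bits, n' = -modulus^-1 mod R, and R^2 mod modulus."""
    if modulus < 3 or modulus % 2 == 0 or modulus >= (1 << bits):
        raise ValueError("modulus must be odd, >= 3, and below 2**bits")
    r = 1 << bits
    n_prime = (-pow(modulus, -1, r)) % r
    return r, n_prime, (r * r) % modulus


def mont_mul(a: int, b: int, modulus: int, bits: int, n_prime: int) -> int:
    """Return a * b * R^-1 mod modulus using multiplies, masks and shifts only."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask      # chosen so that t + m*modulus is divisible by R
    u = (t + m * modulus) >> bits          # exact division by R
    return u - modulus if u >= modulus else u


def mod_exp(base: int, exponent: int, modulus: int, bits: int) -> int:
    """Left-to-right binary exponentiation carried out in Montgomery form."""
    r, n_prime, r2 = montgomery_setup(modulus, bits)
    x = mont_mul(base % modulus, r2, modulus, bits, n_prime)   # base * R mod n
    acc = r % modulus                                          # 1 in Montgomery form
    for bit in bin(exponent)[2:]:
        acc = mont_mul(acc, acc, modulus, bits, n_prime)       # square every step
        if bit == "1":
            acc = mont_mul(acc, x, modulus, bits, n_prime)     # multiply on 1 bits
    return mont_mul(acc, 1, modulus, bits, n_prime)            # leave Montgomery form
```

A quick consistency check is `mod_exp(65, 17, 3233, 12) == pow(65, 17, 3233)`. The point of the Montgomery form is that the reduction step needs no trial division by the modulus, which is what makes it attractive for regular, array-style hardware.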
Orientation-selective aVLSI spiking neurons.

PubMed

Liu, S C; Kramer, J; Indiveri, G; Delbrück, T; Burg, T; Douglas, R

2001-01-01

We describe a programmable multi-chip VLSI neuronal system that can be used for exploring spike-based information processing models. The system consists of a silicon retina, a PIC microcontroller, and a transceiver chip whose integrate-and-fire neurons are connected in a soft winner-take-all architecture. The circuit on this multi-neuron chip approximates a cortical microcircuit. The neurons can be configured for different computational properties by the virtual connections of a selected set of pixels on the silicon retina. The virtual wiring between the different chips is effected by an event-driven communication protocol that uses asynchronous digital pulses, similar to spikes in a neuronal system. We used the multi-chip spike-based system to synthesize orientation-tuned neurons using both a feedforward model and a feedback model. The performance of our analog hardware spiking model matched the experimental observations and digital simulations of continuous-valued neurons. The multi-chip VLSI system has advantages over computer neuronal models in that it is real-time, and the computational time does not scale with the size of the neuronal network.
Systolic array processing of the sequential decoding algorithm

NASA Technical Reports Server (NTRS)

Chang, C. Y.; Yao, K.

1989-01-01

A systolic array processing technique is applied to implementing the stack algorithm form of the sequential decoding algorithm. It is shown that sorting, a key function in the stack algorithm, can be efficiently realized by a special type of systolic array known as a systolic priority queue. Compared to the stack-bucket algorithm, this approach is shown to have the advantages that the decoding always moves along the optimal path, that it has a fast and constant decoding speed and that its simple and regular hardware architecture is suitable for VLSI implementation. Three types of systolic priority queues are discussed: random access scheme, shift register scheme and ripple register scheme. The property of the entries stored in the systolic priority queue is also investigated. The results are applicable to many other basic sorting type problems.
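The idea of a systolic priority queue in the record above can be illustrated with a small software model. Each cell holds one entry and only ever compares with the entry arriving from its left neighbor, so the best path metric is always available at cell 0. The sequential loop below stands in for what the hardware cells would do concurrently. The class and method names are hypothetical, and the model is a generic compare-and-pass queue rather than any of the three specific schemes the paper discusses.

```python
class SystolicPriorityQueue:
    """Fixed-length chain of cells; cell 0 always holds the largest metric."""

    def __init__(self, cells: int):
        self.cells = [None] * cells

    def insert(self, metric, payload):
        """Push an entry in at the left; return the entry that falls off, if any."""
        moving = (metric, payload)
        for i in range(len(self.cells)):
            held = self.cells[i]
            if held is None:                 # first empty cell absorbs the entry
                self.cells[i] = moving
                return None
            if moving[0] > held[0]:          # local compare: keep the better entry,
                self.cells[i], moving = moving, held   # pass the worse one rightward
        return moving                        # queue full: worst entry is discarded

    def extract_best(self):
        """Remove and return the entry in cell 0; all other cells shift left."""
        best = self.cells[0]
        self.cells = self.cells[1:] + [None]
        return best
```

In a stack-algorithm decoder the `metric` would be the path metric and the `payload` the partial path. Extending the best path then amounts to one `extract_best` followed by one `insert` per successor.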
A high performance parallel computing architecture for robust image features

NASA Astrophysics Data System (ADS)

Zhou, Renyan; Liu, Leibo; Wei, Shaojun

2014-03-01

A design of parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian Blob Detector and Haar Response Calculation, which are the most computing-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network and dedicated hardware for SURF descriptors, this architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS general pixel processing @ 100 MHz clock and achieves up to 118 fps in VGA (640 × 480) image feature detection. The proposed architecture is stand-alone and scalable, so it can easily be migrated to a VLSI implementation.
Low power signal processing research at Stanford

NASA Technical Reports Server (NTRS)

Burr, J.; Williamson, P. R.; Peterson, A.

1991-01-01

This paper gives an overview of the research being conducted at Stanford University's Space, Telecommunications, and Radioscience Laboratory in the area of low energy computation. It discusses the work we are doing in large scale digital VLSI neural networks, interleaved processor and pipelined memory architectures, energy estimation and optimization, multichip module packaging, and low voltage digital logic.
Emergent Auditory Feature Tuning in a Real-Time Neuromorphic VLSI System.

PubMed

Sheik, Sadique; Coath, Martin; Indiveri, Giacomo; Denham, Susan L; Wennekers, Thomas; Chicca, Elisabetta

2012-01-01

Many sounds of ecological importance, such as communication calls, are characterized by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamo-cortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP), which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principal problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectro-temporal patterns. The availability of hardware on which the model can be implemented makes this a significant step toward the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems.
Emergent Auditory Feature Tuning in a Real-Time Neuromorphic VLSI System

PubMed Central

Sheik, Sadique; Coath, Martin; Indiveri, Giacomo; Denham, Susan L.; Wennekers, Thomas; Chicca, Elisabetta

2011-01-01

Many sounds of ecological importance, such as communication calls, are characterized by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamo-cortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP), which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principal problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectro-temporal patterns. The availability of hardware on which the model can be implemented makes this a significant step toward the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems. PMID:22347163
A Streaming PCA VLSI Chip for Neural Data Compression.

PubMed

Wu, Tong; Zhao, Wenfeng; Guo, Hongsun; Lim, Hubert H; Yang, Zhi

2017-12-01

Neural recording system miniaturization and integration with low-power wireless technologies require compressing neural data before transmission. Feature extraction is a procedure to represent data in a low-dimensional space; its integration into a recording chip can be an efficient approach to compress neural data. In this paper, we propose a streaming principal component analysis algorithm and its microchip implementation to compress multichannel local field potential (LFP) and spike data. The circuits have been designed in a 65-nm CMOS technology and occupy a silicon area of 0.06 mm². Throughout the experiments, the chip compresses LFPs by 10× at the expense of as low as 1% reconstruction errors and 144-nW/channel power consumption; for spikes, the achieved compression ratio is 25× with 8% reconstruction errors and 3.05-μW/channel power consumption. In addition, the algorithm and its hardware architecture can swiftly adapt to nonstationary spiking activities, which enables efficient hardware sharing among multiple channels to support a high-channel count recorder.
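The abstract above does not describe the streaming PCA algorithm itself. As a stand-in, the sketch below uses Oja's rule, a classic one-sample-at-a-time update that tracks the leading principal direction without storing a covariance matrix. It is shown only to illustrate what "streaming" means here and should not be read as the chip's method. The synthetic two-channel data, the learning rate, and all function names are assumptions.

```python
import math
import random


def oja_update(weights, sample, rate):
    """One streaming step of Oja's rule; returns the new weights and the projection."""
    y = sum(w * x for w, x in zip(weights, sample))            # project the sample
    new_weights = [w + rate * y * (x - y * w) for w, x in zip(weights, sample)]
    return new_weights, y


def principal_axis_alignment(samples=4000, rate=0.005, seed=0):
    """Feed synthetic 2-channel data and report |cos| between the estimate and the true axis."""
    rng = random.Random(seed)
    axis = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))        # true dominant direction
    weights = [1.0, 0.0]                                       # deliberately misaligned start
    for _ in range(samples):
        strong = rng.gauss(0.0, 3.0)                           # large variance along axis
        weak = rng.gauss(0.0, 0.3)                             # small variance across it
        sample = [strong * axis[0] + weak * axis[1],
                  strong * axis[1] - weak * axis[0]]
        weights, _ = oja_update(weights, sample, rate)
    norm = math.hypot(weights[0], weights[1])
    return abs(weights[0] * axis[0] + weights[1] * axis[1]) / norm
```

The compression idea is that only the projection `y` (one number per sample) needs to be transmitted instead of every channel. Because the update is local and incremental, it can keep adapting when the signal statistics drift.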
On-Chip Neural Data Compression Based On Compressed Sensing With Sparse Sensing Matrices.

PubMed

Zhao, Wenfeng; Sun, Biao; Wu, Tong; Yang, Zhi

2018-02-01

On-chip neural data compression is an enabling technique for wireless neural interfaces that suffer from insufficient bandwidth and power budgets to transmit the raw data. The data compression algorithm and its implementation should be power and area efficient and functionally reliable over different datasets. Compressed sensing is an emerging technique that has been applied to compress various neurophysiological data. However, the state-of-the-art compressed sensing (CS) encoders leverage random but dense binary measurement matrices, which incur substantial implementation costs on both power and area that could offset the benefits from the reduced wireless data rate. In this paper, we propose two CS encoder designs based on sparse measurement matrices that could lead to efficient hardware implementation. Specifically, two different approaches for the construction of sparse measurement matrices, i.e., the deterministic quasi-cyclic array code (QCAC) matrix and -sparse random binary matrix [-SRBM] are exploited. We demonstrate that the proposed CS encoders lead to comparable recovery performance. And efficient VLSI architecture designs are proposed for QCAC-CS and -SRBM encoders with reduced area and total power consumption.
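The hardware appeal of a sparse binary measurement matrix, as argued in the record above, is that encoding reduces to a handful of additions per input sample. The sketch below builds a random binary matrix with a fixed number of ones per column and applies it to a signal. It is a generic illustration: the parameter `ones_per_column` and the function names are hypothetical, and the construction is not the paper's QCAC matrix.

```python
import random


def sparse_binary_matrix(rows, cols, ones_per_column, seed=0):
    """Store the matrix column-wise as lists of the row indices that hold a 1."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(rows), ones_per_column)) for _ in range(cols)]


def encode(columns, signal, rows):
    """Compute y = Phi @ x with additions only: each sample feeds a few accumulators."""
    measurements = [0] * rows
    for row_indices, sample in zip(columns, signal):
        for r in row_indices:
            measurements[r] += sample
    return measurements
```

With `rows` measurements, `cols` input samples, and `d` ones per column, the encoder performs `cols * d` additions per block, whereas a dense random binary matrix needs on the order of `cols * rows / 2`.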
An efficient ASIC implementation of 16-channel on-line recursive ICA processor for real-time EEG system.

PubMed

Fang, Wai-Chi; Huang, Kuan-Ju; Chou, Chia-Ching; Chang, Jui-Chung; Cauwenberghs, Gert; Jung, Tzyy-Ping

2014-01-01

This paper proposes an efficient very-large-scale integration (VLSI) design of a 16-channel on-line recursive independent component analysis (ORICA) processor ASIC for real-time EEG systems, implemented with TSMC 40 nm CMOS technology. ORICA is well suited to separating artifacts in real-time EEG systems because of its highly efficient, real-time processing. The proposed ORICA processor is composed of an ORICA processing unit and a singular value decomposition (SVD) processing unit. Compared with previous work [1], this proposed ORICA processor has enhanced effectiveness and reduced hardware complexity by utilizing a deeper pipeline architecture, shared arithmetic processing unit, and shared registers. The 16-channel random signals which contain 8-channel super-Gaussian and 8-channel sub-Gaussian components are used to analyze the dependence of the source components, and the average correlation coefficient is 0.95452 between the original source signals and extracted ORICA signals. Finally, the proposed ORICA processor ASIC is implemented with TSMC 40 nm CMOS technology, and it consumes 15.72 mW at 100 MHz operating frequency.
A pipeline VLSI design of fast singular value decomposition processor for real-time EEG system based on on-line recursive independent component analysis.

PubMed

Huang, Kuan-Ju; Shih, Wei-Yeh; Chang, Jui Chung; Feng, Chih Wei; Fang, Wai-Chi

2013-01-01

This paper presents a pipeline VLSI design of a fast singular value decomposition (SVD) processor for a real-time electroencephalography (EEG) system based on on-line recursive independent component analysis (ORICA). Since SVD is used frequently in computations of the real-time EEG system, a low-latency and high-accuracy SVD processor is essential. During the EEG system process, the proposed SVD processor aims to solve the diagonal, inverse and inverse square root matrices of the target matrices in real time. Generally, SVD requires a huge amount of computation in hardware implementation. Therefore, this work proposes a novel design concept for data flow updating to assist the pipeline VLSI implementation. The SVD processor can greatly improve the feasibility of real-time EEG system applications such as brain computer interfaces (BCIs). The proposed architecture is implemented using TSMC 90 nm CMOS technology. The EEG raw data are sampled at 128 Hz. The core size of the SVD processor is 580 × 580 μm², and the operating frequency is 20 MHz. It consumes 0.774 mW of power during the 8-channel EEG system per execution time.
Electronic neural network for dynamic resource allocation

NASA Technical Reports Server (NTRS)

Thakoor, A. P.; Eberhardt, S. P.; Daud, T.

1991-01-01

A VLSI implementable neural network architecture for dynamic assignment is presented. The resource allocation problems involve assigning members of one set (e.g. resources) to those of another (e.g. consumers) such that the global 'cost' of the associations is minimized. The network consists of a matrix of sigmoidal processing elements (neurons), where the rows of the matrix represent resources and columns represent consumers. Unlike previous neural implementations, however, association costs are applied directly to the neurons, reducing connectivity of the network to VLSI-compatible O(number of neurons). Each row (and column) has an additional neuron associated with it to independently oversee activations of all the neurons in each row (and each column), providing a programmable 'k-winner-take-all' function. This function simultaneously enforces blocking (excitatory/inhibitory) constraints during convergence to control the number of active elements in each row and column within desired boundary conditions. Simulations show that the network, when implemented in fully parallel VLSI hardware, offers optimal (or near-optimal) solutions within only a fraction of a millisecond, for problems up to 128 resources and 128 consumers, orders of magnitude faster than conventional computing or heuristic search methods.

Implementation of Multi-Agent Object Attention System Based on Biologically Inspired Attractor Selection

NASA Astrophysics Data System (ADS)

Hashimoto, Ryoji; Matsumura, Tomoya; Nozato, Yoshihiro; Watanabe, Kenji; Onoye, Takao

A multi-agent object attention system is proposed, which is based on a biologically inspired attractor selection model. Object attention is facilitated by using a video sequence and a depth map obtained through a compound-eye image sensor TOMBO. Robustness of the multi-agent system over environmental changes is enhanced by utilizing the biological model of adaptive response by attractor selection. To implement the proposed system, an efficient VLSI architecture is employed that reduces the enormous computational costs and memory accesses required for depth map processing and the multi-agent attractor selection process. According to the FPGA implementation result of the proposed object attention system, which is accomplished by using 7,063 slices, 640×512 pixel input images can be processed in real time with three agents at a rate of 9 fps at 48 MHz operation.
Fast neural solution of a nonlinear wave equation

NASA Technical Reports Server (NTRS)

Toomarian, Nikzad; Barhen, Jacob

1992-01-01

A neural algorithm for rapidly simulating a certain class of nonlinear wave phenomena using analog VLSI neural hardware is presented and applied to the Korteweg-de Vries partial differential equation. The corresponding neural architecture is obtained from a pseudospectral representation of the spatial dependence, along with a leap-frog scheme for the temporal evolution. Numerical simulations demonstrated the robustness of the proposed approach.
A High Performance VLSI Computer Architecture For Computer Graphics

NASA Astrophysics Data System (ADS)

Chin, Chi-Yuan; Lin, Wen-Tai

1988-10-01

A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the modern computer graphics demands, e.g. high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection, can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
Reconfigurable tree architectures using subtree oriented fault tolerance

NASA Technical Reports Server (NTRS)

Lowrie, Matthew B.

1987-01-01

An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
The relationship between an advanced avionic system architecture and the elimination of the need for an Avionics Intermediate Shop (AIS)

NASA Astrophysics Data System (ADS)

Abraham, S. J.

While Avionics Intermediate Shops (AISs) have in the past been required for military aircraft, the emerging VLSI/VHSIC technology has given rise to the possibility of novel, well partitioned avionics system architectures that obviate the high spare parts costs that formerly prompted and justified the existence of an AIS. Future avionics may therefore be adequately and economically supported by a two-level maintenance system. Algebraic generalizations are presented for the analysis of the spares costs implications of alternative design partitioning schemes for future avionics.
A class of least-squares filtering and identification algorithms with systolic array architectures

NASA Technical Reports Server (NTRS)

Kalson, Seth Z.; Yao, Kung

1991-01-01

A unified approach is presented for deriving a large class of new and previously known time- and order-recursive least-squares algorithms with systolic array architectures, suitable for high-throughput-rate and VLSI implementations of space-time filtering and system identification problems. The geometrical derivation given is unique in that no assumption is made concerning the rank of the sample data correlation matrix. This method utilizes and extends the concept of oblique projections, as used previously in the derivations of the least-squares lattice algorithms. Exponentially weighted least-squares criteria are considered for both sliding and growing memory.
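As a point of reference for the exponentially weighted, growing-memory criterion mentioned above, the following is a minimal sketch of the conventional recursive least-squares (RLS) update. It is the textbook matrix-inversion-lemma form, not the oblique-projection derivation or the systolic mapping of the paper; the function name, the forgetting factor `lam`, and the initialization constant `delta` are assumptions made for illustration.

```python
def rls_identify(samples, order, lam=0.99, delta=1000.0):
    """Textbook exponentially weighted RLS for the model y = w . x.

    samples: iterable of (x, y) pairs, x being a list of `order` floats.
    lam:     forgetting factor (assumed value), 0 < lam <= 1.
    delta:   initial scale of P (assumed value); large means weak prior.
    """
    w = [0.0] * order
    # P tracks the inverse of the weighted sample correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)]
         for i in range(order)]
    for x, y in samples:
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        gain = [v / denom for v in Px]
        err = y - sum(w[i] * x[i] for i in range(order))  # a priori error
        w = [w[i] + gain[i] * err for i in range(order)]
        # P <- (P - gain * x^T P) / lam; P is symmetric, so x^T P = Px^T.
        P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(order)]
             for i in range(order)]
    return w
```

Note that this conventional form starts from an invertible P, which sidesteps the rank question that the geometrical derivation in the paper is said to handle without assumptions.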
Image and Video Compression with VLSI Neural Networks

NASA Technical Reports Server (NTRS)

Fang, W.; Sheu, B.

1993-01-01

An advanced motion-compensated predictive video compression system based on artificial neural networks has been developed to effectively eliminate the temporal and spatial redundancy of video image sequences and thus reduce the bandwidth and storage required for the transmission and recording of the video signal. The VLSI neuroprocessor for high-speed high-ratio image compression based upon a self-organization network and the conventional algorithm for vector quantization are compared. The proposed method is quite efficient and can achieve near-optimal results.
Adaptive Optoelectronic Eyes: Hybrid Sensor/Processor Architectures

DTIC Science & Technology

2006-11-13

corresponding calculated data. The width of the mirror stopband is proportional to the refractive index difference between the high and low index materials ...Silicon VLSI Neuron Unit Arrays 56 Development of a Single-Sided Flip-Chip Bonding Process 65 Development of High Refractive Index Diffractive Optical ...Elements (DOEs) 68 Development of High-Performance Antireflection Coatings for High Refractive Index DOEs 69 Design and Fabrication of Low Threshold
Temporal coding in a silicon network of integrate-and-fire neurons.

PubMed

Liu, Shih-Chii; Douglas, Rodney

2004-09-01

Spatio-temporal processing of spike trains by neuronal networks depends on a variety of mechanisms distributed across synapses, dendrites, and somata. In natural systems, the spike trains and the processing mechanisms cohere through their common physical instantiation. This coherence is lost when the natural system is encoded for simulation on a general purpose computer. By contrast, analog VLSI circuits are, like neurons, inherently related by their real-time physics, and so could provide a useful substrate for exploring neuronlike event-based processing. Here, we describe a hybrid analog-digital VLSI chip comprising a set of integrate-and-fire neurons and short-term dynamical synapses that can be configured into simple network architectures with some properties of neocortical neuronal circuits. We show that, despite considerable fabrication variance in the properties of individual neurons, the chip offers a viable substrate for exploring real-time spike-based processing in networks of neurons.
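For readers unfamiliar with the integrate-and-fire abstraction, a minimal software sketch of a single neuron follows. It assumes a leaky variant with forward-Euler integration; the leak, the time step, and all parameter values are illustrative assumptions and do not describe the chip's circuits or its short-term dynamical synapses.

```python
def simulate_lif(input_current, dt=1e-4, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron (assumed model), forward Euler.

    input_current: one drive value per time step (arbitrary units).
    Returns the list of spike times in seconds.
    """
    v = v_reset
    spike_times = []
    for step, i_in in enumerate(input_current):
        v += dt * (-v / tau + i_in)   # leak toward zero plus input drive
        if v >= v_thresh:             # threshold crossing emits a spike
            spike_times.append(step * dt)
            v = v_reset               # membrane resets after the spike
    return spike_times
```

With these assumed parameters the membrane settles at `tau * i_in`, so a constant drive fires the neuron only when that product exceeds the threshold.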
Simulation of a spiking neuron circuit using carbon nanotube transistors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Najari, Montassar, E-mail: malnjar@jazanu.edu.sa; IKCE unit, Jazan University, Jazan; El-Grour, Tarek, E-mail: grour-tarek@hotmail.fr

2016-06-10

Neuromorphic engineering is related to the existing analogies between the physical semiconductor VLSI (Very Large Scale Integration) and biophysics. Neuromorphic systems propose to reproduce the structure and function of biological neural systems in order to transfer their computational capacity to silicon. Since the pioneering research of Carver Mead, neuromorphic engineering has continued to produce remarkable implementations of biological systems. This work presents a simulation of an elementary neuron cell with a carbon nanotube transistor (CNTFET) based technology. The neuron cell model that was simulated is the integrate-and-fire (I&F) model first introduced by G. Indiveri in 2009. This circuit has been simulated with CNTFET technology using the ADS environment to verify the neuromorphic activities in terms of membrane potential. This work has demonstrated the efficiency of this emergent device, i.e., the CNTFET, for the design of such an architecture in terms of power consumption and technology integration density.
SAR processing on the MPP

NASA Technical Reports Server (NTRS)

Batcher, K. E.; Eddey, E. E.; Faiss, R. O.; Gilmore, P. A.

1981-01-01

The processing of synthetic aperture radar (SAR) signals using the massively parallel processor (MPP) is discussed. The fast Fourier transform convolution procedures employed in the algorithms are described. The MPP architecture comprises an array unit (ARU) which processes arrays of data; an array control unit which controls the operation of the ARU and performs scalar arithmetic; a program and data management unit which controls the flow of data; and a unique staging memory (SM) which buffers and permutes data. The ARU contains a 128 by 128 array of bit-serial processing elements (PEs). Two-by-four subarrays of PEs are packaged in a custom VLSI HCMOS chip. The staging memory is a large multidimensional-access memory which buffers and permutes data flowing with the system. Efficient SAR processing is achieved via ARU communication paths and SM data manipulation. Real time processing capability can be realized via a multiple ARU, multiple SM configuration.
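The fast Fourier transform convolution procedure mentioned above can be illustrated in serial form. The sketch below is a generic radix-2 FFT and a zero-padded linear convolution, written for clarity; it makes no attempt to model the MPP's bit-serial array or staging memory, and the function names are assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(a, b):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:          # pad to the next power of two
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [(v / size).real for v in result[:out_len]]
```

For example, `fft_convolve([1, 2, 3], [1, 1])` yields values within rounding error of 1, 3, 5, 3.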
A multichip aVLSI system emulating orientation selectivity of primary visual cortical cells.

PubMed

Shimonomura, Kazuhiro; Yagi, Tetsuya

2005-07-01

In this paper, we designed and fabricated a multichip neuromorphic analog very large scale integrated (aVLSI) system, which emulates the orientation selective response of the simple cell in the primary visual cortex. The system consists of a silicon retina and an orientation chip. An image, which is filtered by a concentric center-surround (CS) antagonistic receptive field of the silicon retina, is transferred to the orientation chip. The image transfer from the silicon retina to the orientation chip is carried out with analog signals. The orientation chip selectively aggregates multiple pixels of the silicon retina, mimicking the feedforward model proposed by Hubel and Wiesel. The chip provides the orientation-selective (OS) outputs which are tuned to 0 degrees, 60 degrees, and 120 degrees. The feed-forward aggregation reduces the fixed pattern noise that is due to the mismatch of the transistors in the orientation chip. The spatial properties of the orientation selective response were examined in terms of the adjustable parameters of the chip, i.e., the number of aggregated pixels and size of the receptive field of the silicon retina. The multichip aVLSI architecture used in the present study can be applied to implement higher order cells such as the complex cell of the primary visual cortex.
Access-in-turn test architecture for low-power test application

NASA Astrophysics Data System (ADS)

Wang, Weizheng; Wang, JinCheng; Wang, Zengyun; Xiang, Lingyun

2017-03-01

This paper presents a novel access-in-turn test architecture (AIT-TA) for testing of very large scale integrated (VLSI) designs. In the proposed scheme, each scan cell in a chain receives test data from the shift-in line in turn while pushing its test response to the shift-out line. It solves the power problem of conventional scan architecture to a great extent and significantly suppresses the switching activity during shift and capture operations with acceptable hardware overhead. Thus, it can help to implement the test at much higher operating frequencies, resulting in shorter test application time. The proposed test approach enhances the architecture of conventional scan flip-flops and is backward compatible with existing test pattern generation and simulation techniques. Experimental results obtained for some larger ISCAS'89 and ITC'99 benchmark circuits illustrate the effectiveness of the proposed low-power test application scheme.
VLSI Architectures and CAD

DTIC Science & Technology

1989-11-01

considerable promise is a variation of the familiar Lempel-Ziv adaptive data compression scheme that permits a straightforward mapping to hardware...types of data. The UNIX "compress" implementation is based upon Terry Welch’s 1984 variation of the Lempel-Ziv method (LZW). One flaw lies in the fact...or more; it must effectively compress all types of data (i.e. the algorithm must be universal); the implementation must be contained within a small
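As background for the LZW method named in the excerpt, here is a minimal sketch of textbook LZW compression and decompression over bytes. It is not the hardware-oriented variation the report describes, and it omits what a practical implementation such as "compress" needs (bounded dictionary size, variable-width output codes); the function names are assumptions.

```python
def lzw_compress(data: bytes) -> list:
    """Textbook LZW: emit one integer code per longest known prefix."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit longest known prefix
            table[candidate] = next_code     # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Inverse of lzw_compress; rebuilds the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:              # code defined by this very step
            entry = prev + prev[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = prev + entry[:1]
        next_code += 1
        prev = entry
    return b"".join(out)
```

The dictionary here grows without bound, which is exactly the kind of property a small hardware implementation would have to constrain.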
Mapping of H.264 decoding on a multiprocessor architecture

NASA Astrophysics Data System (ADS)

van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

2003-05-01

Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing number of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of, e.g., a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling, e.g., SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that the data communication is reduced considerably, by up to 65%, directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
Integrated optical circuits for numerical computation

NASA Technical Reports Server (NTRS)

Verber, C. M.; Kenan, R. P.

1983-01-01

The development of integrated optical circuits (IOC) for numerical-computation applications is reviewed, with a focus on the use of systolic architectures. The basic architecture criteria for optical processors are shown to be the same as those proposed by Kung (1982) for VLSI design, and the advantages of IOCs over bulk techniques are indicated. The operation and fabrication of electrooptic grating structures are outlined, and the application of IOCs of this type to an existing 32-bit, 32-Mbit/sec digital correlator, a proposed matrix multiplier, and a proposed pipeline processor for polynomial evaluation is discussed. The problems arising from the inherent nonlinearity of electrooptic gratings are considered. Diagrams and drawings of the application concepts are provided.
VLSI architecture for a Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Hsu, In-Shek (Inventor); Truong, Trieu-Kie (Inventor)

1992-01-01

A basic single-chip building block for a Reed-Solomon (RS) decoder system is partitioned into a plurality of sections, the first of which consists of a plurality of syndrome subcells, each of which contains identical standard-basis finite-field multipliers that are programmable between 10- and 8-bit operation. A desired number of basic building blocks may be assembled to provide an RS decoder of any syndrome subcell size that is programmable between 10- and 8-bit operation.
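To make the role of the syndrome subcells concrete, the sketch below computes RS syndromes in software with a standard-basis (polynomial-basis) multiplier. The field polynomial 0x11D for 8-bit symbols, the choice of consecutive roots starting at alpha^0, and all function names are assumptions for illustration; the patent's actual field parameters are not given in the abstract. The symbol width is a parameter, loosely mirroring the 10-/8-bit programmability, but a 10-bit field would need its own degree-10 primitive polynomial.

```python
def gf_mul(a, b, m=8, poly=0x11D):
    """Shift-and-add multiply in GF(2^m), standard basis.

    poly is the field polynomial including its x^m term
    (0x11D = x^8 + x^4 + x^3 + x^2 + 1, an assumed choice).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a            # add (XOR) the current multiple of a
        a <<= 1
        if a & (1 << m):           # degree reached m: reduce modulo poly
            a ^= poly
        b >>= 1
    return result


def syndromes(received, n_syn, m=8, poly=0x11D):
    """Evaluate the received polynomial at alpha^0 .. alpha^(n_syn-1).

    received[0] is the highest-degree coefficient; alpha = 2 (the
    element x). All-zero syndromes mean no error was detected.
    """
    out = []
    root = 1
    for _ in range(n_syn):
        acc = 0
        for symbol in received:    # Horner's rule, one multiply per symbol
            acc = gf_mul(acc, root, m, poly) ^ symbol
        out.append(acc)
        root = gf_mul(root, 2, m, poly)
    return out
```

Each pass of the inner loop is the multiply-accumulate that a single hardware syndrome cell would perform as symbols stream in.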
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Thakoor, Anilkumar P. (Inventor); Duong, Tuan A. (Inventor); Daud, Taher (Inventor)

1992-01-01

High-speed, analog, fully-parallel, and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A computation intensive feature classification application was demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as an application specific coprocessor for solving real world problems at extremely high data rates.
Cascaded VLSI neural network architecture for on-line learning

NASA Technical Reports Server (NTRS)

Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)

1995-01-01

High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific-coprocessors for solving real-world problems at extremely high data rates.
A Single Chip VLSI Implementation of a QPSK/SQPSK Demodulator for a VSAT Receiver Station

NASA Technical Reports Server (NTRS)

Kwatra, S. C.; King, Brent

1995-01-01

This thesis presents a VLSI implementation of a QPSK/SQPSK demodulator. It is designed to be employed in a VSAT earth station that utilizes the FDMA/TDM link. A single chip architecture is used to enable this chip to be easily employed in the VSAT system. This demodulator contains lowpass filters, integrate and dump units, unique word detectors, a timing recovery unit, a phase recovery unit and a down conversion unit. The design stages start with a functional representation of the system by using the C programming language. Then it progresses into a register based representation using the VHDL language. The layout components are designed based on these VHDL models and simulated. Component generators are developed for the adder, multiplier, read-only memory and serial access memory in order to shorten the design time. These sub-components are then block routed to form the main components of the system. The main components are block routed to form the final demodulator.

Self-checking self-repairing computer nodes using the mirror processor

NASA Technical Reports Server (NTRS)

Tamir, Yuval

1992-01-01

Circuitry added to fault-tolerant systems for concurrent error detection usually reduces performance. Using a technique called micro rollback, it is possible to eliminate most of the performance penalty of concurrent error detection. Error detection is performed in parallel with intermodule communication, and erroneous state changes are later undone. The author reports on the design and implementation of a VLSI RISC microprocessor, called the Mirror Processor (MP), which is capable of micro rollback. In order to achieve concurrent error detection, two MP chips operate in lockstep, comparing external signals and a signature of internal signals every clock cycle. If a mismatch is detected, both processors roll back to the beginning of the cycle when the error occurred. In some cases the erroneous state is corrected by copying a value from the fault-free processor to the faulty processor. The architecture, microarchitecture, and VLSI implementation of the MP, emphasizing its error-detection, error-recovery, and self-diagnosis capabilities, are described.
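The lockstep-compare-and-roll-back idea can be shown with a toy model. The sketch below runs two copies of a trivial accumulator, injects a transient bit flip into one copy at chosen cycles, and restores the start-of-cycle state on a mismatch. Everything here (the accumulator workload, the fault model, the one-cycle checkpoint depth, the names) is a hypothetical simplification; the real MP compares external signals and an internal signature and is not described at this level in the abstract.

```python
def run_lockstep(inputs, fault_cycles=frozenset()):
    """Toy lockstep pair with single-cycle rollback (hypothetical model).

    inputs:       integers added to an accumulator, one per cycle.
    fault_cycles: cycles at which copy B suffers one transient bit flip.
    Returns (final_state, number_of_rollbacks).
    """
    state_a = state_b = 0
    pending_faults = set(fault_cycles)
    rollbacks = 0
    cycle = 0
    while cycle < len(inputs):
        checkpoint = (state_a, state_b)      # state at start of the cycle
        state_a += inputs[cycle]
        state_b += inputs[cycle]
        if cycle in pending_faults:          # transient fault hits copy B once
            state_b ^= 1
            pending_faults.discard(cycle)
        if state_a != state_b:               # end-of-cycle comparison
            state_a, state_b = checkpoint    # undo the erroneous change
            rollbacks += 1
            continue                         # re-execute the same cycle
        cycle += 1
    return state_a, rollbacks
```

Because the fault is transient, re-executing the cycle succeeds and the final result matches a fault-free run at the cost of one repeated cycle.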
High-performance image processing on the desktop

NASA Astrophysics Data System (ADS)

Jordan, Stephen D.

1996-04-01

The suitability of computers to the task of medical image visualization for the purposes of primary diagnosis and treatment planning depends on three factors: speed, image quality, and price. To be widely accepted the technology must increase the efficiency of the diagnostic and planning processes. This requires processing and displaying medical images of various modalities in real-time, with accuracy and clarity, on an affordable system. Our approach to meeting this challenge began with market research to understand customer image processing needs. These needs were translated into system-level requirements, which in turn were used to determine which image processing functions should be implemented in hardware. The result is a computer architecture for 2D image processing that is both high-speed and cost-effective. The architectural solution is based on the high-performance PA-RISC workstation with an HCRX graphics accelerator. The image processing enhancements are incorporated into the image visualization accelerator (IVX) which attaches to the HCRX graphics subsystem. The IVX includes a custom VLSI chip which has a programmable convolver, a window/level mapper, and an interpolator supporting nearest-neighbor, bi-linear, and bi-cubic modes. This combination of features can be used to enable simultaneous convolution, pan, zoom, rotate, and window/level control into 1 k by 1 k by 16-bit medical images at 40 frames/second.
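Two of the operations attributed to the custom chip, window/level mapping and bi-linear interpolation, are simple enough to sketch in software. The formulas below are the commonly used definitions, assumed here because the abstract does not spell them out; the function names and the 8-bit output range are likewise assumptions.

```python
def window_level(pixel, window, level, out_max=255):
    """Map a raw intensity to a display value (common definition, assumed).

    Intensities inside [level - window/2, level + window/2] are scaled
    linearly to 0..out_max; values outside that range are clamped.
    """
    low = level - window / 2.0
    scaled = (pixel - low) / window * out_max
    return int(round(min(max(scaled, 0), out_max)))


def bilinear(image, x, y):
    """Bi-linear sample of image[row][col] at fractional (x, y).

    Assumes 0 <= x <= width - 1 and 0 <= y <= height - 1.
    """
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy
```

A 16-bit image would typically pass through interpolation first and the window/level mapping last, although the ordering inside the IVX is not stated in the abstract.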
Feasibility study, software design, layout and simulation of a two-dimensional Fast Fourier Transform machine for use in optical array interferometry

NASA Technical Reports Server (NTRS)

Boriakoff, Valentin

1994-01-01

The goal of this project was the feasibility study of a particular architecture of a digital signal processing machine operating in real time which could do in a pipeline fashion the computation of the fast Fourier transform (FFT) of a time-domain sampled complex digital data stream. The particular architecture makes use of simple identical processors (called inner product processors) in a linear organization called a systolic array. Through computer simulation the new architecture to compute the FFT with systolic arrays was proved to be viable, and computed the FFT correctly and with the predicted particulars of operation. Integrated circuits to compute the operations expected of the vital node of the systolic architecture were proven feasible, and even with a 2 micron VLSI technology can execute the required operations in the required time. Actual construction of the integrated circuits was successful in one variant (fixed point) and unsuccessful in the other (floating point).
A comparison of VLSI architecture of finite field multipliers using dual, normal or standard basis

NASA Technical Reports Server (NTRS)

Hsu, I. S.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Reed, I. S.

1987-01-01

Three different finite field multipliers are presented: (1) a dual basis multiplier due to Berlekamp; (2) a Massey-Omura normal basis multiplier; and (3) the Scott-Tavares-Peppard standard basis multiplier. These algorithms are chosen because each has its own distinct features which apply most suitably in different areas. Finally, they are implemented on silicon chips with nitride metal oxide semiconductor technology so that the multiplier most desirable for very large scale integration implementations can readily be ascertained.
Critical Problems in Very Large Scale Computer Systems

DTIC Science & Technology

1988-09-30

253-6043 Srinivas Devadas (617) 253-0454 Thomas F. Knight, Jr. (617) 253-7807 F. Thomson Leighton (617) 253-3662 Charles E. Leiserson (617) 253-5833...J. Keen, P. Nuth, J. Larivee, and B . Totty, "Message-Driven Processor Architecture," MIT VLSI Memo No. 88-468, August 1988. *W. J. Dally and A. A...losses and gains) which are the first polynomial-time combinatorial algorithms for this problem. One algorithm runs in O(n2m2 lg 2 n Ig B ) time and the
Computer Algorithms and Architectures for Three-Dimensional Eddy-Current Nondestructive Evaluation. Volume 3. Chapters 6-11

DTIC Science & Technology

1989-01-20

addressable memory can be loaded or off-loaded as the number crunching continues. Modern VLSI processors can often process data faster than today’s...Available DSP Chips Texas Instruments was one of the first serious manufacturers of DSP chips. With the Texas Instruments TMS310 DSP chip, modem, voice...Can handle double precision data types. Texas Instruments TMS32010 TI’s first-generation DSP design: a fixed-point DSP that has found its way into modem
Spacecraft on-board SAR image generation for EOS-type missions

NASA Technical Reports Server (NTRS)

Liu, K. Y.; Arens, W. E.; Assal, H. M.; Vesecky, J. F.

1987-01-01

Spacecraft on-board synthetic aperture radar (SAR) image generation is an extremely difficult problem because of the requirements for high computational rates (usually on the order of Giga-operations per second), high reliability (some missions last up to 10 years), and low power dissipation and mass (typically less than 500 watts and 100 Kilograms). Recently, a JPL study was performed to assess the feasibility of on-board SAR image generation for EOS-type missions. This paper summarizes the results of that study. Specifically, it proposes a processor architecture using a VLSI time-domain parallel array for azimuth correlation. Using available space qualifiable technology to implement the proposed architecture, an on-board SAR processor having acceptable power and mass characteristics appears feasible for EOS-type applications.
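Time-domain correlation, as opposed to the FFT-based approach, slides a reference function along the signal and forms one inner product per output sample. The sketch below shows that operation in its plainest serial form; it is a generic illustration under assumed names and does not represent the parallel array proposed in the study.

```python
def time_domain_correlate(signal, reference):
    """Direct correlation: one inner product per output position.

    Works for real or complex samples; the reference is conjugated,
    as in a matched filter. Returns len(signal) - len(reference) + 1
    values (no zero padding).
    """
    n, m = len(signal), len(reference)
    return [
        sum(signal[i + k] * reference[k].conjugate() for k in range(m))
        for i in range(n - m + 1)
    ]
```

Each output position is independent of the others, which is what makes the computation a natural fit for an array of identical multiply-accumulate cells.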
A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.

PubMed

Jiang, Guangli; Liu, Leibo; Zhu, Wenping; Yin, Shouyi; Wei, Shaojun

2015-09-04

This paper proposes a real-time feature extraction VLSI architecture for high-resolution images based on the accelerated KAZE algorithm. Firstly, a new system architecture is proposed. It increases the system throughput, provides flexibility in image resolution, and offers trade-offs between speed and scaling robustness. The architecture consists of a two-dimensional pipeline array that fully utilizes computational similarities in octaves. Secondly, a substructure (block-serial discrete-time cellular neural network) that can realize a nonlinear filter is proposed. This structure decreases the memory demand through the removal of data dependency. Thirdly, a hardware-friendly descriptor is introduced in order to overcome the hardware design bottleneck through the polar sample pattern; a simplified method to realize rotation invariance is also presented. Finally, the proposed architecture is designed in TSMC 65 nm CMOS technology. The experimental results show a performance of 127 fps in full HD resolution at 200 MHz frequency. The peak performance reaches 181 GOPS and the throughput is double the speed of other state-of-the-art architectures.
Stacked silicide/silicon mid- to long-wavelength infrared detector

NASA Technical Reports Server (NTRS)

Maserjian, Joseph (Inventor)

1990-01-01

The use of stacked Schottky barriers (16) with epitaxially grown thin silicides (10) combined with selective doping (22) of the barriers provides high quantum efficiency infrared detectors (30) at longer wavelengths that is compatible with existing silicon VLSI technology.
Stacked silicide/silicon mid- to long-wavelength infrared detector

DOEpatents

Maserjian, Joseph

1990-03-13

The use of stacked Schottky barriers (16) with epitaxially grown thin silicides (10) combined with selective doping (22) of the barriers provides high quantum efficiency infrared detectors (30) at longer wavelengths that is compatible with existing silicon VLSI technology.
A neuromorphic VLSI device for implementing 2-D selective attention systems.

PubMed

Indiveri, G

2001-01-01

Selective attention is a mechanism used to sequentially select and process salient subregions of the input space, while suppressing inputs arriving from nonsalient regions. By processing small amounts of sensory information in a serial fashion, rather than attempting to process all the sensory data in parallel, this mechanism overcomes the problem of flooding limited processing capacity systems with sensory inputs. It is found in many biological systems and can be a useful engineering tool for developing artificial systems that need to process in real-time sensory data. In this paper we present a neuromorphic hardware model of a selective attention mechanism implemented on a very large scale integration (VLSI) chip, using analog circuits. The chip makes use of a spike-based representation for receiving input signals, transmitting output signals and for shifting the selection of the attended input stimulus over time. It can be interfaced to neuromorphic sensors and actuators, for implementing multichip selective attention systems. We describe the characteristics of the circuits used in the architecture and present experimental data measured from the system.
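The serial selection described above can be mimicked with a winner-take-all loop. In the sketch below the most salient input is selected, then attenuated so that attention shifts to the next one. The multiplicative attenuation (a simple stand-in for inhibition of return), its strength, and the function name are assumptions made for illustration and are not taken from the abstract.

```python
def attention_scan(saliency, n_shifts, inhibition=0.5):
    """Serially select salient inputs with an assumed inhibition step.

    saliency:   one non-negative value per input location.
    inhibition: factor applied to a location after it has been attended.
    Returns the sequence of attended indices.
    """
    remaining = list(saliency)
    visited = []
    for _ in range(n_shifts):
        # Winner-take-all: the largest remaining value wins the competition.
        winner = max(range(len(remaining)), key=lambda i: remaining[i])
        visited.append(winner)
        remaining[winner] *= inhibition   # attenuate so attention can move on
    return visited
```

For instance, with saliencies 0.2, 0.9 and 0.5 the scan visits index 1, then 2, then returns to 1 once its attenuated value is again the largest.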
Periodic Application of Concurrent Error Detection in Processor Array Architectures. PhD. Thesis -

NASA Technical Reports Server (NTRS)

Chen, Paul Peichuan

1993-01-01

Processor arrays can provide an attractive architecture for some applications. Featuring modularity, regular interconnection and high parallelism, such arrays are well-suited for VLSI/WSI implementations, and applications with high computational requirements, such as real-time signal processing. Preserving the integrity of results can be of paramount importance for certain applications. In these cases, fault tolerance should be used to ensure reliable delivery of a system's service. One aspect of fault tolerance is the detection of errors caused by faults. Concurrent error detection (CED) techniques offer the advantage that transient and intermittent faults may be detected with greater probability than with off-line diagnostic tests. Applying time-redundant CED techniques can reduce hardware redundancy costs. However, most time-redundant CED techniques degrade a system's performance.
Space micro-guidance and control - Applications and architectures

NASA Technical Reports Server (NTRS)

Mettler, Edward; Hadaegh, Fred Y.

1992-01-01

The features and the components of a new microscale guidance, navigation, and control (GN&C) system for future space systems are discussed. An approach is described for the utilization of new microengineering technologies for achieving major reductions in the GN&C system's mass, size, power, and costs. The micro-GN&C system and the component concepts include microactuated adaptive optics, micromachined inertial sensors, fiberoptic data nets with light-power transmission, and VLSI microcomputers. The GN&C system will be applied in microspacecraft, microlanders, microrovers, remote sensing platforms, interferometers, and deployable reflectors.
Space micro-guidance and control - Applications and architectures

NASA Astrophysics Data System (ADS)

Mettler, Edward; Hadaegh, Fred Y.

1992-07-01

The features and the components of a new microscale guidance, navigation, and control (GN&C) system for future space systems are discussed. An approach is described for the utilization of new microengineering technologies for achieving major reductions in the GN&C system's mass, size, power, and costs. The micro-GN&C system and the component concepts include microactuated adaptive optics, micromachined inertial sensors, fiberoptic data nets with light-power transmission, and VLSI microcomputers. The GN&C system will be applied in microspacecraft, microlanders, microrovers, remote sensing platforms, interferometers, and deployable reflectors.
Electronic neural networks for global optimization

NASA Technical Reports Server (NTRS)

Thakoor, A. P.; Moopenn, A. W.; Eberhardt, S.

1990-01-01

An electronic neural network with feedback architecture, implemented in analog custom VLSI is described. Its application to problems of global optimization for dynamic assignment is discussed. The convergence properties of the neural network hardware are compared with computer simulation results. The neural network's ability to provide optimal or near optimal solutions within only a few neuron time constants, a speed enhancement of several orders of magnitude over conventional search methods, is demonstrated. The effect of noise on the circuit dynamics and the convergence behavior of the neural network hardware is also examined.
A comparison between coherent and noncoherent mobile systems in large Doppler shift, delay spread, and C/I environment

NASA Technical Reports Server (NTRS)

Feher, Kamilo

1993-01-01

The performance and implementation complexity of coherent and of noncoherent QPSK and GMSK modulation/demodulation techniques in a complex mobile satellite systems environment, including large Doppler shift, delay spread, and low C/I, are compared. We demonstrate that for large f(sub d)T(sub b) products, where f(sub d) is the Doppler shift and T(sub b) is the bit duration, noncoherent (discriminator detector or differential demodulation) systems have a lower BER floor than their coherent counterparts. For significant delay spreads, e.g., tau(sub rms) greater than 0.4 T(sub b), and low C/I, coherent systems outperform noncoherent systems. However, the synchronization time of coherent systems is longer than that of noncoherent systems. Spectral efficiency, overall capacity, and related hardware complexity issues of these systems are also analyzed. We demonstrate that coherent systems have a simpler overall architecture (IF filter implementation-cost versus carrier recovery) and are more robust in an RF frequency drift environment. Additionally, the prediction tools, computer simulations, and analysis of coherent systems are simpler. The threshold or capture effect in a low C/I interference environment is critical for noncoherent discriminator-based systems. We conclude with a comparison of hardware architectures of coherent and of noncoherent systems, including recent trends in commercial VLSI technology and direct baseband to RF transmit, RF to baseband (0-IF) receiver implementation strategies.
A comparison between coherent and noncoherent mobile systems in large Doppler shift, delay spread, and C/I environment

NASA Astrophysics Data System (ADS)

Feher, Kamilo

The performance and implementation complexity of coherent and of noncoherent QPSK and GMSK modulation/demodulation techniques in a complex mobile satellite systems environment, including large Doppler shift, delay spread, and low C/I, are compared. We demonstrate that for large f(sub d)T(sub b) products, where f(sub d) is the Doppler shift and T(sub b) is the bit duration, noncoherent (discriminator detector or differential demodulation) systems have a lower BER floor than their coherent counterparts. For significant delay spreads, e.g., tau(sub rms) greater than 0.4 T(sub b), and low C/I, coherent systems outperform noncoherent systems. However, the synchronization time of coherent systems is longer than that of noncoherent systems. Spectral efficiency, overall capacity, and related hardware complexity issues of these systems are also analyzed. We demonstrate that coherent systems have a simpler overall architecture (IF filter implementation-cost versus carrier recovery) and are more robust in an RF frequency drift environment. Additionally, the prediction tools, computer simulations, and analysis of coherent systems are simpler. The threshold or capture effect in a low C/I interference environment is critical for noncoherent discriminator-based systems. We conclude with a comparison of hardware architectures of coherent and of noncoherent systems, including recent trends in commercial VLSI technology and direct baseband to RF transmit, RF to baseband (0-IF) receiver implementation strategies.
Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

NASA Technical Reports Server (NTRS)

Fischer, James R.; Grosch, Chester; Mcanulty, Michael; Odonnell, John; Storey, Owen

1987-01-01

NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the way theory relates, and performance measured. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for space station, EOS, and the Great Observatories era.
Motion camera based on a custom vision sensor and an FPGA architecture

NASA Astrophysics Data System (ADS)

Arias-Estrada, Miguel

1998-09-01

A digital camera for custom focal plane arrays was developed. The camera allows the test and development of analog or mixed-mode arrays for focal plane processing. The camera is used with a custom sensor for motion detection to implement a motion computation system. The custom focal plane sensor detects moving edges at the pixel level using analog VLSI techniques. The sensor communicates motion events using the event-address protocol associated with a temporal reference. In a second stage, a coprocessing architecture based on a field programmable gate array (FPGA) computes the time-of-travel between adjacent pixels. The FPGA allows rapid prototyping and flexible architecture development. Furthermore, the FPGA interfaces the sensor to a compact PC computer which is used for high level control and data communication to the local network. The camera could be used in applications such as self-guided vehicles, mobile robotics and smart surveillance systems. The programmability of the FPGA allows the exploration of further signal processing like spatial edge detection or image segmentation tasks. The article details the motion algorithm, the sensor architecture, the use of the event-address protocol for velocity vector computation and the FPGA architecture used in the motion camera system.
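The time-of-travel computation described in this abstract can be illustrated with a small numerical sketch. This is not the camera's FPGA logic: the pixel pitch, the (pixel index, timestamp) event layout, and the function name are all assumptions made for the example, and the result is a speed on the focal plane rather than in the scene.

```python
# Hypothetical sketch: image-plane velocity from the time-of-travel of an edge
# between two adjacent pixels. The pixel pitch and event format are assumptions.

PIXEL_PITCH_M = 50e-6  # assumed distance between adjacent pixel centers (meters)


def velocity_from_events(event_a, event_b, pitch_m=PIXEL_PITCH_M):
    """Return the signed speed (m/s) of an edge seen by two adjacent pixels.

    Each event is a (pixel_index, timestamp_seconds) pair. A positive result
    means motion toward increasing pixel index.
    """
    (idx_a, t_a), (idx_b, t_b) = event_a, event_b
    if abs(idx_b - idx_a) != 1:
        raise ValueError("events must come from adjacent pixels")
    dt = t_b - t_a
    if dt == 0:
        raise ValueError("simultaneous events give no time-of-travel")
    # distance (signed by direction) divided by elapsed time
    return (idx_b - idx_a) * pitch_m / dt


# An edge that reaches pixel 10 at t = 0 and pixel 11 one millisecond later.
speed = velocity_from_events((10, 0.0), (11, 1e-3))
```

A shorter time-of-travel gives a proportionally higher speed, which is why the temporal reference attached to each address event matters.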
A Coherent VLSI Environment

DTIC Science & Technology

1987-03-31

processors. The symmetry-breaking algorithms give efficient ways to convert probabilistic algorithms to deterministic algorithms. Some of the...techniques have been applied to construct several efficient linear-processor algorithms for graph problems, including an O(lg* n)-time algorithm for (A + 1...On n-node graphs, the algorithm works in O(log 2 n) time using only n processors, in contrast to the previous best algorithm which used about n3

Performance analysis of an all-digital BPSK direct sequence spread-spectrum IF receiver architecture

NASA Astrophysics Data System (ADS)

Chung, Bong-Young; Chien, Charles; Samueli, Henry; Jain, Rajeev

1993-09-01

A VLSI architecture for an all-digital binary phase shift keyed (BPSK) direct-sequence (DS) spread spectrum (SS) IF receiver is presented, and an in-depth performance analysis is given. The all-digital architecture incorporates a Costas loop for carrier recovery and a delay-locked loop for clock recovery. For the PN acquisition block, a robust energy detection scheme is proposed to reduce false PN locks over a broad range of signal-to-noise ratios. The proposed architecture is intended for use in the 902-928 MHz unlicensed spread spectrum radio band. A 100 kbps information rate and a 12.7 Mchips/second PN code rate are assumed. The IF center frequency is 12.7 MHz and the IF sampling rate is 50.8 Msamples/second, which is the Nyquist rate for the 25.4 MHz bandwidth signal. Finite wordlength effects have been simulated to optimize the architecture, thereby minimizing the chip area, and results of the finite wordlength simulations demonstrate that the chip architecture achieves a bit error rate performance within 1 dB of theory in an additive white Gaussian noise channel. The probability of PN acquisition within 5 ms is approximately 56% at -17 dB IF input SNR and 82% at -11 dB IF input SNR.
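The rates quoted in this abstract imply a few derived quantities that are easy to check. The short sketch below only does that arithmetic; the variable names are illustrative, and the "processing gain" figure is the usual chips-per-bit ratio in dB, not a number reported by the authors.

```python
import math

bit_rate = 100e3       # information rate from the abstract (bits/s)
chip_rate = 12.7e6     # PN code rate from the abstract (chips/s)
sample_rate = 50.8e6   # IF sampling rate from the abstract (samples/s)
if_center = 12.7e6     # IF center frequency from the abstract (Hz)

chips_per_bit = chip_rate / bit_rate                 # spreading factor
processing_gain_db = 10 * math.log10(chips_per_bit)  # about 21 dB
samples_per_chip = sample_rate / chip_rate           # oversampling per chip
samples_per_if_cycle = sample_rate / if_center       # sampling rate vs. IF
```

The spreading factor comes out at 127 chips per bit, and the sampling rate is four times both the chip rate and the IF center frequency. A factor of 127 would be consistent with one period of a length-127 PN sequence per bit, although the abstract does not state this.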
Module generation for self-testing integrated systems

NASA Astrophysics Data System (ADS)

Vanriessen, Ronald Pieter

Hardware used for self test in VLSI (Very Large Scale Integrated) systems is reviewed, and an architecture to control the test hardware in an integrated system is presented. Because of the increase in test times, the use of self test techniques has become practically and economically viable for VLSI systems. Besides the reduction in test times and costs, self test also provides testing at operational speeds. Therefore, a suitable combination of scan path and macrospecific (self) tests is required to reduce test times and costs. An expert system that can be used in a silicon compilation environment is presented. The approach requires a minimum of testability knowledge from a system designer. A user-friendly interface is described for specifying and modifying testability requirements by a testability expert. A reason-directed backtracking mechanism is used to solve selection failures. Both the hierarchical testable architecture and the design-for-testability expert system are used in a self test compiler. A self test compiler is defined as a software tool that selects an appropriate test method for every macro in a design. The hardware to control a macro test is included in the design automatically. As an example, the integration of the self test compiler in the silicon compilation system PIRAMID is described. The design of a demonstrator circuit by the self test compiler is also described. This circuit consists of two self-testable macros. Control of the self test hardware is carried out via the test access port of the boundary scan standard.
Ultra high speed image processing techniques. [electronic packaging techniques]

NASA Technical Reports Server (NTRS)

Anthony, T.; Hoeschele, D. F.; Connery, R.; Ehland, J.; Billings, J.

1981-01-01

Packaging techniques for ultra high speed image processing were developed. These techniques involve the development of a signal feedthrough technique through LSI/VLSI sapphire substrates. This allows the stacking of LSI/VLSI circuit substrates in a 3 dimensional package with greatly reduced length of interconnecting lines between the LSI/VLSI circuits. The reduced parasitic capacitances result in higher LSI/VLSI computational speeds at significantly reduced power consumption levels.
VLSI design of lossless frame recompression using multi-orientation prediction

NASA Astrophysics Data System (ADS)

Lee, Yu-Hsuan; You, Yi-Lun; Chen, Yi-Guo

2016-01-01

Pursuing an experience of high-end visual quality drives people to demand a higher display resolution and a higher frame rate. Hence, many powerful coding tools are aggregated in emerging video coding standards to improve coding efficiency. This also makes video coding standards suffer from two design challenges: heavy computation and tremendous memory bandwidth. The first issue can be properly solved by a careful hardware architecture design with advanced semiconductor processes. Nevertheless, the second one becomes a critical design bottleneck for a modern video coding system. In this article, a lossless frame recompression technique using multi-orientation prediction is proposed to overcome this bottleneck. This work is realised in a silicon chip using the TSMC 0.18 µm CMOS process. Its encoding capability can reach full-HD (1920 × 1080)@48 fps. The chip power consumption is 17.31 mW@100 MHz. Core area and chip area are 0.83 × 0.83 mm² and 1.20 × 1.20 mm², respectively. Experimental results demonstrate that this work exhibits an outstanding performance on lossless compression ratio with a competitive hardware performance.
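The throughput, power, and area figures in this abstract can be related with simple arithmetic. The sketch below assumes that the full-HD@48 fps capability and the 17.31 mW figure both refer to operation at 100 MHz; the abstract states the clock only for the power figure, so the per-cycle and per-pixel numbers are estimates under that assumption.

```python
width, height, fps = 1920, 1080, 48   # full-HD @ 48 fps, from the abstract
clock_hz = 100e6                      # clock quoted with the power figure
power_w = 17.31e-3                    # chip power from the abstract (watts)

pixels_per_second = width * height * fps
pixels_per_cycle = pixels_per_second / clock_hz      # assumes a 100 MHz clock
energy_per_pixel_nj = power_w / pixels_per_second * 1e9

core_area_mm2 = 0.83 * 0.83           # core dimensions from the abstract
chip_area_mm2 = 1.20 * 1.20           # chip dimensions from the abstract
```

Under that assumption the design handles roughly one pixel per clock cycle, at well under a nanojoule per pixel.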
Distributed asynchronous microprocessor architectures in fault tolerant integrated flight systems

NASA Technical Reports Server (NTRS)

Dunn, W. R.

1983-01-01

The paper discusses the implementation of fault tolerant digital flight control and navigation systems for rotorcraft application. It is shown that in implementing fault tolerance at the systems level using advanced LSI/VLSI technology, aircraft physical layout and flight systems requirements tend to define a system architecture of distributed, asynchronous microprocessors in which fault tolerance can be achieved locally through hardware redundancy and/or globally through application of analytical redundancy. The effects of asynchronism on the execution of dynamic flight software are discussed. It is shown that if the asynchronous microprocessors have knowledge of time, the resulting errors can be significantly reduced through appropriate modifications of the flight software. Finally, the paper extends previous work to show that through the combined use of time referencing and stable flight algorithms, individual microprocessors can be configured to autonomously tolerate intermittent faults.
Micro guidance and control synthesis: New components, architectures, and capabilities

NASA Technical Reports Server (NTRS)

Mettler, Edward; Hadaegh, Fred Y.

1993-01-01

New GN&C (guidance, navigation and control) system capabilities are shown to arise from component innovations that involve the synergistic use of microminiature sensors and actuators, microelectronics, and fiber optics. Micro-GN&C system and component concepts are defined that include micro-actuated adaptive optics, micromachined inertial sensors, fiber-optic data nets and light-power transmission, and VLSI microcomputers. The thesis is advanced that these micro-miniaturization products are capable of having a revolutionary impact on space missions and systems, and that GN&C is the pathfinder micro-technology application that can bring that about.
Study of a programmable high speed processor for use on-board satellites

NASA Astrophysics Data System (ADS)

Degavre, J. Cl.; Okkes, R.; Gaillat, G.

The availability of VLSI programmable devices will significantly enhance satellite on-board data processing capabilities. A case study is presented which indicates that computation-intensive processing applications requiring the execution of 100 megainstructions/sec are within the CD power constraints of satellites. It is noted that the current progress in semicustom design technique development and in achievable gate array densities, together with the recent announcement of improved monochip processors, are encouraging the development of an on-board programmable processor architecture able to associate the devices that will appear in communication and military markets.
Design and implementation of a modulator-based free-space optical backplane for multiprocessor applications.

PubMed

Kirk, Andrew G; Plant, David V; Szymanski, Ted H; Vranesic, Zvonko G; Tooley, Frank A P; Rolston, David R; Ayliffe, Michael H; Lacroix, Frederic K; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F

2003-05-10

Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
Modular Matrix Multiplication on a Linear Array.

DTIC Science & Technology

1983-11-01

is fl(n2). Case ... (see Figure 5.2) ... At t--xi, either all Gk, such that IkEA , have n...nat and Image Processing, IEEE Transactions on Computers, Vol. C-31, No. 10 22 (October, 1982), pp. IO0oo09. [4] H.T. Kung, Let’s Design Algorithms for...VLSI Systems, Proc. Caltech Conf. on Very Large Scale Integration: Architecture, Design, Fabrication (January, 1979), pp. 65-90. [5] H.T. Kung, and
Design and implementation of a modulator-based free-space optical backplane for multiprocessor applications

NASA Astrophysics Data System (ADS)

Kirk, Andrew G.; Plant, David V.; Szymanski, Ted H.; Vranesic, Zvonko G.; Tooley, Frank A. P.; Rolston, David R.; Ayliffe, Michael H.; Lacroix, Frederic K.; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F.

2003-05-01

Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
Adaptive WTA with an analog VLSI neuromorphic learning chip.

PubMed

Häfliger, Philipp

2007-03-01

In this paper, we demonstrate how a particular spike-based learning rule (where exact temporal relations between input and output spikes of a spiking model neuron determine the changes of the synaptic weights) can be tuned to express rate-based classical Hebbian learning behavior (where the average input and output spike rates are sufficient to describe the synaptic changes). This shift in behavior is controlled by the input statistic and by a single time constant. The learning rule has been implemented in a neuromorphic very large scale integration (VLSI) chip as part of a neurally inspired spike signal image processing system. The latter is the result of the European Union research project Convolution AER Vision Architecture for Real-Time (CAVIAR). Since it is implemented as a spike-based learning rule (which is most convenient in the overall spike-based system), even if it is tuned to show rate behavior, no explicit long-term average signals are computed on the chip. We show the rule's rate-based Hebbian learning ability in a classification task in both simulation and chip experiment, first with artificial stimuli and then with sensor input from the CAVIAR system.
Smart-Pixel Array Processors Based on Optimal Cellular Neural Networks for Space Sensor Applications

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Sheu, Bing J.; Venus, Holger; Sandau, Rainer

1997-01-01

A smart-pixel cellular neural network (CNN) with hardware annealing capability, digitally programmable synaptic weights, and multisensor parallel interface has been under development for advanced space sensor applications. The smart-pixel CNN architecture is a programmable multi-dimensional array of optoelectronic neurons which are locally connected with their local neurons and associated active-pixel sensors. Integration of the neuroprocessor in each processor node of a scalable multiprocessor system offers orders-of-magnitude computing performance enhancements for on-board real-time intelligent multisensor processing and control tasks of advanced small satellites. The smart-pixel CNN operation theory, architecture, design and implementation, and system applications are investigated in detail. The VLSI (Very Large Scale Integration) implementation feasibility was illustrated by a prototype smart-pixel 5x5 neuroprocessor array chip of active dimensions 1380 micron x 746 micron in a 2-micron CMOS technology.
Analysis of fault-tolerant neurocontrol architectures

NASA Technical Reports Server (NTRS)

Troudet, T.; Merrill, W.

1992-01-01

The fault-tolerance of analog parallel distributed implementations of a multivariable aircraft neurocontroller is analyzed by simulating weight and neuron failures in a simplified scheme of analog processing based on the functional architecture of the ETANN chip (Electrically Trainable Artificial Neural Network). The neural information processing is found to be only partially distributed throughout the set of weights of the neurocontroller synthesized with the backpropagation algorithm. Although the degree of distribution of the neural processing, and consequently the fault-tolerance of the neurocontroller, could be enhanced using Locally Distributed Weight and Neuron Approaches, a satisfactory level of fault-tolerance could only be obtained by retraining the degraded VLSI neurocontroller. The possibility of maintaining neurocontrol performance and stability in the presence of single weight or neuron failures was demonstrated through an automated retraining procedure of the neurocontroller based on a pre-programmed choice and sequence of the training parameters.
An Efficient Implementation For Real Time Applications Of The Wigner-Ville Distribution

NASA Astrophysics Data System (ADS)

Boashash, Boualem; Black, Peter; Whitehouse, Harper J.

1986-03-01

The Wigner-Ville Distribution (WVD) is a valuable tool for time-frequency signal analysis. In order to implement the WVD in real time, an efficient algorithm and architecture have been developed which may be implemented with commercial components. This algorithm successively computes the analytic signal corresponding to the input signal, forms a weighted kernel function and analyses the kernel via a Discrete Fourier Transform (DFT). To evaluate the analytic signal required by the algorithm, it is shown that the time domain definition implemented as a finite impulse response (FIR) filter is practical and more efficient than the frequency domain definition of the analytic signal. The windowed resolution of the WVD in the frequency domain is shown to be similar to the resolution of a windowed Fourier Transform. A real time signal processor has been designed for evaluation of the WVD analysis system. The system is easily paralleled and can be configured to meet a variety of frequency and time resolutions. The arithmetic unit is based on a pair of high speed VLSI floating-point multiplier and adder chips. Dual operand buses and an independent result bus maximize data transfer rates. The system is horizontally microprogrammed and utilizes a full instruction pipeline. Each microinstruction specifies two operand addresses, a result location, the type of arithmetic and the memory configuration. Input and output are via shared memory blocks with front-end processors to handle data transfers during the non-access periods of the analyzer.
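The kernel-then-DFT structure described in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' processor design: the Hann-shaped lag weighting, the function name, and the parameters are hypothetical, and the FIR analytic-signal stage is skipped by feeding in a complex exponential that is already analytic.

```python
import cmath
import math


def wvd_slice(z, n, half_len, n_freq):
    """Return one time slice W[k], k = 0..n_freq-1, of a windowed discrete WVD.

    z        : analytic (complex) signal samples
    n        : time index of the slice
    half_len : largest lag m used on each side of n
    n_freq   : number of frequency bins; bin k maps to k / (2 * n_freq) cycles/sample
    """
    lags = range(-half_len, half_len + 1)
    # Weighted kernel w[m] * z[n+m] * conj(z[n-m]); the Hann-shaped weight is an assumption.
    kernel = []
    for m in lags:
        w = 0.5 * (1.0 + math.cos(math.pi * m / (half_len + 1)))
        a, b = n + m, n - m
        inside = 0 <= a < len(z) and 0 <= b < len(z)
        kernel.append(w * z[a] * z[b].conjugate() if inside else 0j)
    # Direct DFT over the lag variable. The kernel is conjugate-symmetric in m,
    # so the transform is real up to rounding error and only the real part is kept.
    slice_out = []
    for k in range(n_freq):
        acc = sum(kv * cmath.exp(-2j * math.pi * k * m / n_freq)
                  for kv, m in zip(kernel, lags))
        slice_out.append(acc.real)
    return slice_out


# Test input: a complex exponential at 0.125 cycles/sample (already analytic).
f0 = 0.125
z = [cmath.exp(2j * math.pi * f0 * i) for i in range(64)]
spectrum = wvd_slice(z, n=32, half_len=8, n_freq=32)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_freq = peak_bin / (2 * 32)  # convert the bin index to cycles/sample
```

Because the kernel multiplies samples at lags +m and -m, its phase advances at twice the signal frequency, which is why the bin index is divided by 2 * n_freq rather than n_freq. A real-time design would replace the direct DFT with an FFT and compute many time slices in parallel.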
Very Large Scale Integration (VLSI).

ERIC Educational Resources Information Center

Yeaman, Andrew R. J.

Very Large Scale Integration (VLSI), the state-of-the-art production techniques for computer chips, promises such powerful, inexpensive computing that, in the future, people will be able to communicate with computer devices in natural language or even speech. However, before full-scale VLSI implementation can occur, certain salient factors must be…
VLSI (Very Large Scale Integration) Design Tools, Reference Manual, Release 3.0.

DTIC Science & Technology

1985-08-01

generators/mult prior to running mult. The generated layout is output in directory 1ca in caesar cells with names of the form "caesarame*oca. Mut is a cft ...vlsa) spice(1.vlsi), User’s Guide to AML VLSI Dodgen Tools Reference Manual, UW/NW VLSI Consortium, University of Washington, (Christopher Terman, MIT...of the form ’caesarname..ca. Muls is a cft -based program and therefore also produces *.bd fiIls ’Caesaramew may not begin with the string mule. The
Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors.

PubMed

Mahdiani, Hamid Reza; Fakhraie, Sied Mehdi; Lucas, Caro

2012-08-01

Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance that sometimes makes it even infeasible. In contrast to custom approaches, a new class of applications is categorized in this paper, which is inherently capable of absorbing some degrees of vulnerability and providing FT based on their natural properties. Neural networks are good indicators of imprecision-tolerant applications. We have also proposed a new class of FT techniques called relaxed fault-tolerant (RFT) techniques which are developed for VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques with respect to traditional FT solutions is that they exploit inherent FT of different applications to reduce their implementation costs while improving their performance. To show the applicability as well as the efficiency of the RFT method, the experimental results for implementation of a face-recognition computationally intensive neural network and its corresponding RFT realization are presented in this paper. The results demonstrate promising higher performance of artificial neural network VLSI solutions for complex applications in faulty nano-scale implementation environments.
The 1992 4th NASA SERC Symposium on VLSI Design

NASA Technical Reports Server (NTRS)

Whitaker, Sterling R.

1992-01-01

Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design.
Electronic device aspects of neural network memories

NASA Technical Reports Server (NTRS)

Lambe, J.; Moopenn, A.; Thakoor, A. P.

1985-01-01

The basic issues related to the electronic implementation of the neural network model (NNM) for content addressable memories are examined. A brief introduction to the principles of the NNM is followed by an analysis of the information storage of the neural network in the form of a binary connection matrix and the recall capability of such matrix memories based on a hardware simulation study. In addition, materials and device architecture issues involved in the future realization of such networks in VLSI-compatible ultrahigh-density memories are considered. A possible space application of such devices would be in the area of large-scale information storage without mechanical devices.
A long constraint length VLSI Viterbi decoder for the DSN

NASA Technical Reports Server (NTRS)

Statman, J. I.; Zimmerman, G.; Pollara, F.; Collins, O.

1988-01-01

A Viterbi decoder, capable of decoding convolutional codes with constraint lengths up to 15, is under development for the Deep Space Network (DSN). The objective is to complete a prototype of this decoder by late 1990, and demonstrate its performance using the (15, 1/4) encoder in Galileo. The decoder is expected to provide 1 to 2 dB improvement in bit SNR, compared to the present (7, 1/2) code and existing Maximum Likelihood Convolutional Decoder (MCD). The decoder will be fully programmable for any code up to constraint length 15, and code rate 1/2 to 1/6. The decoder architecture and top-level design are described.
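The jump from constraint length 7 to 15 is what drives the hardware cost of this decoder, and the trellis size can be worked out directly. The sketch below uses the common convention that a binary code of constraint length K has 2^(K-1) trellis states; the function and key names are illustrative, not taken from the report.

```python
def trellis_size(constraint_length, rate_inverse):
    """Trellis dimensions for a binary rate-1/n convolutional code.

    Assumes the convention in which constraint length K gives 2**(K-1) states.
    """
    states = 2 ** (constraint_length - 1)
    return {
        "states": states,                   # one add-compare-select per state per bit
        "branches_per_bit": 2 * states,     # two outgoing branches per state
        "symbols_per_bit": rate_inverse,    # channel symbols received per decoded bit
    }


present_code = trellis_size(7, 2)    # the present (7, 1/2) code
new_code = trellis_size(15, 4)       # the (15, 1/4) code
state_ratio = new_code["states"] // present_code["states"]
```

Under this convention the (15, 1/4) code needs 16384 states against 64 for the (7, 1/2) code, a factor of 256 in add-compare-select work per decoded bit.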

High performance MPEG-audio decoder IC

NASA Technical Reports Server (NTRS)

Thorn, M.; Benbassat, G.; Cyr, K.; Li, S.; Gill, M.; Kam, D.; Walker, K.; Look, P.; Eldridge, C.; Ng, P.

1993-01-01

The emerging digital audio and video compression technology brings both an opportunity and a new challenge to IC design. The pervasive application of compression technology to consumer electronics will require high volume, low cost IC's and fast time to market of the prototypes and production units. At the same time, the algorithms used in the compression technology result in complex VLSI IC's. The conflicting challenges of algorithm complexity, low cost, and fast time to market have an impact on device architecture and design methodology. The work presented in this paper is about the design of a dedicated, high precision, Motion Picture Expert Group (MPEG) audio decoder.
Low-Power Differential SRAM design for SOC Based on the 25-um Technology

NASA Astrophysics Data System (ADS)

Godugunuri, Sivaprasad; Dara, Naveen; Sambasiva Nayak, R.; Nayeemuddin, Md; Singh, Yadu, Dr.; Veda, R. N. S. Sunil

2017-08-01

Recently, SOC designs have become some of the most complex designs in VLSI, and they face significant low-power operation problems; to address this, we implemented a low-power SRAM. However, these SRAM architectures critically affect the total power of the SOC and its competitive area. To overcome the above disadvantages, a low-power differential SRAM design is proposed in this paper. The differential SRAM design stores multiple bits in the same cell and operates at minimum operating voltage and area per bit. The differential SRAM was designed based on the 25-um technology using the Tanner EDA tool.
Robust Bioinformatics Recognition with VLSI Biochip Microsystem

NASA Technical Reports Server (NTRS)

Lue, Jaw-Chyng L.; Fang, Wai-Chi

2006-01-01

A microsystem architecture for real-time, on-site, robust bioinformatic pattern recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR) amplification. A corresponding novel artificial neural network (ANN) learning algorithm using a new sigmoid-logarithmic transfer function based on the error backpropagation (EBP) algorithm is invented. Our results show the trained new ANN can recognize low fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating the logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.
High data rate Reed-Solomon encoding and decoding using VLSI technology

NASA Technical Reports Server (NTRS)

Miller, Warner; Morakis, James

1987-01-01

Presented is an implementation of a Reed-Solomon encoder and decoder, which is 16-symbol error correcting, with each symbol being 8 bits. This Reed-Solomon (RS) code is an efficient error correcting code that the National Aeronautics and Space Administration (NASA) will use in future space communications missions. A Very Large Scale Integration (VLSI) implementation of the encoder and decoder accepts data rates up to 80 Mbps. A total of seven chips are needed for the decoder (four of the seven decoding chips are customized using 3-micron Complementary Metal Oxide Semiconductor (CMOS) technology) and one chip is required for the encoder. The decoder operates with the symbol clock being the system clock for the chip set. Approximately 1.65 billion Galois Field (GF) operations per second are achieved with the decoder chip set and 640 MOPS are achieved with the encoder chip.
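The code parameters and operation budgets implied by this abstract can be derived with a short calculation. The sketch assumes a full-length code, n = 2^8 - 1 symbols, which the abstract does not state explicitly, and it treats the 80 Mbps figure as the symbol-stream rate seen by the chips; both are assumptions made for the example.

```python
symbol_bits = 8          # bits per symbol, from the abstract
t = 16                   # correctable symbol errors, from the abstract
data_rate_bps = 80e6     # maximum data rate, from the abstract
decoder_gf_ops = 1.65e9  # decoder GF operations per second, from the abstract
encoder_ops = 640e6      # encoder operations per second, from the abstract

n = 2 ** symbol_bits - 1        # assumed full-length block: 255 symbols
parity = 2 * t                  # an RS code needs 2t parity symbols to correct t errors
k = n - parity                  # 223 information symbols
code_rate = k / n

symbol_rate = data_rate_bps / symbol_bits            # symbols per second
decoder_ops_per_symbol = decoder_gf_ops / symbol_rate
encoder_ops_per_symbol = encoder_ops / symbol_rate
```

With these assumptions the code is RS(255, 223), and the quoted throughput corresponds to about 165 GF operations per symbol in the decoder and 64 operations per symbol in the encoder.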
VLSI Technology for Cognitive Radio

NASA Astrophysics Data System (ADS)

VIJAYALAKSHMI, B.; SIDDAIAH, P.

2017-08-01

One of the most challenging tasks of cognitive radio is achieving an efficient spectrum sensing scheme to overcome the spectrum scarcity problem. The popular and widely used spectrum sensing technique is the energy detection scheme, as it is very simple and doesn’t require any previous information related to the signal. We propose one such approach, which is an optimised spectrum sensing scheme with reduced filter structure. The optimisation is done in terms of area and power performance of the spectrum. The simulations of the VLSI structure of the optimised flexible spectrum are done using Verilog coding in the XILINX ISE software. Our method achieves a 13% reduction in area and a 66% reduction in power consumption in comparison to the flexible spectrum sensing scheme. All the results are tabulated and comparisons are made. A new scheme for optimised and effective spectrum sensing opens up with our model.
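Energy detection, the sensing technique named in this abstract, reduces to comparing the average received energy with a threshold. The sketch below is a generic software illustration, not the authors' reduced-filter Verilog design: the window length, tone amplitude, noise level, and threshold are all assumed values chosen for the demonstration.

```python
import math
import random


def energy_detect(samples, threshold):
    """Return (decision, energy); decision is True when mean energy exceeds threshold."""
    energy = sum(abs(s) ** 2 for s in samples) / len(samples)
    return energy > threshold, energy


random.seed(0)   # fixed seed so the demonstration is repeatable
N = 1000         # assumed number of samples per sensing window

# Idle channel: unit-variance Gaussian noise only.
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
# Occupied channel: an assumed sinusoidal primary-user signal plus the same noise.
tone = [1.5 * math.sin(2 * math.pi * 0.1 * i) for i in range(N)]
occupied = [s + w for s, w in zip(tone, noise)]

THRESHOLD = 1.5  # assumed; in practice derived from a target false-alarm rate
idle_decision, idle_energy = energy_detect(noise, THRESHOLD)
busy_decision, busy_energy = energy_detect(occupied, THRESHOLD)
```

The detector needs no knowledge of the signal's structure, only of the noise level used to set the threshold, which is the property the abstract highlights.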
PLA realizations for VLSI state machines

NASA Technical Reports Server (NTRS)

Gopalakrishnan, S.; Whitaker, S.; Maki, G.; Liu, K.

1990-01-01

A major problem associated with state assignment procedures for VLSI controllers is obtaining an assignment that produces minimal or near minimal logic. The key item in Programmable Logic Array (PLA) area minimization is the number of unique product terms required by the design equations. This paper presents a state assignment algorithm for minimizing the number of product terms required to implement a finite state machine using a PLA. Partition algebra with predecessor state information is used to derive a near optimal state assignment. A maximum bound on the number of product terms required can be obtained by inspecting the predecessor state information. The state assignment algorithm presented is much simpler than existing procedures and leads to the same number of product terms or less. An area-efficient PLA structure implemented in a 1.0 micron CMOS process is presented along with a summary of the performance for a controller implemented using this design procedure.
Block QCA Fault-Tolerant Logic Gates

NASA Technical Reports Server (NTRS)

Firjany, Amir; Toomarian, Nikzad; Modarres, Katayoon

2003-01-01

Suitably patterned arrays (blocks) of quantum-dot cellular automata (QCA) have been proposed as fault-tolerant universal logic gates. These block QCA gates could be used to realize the potential of QCA for further miniaturization, reduction of power consumption, increase in switching speed, and increased degree of integration of very-large-scale integrated (VLSI) electronic circuits. The limitations of conventional VLSI circuitry, the basic principle of operation of QCA, and the potential advantages of QCA-based VLSI circuitry were described in several NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35; and Hybrid VLSI/QCA Architecture for Computing FFTs (NPO-20923), which follows this article. To recapitulate the principle of operation (greatly oversimplified because of the limitation on space available for this article): A quantum-dot cellular automaton contains four quantum dots positioned at or between the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantum-mechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns.
Heretofore, researchers have recognized two major obstacles to realization of QCA-based logic gates: One is the need for (and the difficulty of attaining) operation of QCA circuitry at room temperature or, for that matter, at any temperature above a few Kelvins. It has been theorized that room-temperature operation could be made possible by constructing QCA as molecular-scale devices. However, in approaching the lower limit of miniaturization at the molecular level, it becomes increasingly imperative to overcome the second major obstacle, which is the need for (and the difficulty of attaining) high precision in the alignments of adjacent QCA in order to ensure the correct interactions among the quantum dots.
VLSI research

NASA Astrophysics Data System (ADS)

Brodersen, R. W.

1984-04-01

A scaled version of the RISC II chip has been fabricated and tested, and these new chips have a cycle time that would outperform a VAX 11/780 by about a factor of two on compiled integer C programs. The architectural work on a RISC chip designed for a Smalltalk implementation has been completed. This chip, called SOAR (Smalltalk On a RISC), should run programs 4-15 times faster than the Xerox 1100 (Dolphin), a TTL minicomputer, and about as fast as the Xerox 1132 (Dorado), a $100,000 ECL minicomputer. The 1983 VLSI tools tape has been converted for use under the latest UNIX release (4.2). The Magic (formerly called Caddy) layout system will be a unified set of highly automated tools that cover all aspects of the layout process, including stretching, compaction, tiling and routing. A multiple window package and design rule checker for this system have just been completed, and compaction and stretching are partially implemented. New slope-based timing models for the Crystal timing analyzer are now fully implemented and in regular use. In an accuracy test using a dozen critical paths from the RISC II processor and cache chips, it was found that Crystal's estimates were within 5-10% of SPICE's estimates, while being a factor of 10,000 faster.
Digital MOS integrated circuits

NASA Astrophysics Data System (ADS)

Elmasry, M. I.

MOS in digital circuit design is considered along with aspects of digital VLSI, taking into account a comparison of MOSFET logic circuits, 1-micrometer MOSFET VLSI technology, a generalized guide for MOSFET miniaturization, processing technologies, novel circuit structures for VLSI, and questions of circuit and system design for VLSI. MOS memory cells and circuits are discussed, giving attention to a survey of high-density dynamic RAM cell concepts, one-device cells for dynamic random-access memories, variable resistance polysilicon for high-density CMOS RAM, high-performance MOS EPROMs using a stacked-gate cell, and the optimization of the latching pulse for dynamic flip-flop sensors. Programmable logic arrays are considered along with digital signal processors, microprocessors, static RAMs, and dynamic RAMs.
Advanced techniques and technology for efficient data storage, access, and transfer

NASA Technical Reports Server (NTRS)

Rice, Robert F.; Miller, Warner

1991-01-01

Advanced techniques for efficiently representing most forms of data are being implemented in practical hardware and software form through the joint efforts of three NASA centers. These techniques adapt to local statistical variations to continually provide near optimum code efficiency when representing data without error. Demonstrated in several earlier space applications, these techniques are the basis of initial NASA data compression standards specifications. Since the techniques clearly apply to most NASA science data, NASA invested in the development of both hardware and software implementations for general use. This investment includes high-speed single-chip very large scale integration (VLSI) coding and decoding modules as well as machine-transferrable software routines. The hardware chips were tested in the laboratory at data rates as high as 700 Mbits/s. A coding module's definition includes a predictive preprocessing stage and a powerful adaptive coding stage. The function of the preprocessor is to optimally process incoming data into a standard form data source that the second stage can handle. The built-in preprocessor of the VLSI coder chips is ideal for high-speed sampled data applications such as imaging and high-quality audio, but additionally, the second stage adaptive coder can be used separately with any source that can be externally preprocessed into the 'standard form'. This generic functionality assures that the applicability of these techniques and their recent high-speed implementations should be equally broad outside of NASA.
Architecture for distributed actuation and sensing using smart piezoelectric elements

NASA Astrophysics Data System (ADS)

Etienne-Cummings, Ralph; Pourboghrat, Farzad; Maruboyina, Hari K.; Abrate, Serge; Dhali, Shirshak K.

1998-07-01

We discuss vibration control of a cantilevered plate with multiple sensors and actuators. An architecture is chosen to minimize the number of control and sensing wires required. A custom VLSI chip, integrated with the sensor/actuator elements, controls the local behavior of the plate. All the actuators are addressed in parallel; local decode logic selects which actuator is stimulated. Downloaded binary data controls the applied voltage and modulation frequency for each actuator, and High Voltage MOSFETs are used to activate them. The sensors, which are independent adjacent piezoelectric ceramic elements, can be accessed in a random or sequential manner. An A/D card and GPIB interconnected test equipment allow a PC to read the sensors' outputs and dictate the actuation procedure. A visual programming environment is used to integrate the sensors, controller and actuators. Based on the constitutive relations for the piezoelectric material, simple models for the sensors and actuators are derived. A two level hierarchical robust controller is derived for motion control and for damping of vibrations.
Spike Neuromorphic VLSI-Based Bat Echolocation for Micro-Aerial Vehicle Guidance

DTIC Science & Technology

2007-03-31

Final 03/01/04 - 02/28/07. Neuromorphic VLSI-based Bat Echolocation for Micro-aerial Vehicle...uncovered interesting new issues in our choice for representing the intensity of signals. We have just finished testing the first chip version of an echo...timing-based algorithm ('openspace') for sonar-guided navigation amidst multiple obstacles. Subject terms: Neuromorphic VLSI, bat echolocation
NASA Space Engineering Research Center Symposium on VLSI Design

NASA Technical Reports Server (NTRS)

Maki, Gary K.

1990-01-01

The NASA Space Engineering Research Center (SERC) is proud to offer, at its second symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories and the electronics industry. These featured speakers share insights into next generation advances that will serve as a basis for future VLSI design. Questions of reliability in the space environment along with new directions in CAD and design are addressed by the featured speakers.
Image processing via VLSI: A concept paper

NASA Technical Reports Server (NTRS)

Nathan, R.

1982-01-01

Implementing specific image processing algorithms via very large scale integrated systems offers a potent solution to the problem of handling high data rates. Two algorithms stand out as being particularly critical -- geometric map transformation and filtering or correlation. These two functions form the basis for data calibration, registration and mosaicking. VLSI presents itself as an inexpensive ancillary function to be added to almost any general purpose computer, and if the geometry and filter algorithms are implemented in VLSI, the processing rate bottleneck would be significantly relieved. A program is developed that identifies the set of image processing functions that limit present systems in meeting future throughput needs, translates these functions into algorithms, implements them via VLSI technology, and interfaces the hardware to a general purpose digital computer.
On-chip skin color detection using a triple-well CMOS process

NASA Astrophysics Data System (ADS)

Boussaid, Farid; Chai, Douglas; Bouzerdoum, Abdesselam

2004-03-01

In this paper, a current-mode VLSI architecture enabling on read-out skin detection without the need for any on-chip memory elements is proposed. An important feature of the proposed architecture is that it removes the need for demosaicing. Color separation is achieved using the strong wavelength dependence of the absorption coefficient in silicon. This wavelength dependence causes a very shallow absorption of blue light and enables red light to penetrate deeply in silicon. A triple-well process, allowing a P-well to be placed inside an N-well, is chosen to fabricate three vertically integrated photodiodes acting as the RGB color detector for each pixel. Pixels of an input RGB image are classified as skin or non-skin pixels using a statistical skin color model, chosen to offer an acceptable trade-off between skin detection performance and implementation complexity. A single processing unit is used to classify all pixels of the input RGB image. This results in reduced mismatch and also in an increased pixel fill-factor. Furthermore, the proposed current-mode architecture is programmable, allowing external control of all classifier parameters to compensate for mismatch and changing lighting conditions.
An architecture of entropy decoder, inverse quantiser and predictor for multi-standard video decoding

NASA Astrophysics Data System (ADS)

Liu, Leibo; Chen, Yingjie; Yin, Shouyi; Lei, Hao; He, Guanghui; Wei, Shaojun

2014-07-01

A VLSI architecture for an entropy decoder, inverse quantiser and predictor is proposed in this article. This architecture is used for decoding video streams of three standards on a single chip, i.e. H.264/AVC, AVS (China National Audio Video coding Standard) and MPEG2. The proposed scheme is called MPMP (Macro-block-Parallel based Multilevel Pipeline), which is intended to improve the decoding performance to satisfy the real-time requirements while maintaining a reasonable area and power consumption. Several techniques, such as slice level pipeline, MB (Macro-Block) level pipeline, MB level parallel, etc., are adopted. Input and output buffers for the inverse quantiser and predictor are shared by the decoding engines for H.264, AVS and MPEG2, thereby effectively reducing the implementation overhead. Simulation shows that the decoding process consumes 512, 435 and 438 clock cycles per MB in H.264, AVS and MPEG2, respectively. Owing to the proposed techniques, the video decoder can support H.264 HP (High Profile) 1920 × 1088@30fps (frames per second) streams, AVS JP (Jizhun Profile) 1920 × 1088@41fps streams and MPEG2 MP (Main Profile) 1920 × 1088@39fps streams when exploiting a 200 MHz working frequency.
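The cycle counts and clock frequency quoted in the abstract above allow a rough sanity check. The sketch below assumes 16 × 16 macroblocks and assumes that per-macroblock decoding is the only cost, so it yields an optimistic upper bound rather than the authors' performance model; the function names are illustrative only.

```python
# Back-of-envelope bound on decoded frames per second.
# Assumption: per-macroblock cycles are the only cost (no stalls, no overhead).
MB_SIZE = 16  # macroblock edge in pixels for H.264, AVS and MPEG2


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks in one frame."""
    return (width // MB_SIZE) * (height // MB_SIZE)


def fps_upper_bound(clock_hz, cycles_per_mb, width=1920, height=1088):
    """Frames per second if every clock cycle were spent decoding macroblocks."""
    return clock_hz / (macroblocks_per_frame(width, height) * cycles_per_mb)


CYCLES_PER_MB = {"H.264": 512, "AVS": 435, "MPEG2": 438}  # from the abstract
REPORTED_FPS = {"H.264": 30, "AVS": 41, "MPEG2": 39}      # from the abstract

bounds = {std: fps_upper_bound(200e6, c) for std, c in CYCLES_PER_MB.items()}

for std, bound in bounds.items():
    print(f"{std}: bound {bound:.1f} fps, reported {REPORTED_FPS[std]} fps")
```

Each bound lies above the corresponding reported frame rate, which is what one would expect if the reported figures include overheads that this simple estimate ignores.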
Area-Efficient Graph Layouts (for VLSI).

DTIC Science & Technology

1980-08-13

the short side, then no rectangle is ever generated whose aspect ratio is worse than [illegible]. The divide-and-conquer construction in... Sutherland and Donald Oestreicher, "How big should a printed circuit board be?," IEEE Transactions on Computers, Vol. C-22, May 1973, pp. 537-542.
NASA Space Engineering Research Center for VLSI System Design

NASA Technical Reports Server (NTRS)

1993-01-01

This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems.
VLSI Implementation of Neuromorphic Learning Networks

DTIC Science & Technology

1993-03-31

Final, 01 Aug 90 to 31 Mar 93. VLSI Implementation of Neuromorphic Learning Networks (U)...Cover sheet: VLSI Implementation of Neuromorphic Learning Networks, Contract Number... Neuromorphic Learning Networks. Sponsored by Defense Advanced Research Projects Agency, DARPA Order No. 7013. Monitored by AFOSR under Contract No. F49620-90-C
Specification and Design Methodologies for High-Speed Fault-Tolerant Array Algorithms and Structures for VLSI.

DTIC Science & Technology

1987-06-01

evaluation and chip layout planning for VLSI digital systems. A high-level applicative (functional) language, implemented at UCLA, allows combining of...operating system. 2.1 Introduction The complexity of VLSI requires the application of CAD tools at all levels of the design process. In order to be...effective, these tools must be adaptive to the specific design. In this project we studied a design method based on the use of applicative languages

Optical printed circuit board (O-PCB) and VLSI photonic integrated circuits: visions, challenges, and progresses

NASA Astrophysics Data System (ADS)

Lee, El-Hang; Lee, S. G.; O, B. H.; Park, S. G.; Noh, H. S.; Kim, K. H.; Song, S. H.

2006-09-01

A collective overview and review is presented on the original work conducted on the theory, design, fabrication, and integration of micro/nano-scale optical wires and photonic devices for applications in newly conceived photonic systems called "optical printed circuit boards" (O-PCBs) and "VLSI photonic integrated circuits" (VLSI-PIC). These are aimed at compact, high-speed, multi-functional, intelligent, light-weight, low-energy and environmentally friendly, low-cost, and high-volume applications to complement or surpass the capabilities of electrical PCBs (E-PCBs) and/or VLSI electronic integrated circuit (VLSI-IC) systems. These consist of 2-dimensional or 3-dimensional planar arrays of micro/nano-optical wires and circuits to perform the functions of all-optical sensing, storing, transporting, processing, switching, routing and distributing optical signals on flat modular boards or substrates. The integrated optical devices include micro/nano-scale waveguides, lasers, detectors, switches, sensors, directional couplers, multi-mode interference devices, ring-resonators, photonic crystal devices, plasmonic devices, and quantum devices, made of polymer, silicon and other semiconductor materials. For VLSI photonic integration, photonic crystals and plasmonic structures have been used. Scientific and technological issues concerning the processes of miniaturization, interconnection and integration of these systems as applicable to board-to-board, chip-to-chip, and intra-chip integration, are discussed along with applications for future computers, telecommunications, and sensor-systems. Visions and challenges toward these goals are also discussed.
Electro-optic techniques for VLSI interconnect

NASA Astrophysics Data System (ADS)

Neff, J. A.

1985-03-01

A major limitation to achieving significant speed increases in very large scale integration (VLSI) lies in the metallic interconnects. They are costly not only from the charge transport standpoint but also from capacitive loading effects. The Defense Advanced Research Projects Agency, in pursuit of the fifth generation supercomputer, is investigating alternatives to the VLSI metallic interconnects, especially the use of optical techniques to transport the information either inter or intrachip. As the on chip performance of VLSI continues to improve via the scale down of the logic elements, the problems associated with transferring data off and onto the chip become more severe. The use of optical carriers to transfer the information within the computer is very appealing from several viewpoints. Besides the potential for gigabit propagation rates, the conversion from electronics to optics conveniently provides a decoupling of the various circuits from one another. Significant gains will also be realized in reducing cross talk between the metallic routings, and the interconnects need no longer be constrained to the plane of a thin film on the VLSI chip. In addition, optics can offer an increased programming flexibility for restructuring the interconnect network.
CMOS VLSI Active-Pixel Sensor for Tracking

NASA Technical Reports Server (NTRS)

Pain, Bedabrata; Sun, Chao; Yang, Guang; Heynssens, Julie

2004-01-01

An architecture for a proposed active-pixel sensor (APS) and a design to implement the architecture in a complementary metal oxide semiconductor (CMOS) very-large-scale integrated (VLSI) circuit provide for some advanced features that are expected to be especially desirable for tracking pointlike features of stars. The architecture would also make this APS suitable for robotic-vision and general pointing and tracking applications. CMOS imagers in general are well suited for pointing and tracking because they can be configured for random access to selected pixels and to provide readout from windows of interest within their fields of view. However, until now, the architectures of CMOS imagers have not supported multiwindow operation or low-noise data collection. Moreover, smearing and motion artifacts in collected images have made prior CMOS imagers unsuitable for tracking applications. The proposed CMOS imager (see figure) would include an array of 1,024 by 1,024 pixels containing high-performance photodiode-based APS circuitry. The pixel pitch would be 9 µm. The operations of the pixel circuits would be sequenced and otherwise controlled by an on-chip timing and control block, which would enable the collection of image data, during a single frame period, from either the full frame (that is, all 1,024 × 1,024 pixels) or from within as many as 8 different arbitrarily placed windows as large as 8 by 8 pixels each. A typical prior CMOS APS operates in a row-at-a-time ("rolling-shutter") readout mode, which gives rise to exposure skew. In contrast, the proposed APS would operate in a sample-first/read-later mode, suppressing rolling-shutter effects. In this mode, the analog readout signals from the pixels corresponding to the windows of interest (which windows, in the star-tracking application, would presumably contain guide stars) would be sampled rapidly by routing them through a programmable diagonal switch array to an on-chip parallel analog memory array.
The diagonal-switch and memory addresses would be generated by the on-chip controller. The memory array would be large enough to hold differential signals acquired from all 8 windows during a frame period. Following the rapid sampling from all the windows, the contents of the memory array would be read out sequentially by use of a capacitive transimpedance amplifier (CTIA) at a maximum data rate of 10 MHz. This data rate is compatible with an update rate of almost 10 Hz, even in full-frame operation.
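The "almost 10 Hz" full-frame figure in the abstract above follows from the array size and the maximum readout rate. The sketch below assumes one pixel value per readout clock and no per-row or per-frame overhead; the windowed figure is only a readout-limited bound under the same assumption and is not a number reported in the abstract.

```python
# Readout-limited update rates for the proposed imager.
# Assumption: one pixel value is read per readout clock, with no overhead.
ROWS = COLS = 1024            # array size from the abstract
MAX_READOUT_HZ = 10e6         # maximum CTIA data rate from the abstract
MAX_WINDOWS = 8               # up to 8 windows
WINDOW_SIDE = 8               # each up to 8 x 8 pixels


def update_rate_hz(pixels_per_frame, readout_hz=MAX_READOUT_HZ):
    """Frames per second when readout is the only limiting factor."""
    return readout_hz / pixels_per_frame


full_frame_pixels = ROWS * COLS
window_pixels = MAX_WINDOWS * WINDOW_SIDE * WINDOW_SIDE

full_frame_rate = update_rate_hz(full_frame_pixels)
window_rate_bound = update_rate_hz(window_pixels)

print(f"full frame: {full_frame_pixels} pixels, {full_frame_rate:.2f} Hz")
print(f"8 windows:  {window_pixels} pixels, bound {window_rate_bound:.0f} Hz")
```

Because the memory array is said to hold differential signals, a real device may read more than one sample per pixel, which would lower the windowed bound accordingly.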
On testing VLSI chips for the big Viterbi decoder

NASA Technical Reports Server (NTRS)

Hsu, I. S.

1989-01-01

A general technique that can be used in testing very large scale integrated (VLSI) chips for the Big Viterbi Decoder (BVD) system is described. The test technique is divided into functional testing and fault-coverage testing. The purpose of functional testing is to verify that the design works functionally. Functional test vectors are converted from outputs of software simulations which simulate the BVD functionally. Fault-coverage testing is used to detect and, in some cases, to locate faulty components caused by bad fabrication. This type of testing is useful in screening out bad chips. Finally, design for testability, which is included in the BVD VLSI chip design, is described in considerable detail. Both the observability and controllability of a VLSI chip are greatly enhanced by including the design for the testability feature.
Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
An optimal adder-based hardware architecture for the DCT/SA-DCT

NASA Astrophysics Data System (ADS)

Kinane, Andrew; Muresan, Valentin; O'Connor, Noel

2005-07-01

The explosive growth of the mobile multimedia industry has accentuated the need for efficient VLSI implementations of the associated computationally demanding signal processing algorithms. This need becomes greater as end-users demand increasingly enhanced features and more advanced underpinning video analysis. One such feature is object-based video processing as supported by MPEG-4 core profile, which allows content-based interactivity. MPEG-4 has many computationally demanding underlying algorithms, an example of which is the Shape Adaptive Discrete Cosine Transform (SA-DCT). The dynamic nature of the SA-DCT processing steps poses significant VLSI implementation challenges, and many of the previously proposed approaches use area- and power-consumptive multipliers. Most also ignore the subtleties of the packing steps and manipulation of the shape information. We propose a new multiplier-less serial datapath based solely on adders and multiplexers to improve area and power. The adder cost is minimised by employing resource re-use methods. The number of (physical) adders used has been derived using a common sub-expression elimination algorithm. Additional energy efficiency is factored into the design by employing guarded evaluation and local clock gating. Our design implements the SA-DCT packing with minimal switching using efficient addressing logic with a transpose memory RAM. The entire design has been synthesized using TSMC 0.09µm TCBN90LP technology, yielding a gate count of 12028 for the datapath and its control logic.
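To illustrate what a multiplier-less datapath means in general, the sketch below multiplies by a fixed integer constant using only shifts and additions. It uses a plain binary decomposition of the constant, not the common sub-expression elimination algorithm the abstract mentions, and the constant 181 (roughly cos(π/4) scaled by 256) is an illustrative choice rather than a coefficient taken from the paper.

```python
# Constant multiplication with shifts and adds only (no multiplier).
# Assumption: plain binary decomposition; the paper's CSE method is not shown.


def shift_add_terms(constant):
    """Bit positions that are set in a non-negative integer constant."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def multiply_by_constant(x, constant):
    """Compute x * constant as a sum of shifted copies of x."""
    acc = 0
    for shift in shift_add_terms(constant):
        acc += x << shift
    return acc


def adder_count(constant):
    """Two-input adders needed to sum the shifted terms."""
    return max(len(shift_add_terms(constant)) - 1, 0)


COEFF = 181  # illustrative: about cos(pi/4) * 256
print(shift_add_terms(COEFF), adder_count(COEFF))
print(multiply_by_constant(37, COEFF), 37 * COEFF)
```

Sharing partial sums between several constants is where sub-expression elimination reduces the adder count further; this sketch treats each constant independently.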
Research in VLSI Systems. Heuristic Programming Project and VLSI Theory Project. A Fast Turn Around Facility for Very Large Scale Integration (VLSI)

DTIC Science & Technology

1982-11-01

to occur). When a rectangle is inserted, all currently selected items are de-selected, and the newly inserted rectangle is selected. This makes it...Items are de-selected before the selection takes place. A selected symbol instance is displayed with a bold outline, and a selected rectangle edge...symbol instance or set of rectangle edges, everything previously selected is first de-selected. If the selected object is a reference point the old
A Coherent VLSI Design Environment

DTIC Science & Technology

1987-12-31

contract the total research volume in VLSI rose from an estimated $3,000,000 to over $10,000,000, and a state-of-the-art VLSI fabrication facility costing...Research" 11:30 John Melngailis, "Submicron Structures Research at M.I.T." 11:55 Dimitri A. Antoniadis, "Status of the M.I.T. LSI Fabrication Facility...1984. Contributions were made by Prof. Antoniadis and, to a small degree, Prof. Glasser. Objective: To develop techniques for fabricating integrated
Increasing Update Rates in the Building Walkthrough System with Automatic Model-Space Subdivision and Potentially Visible Set Calculations

DTIC Science & Technology

1990-07-01

34 ACM Computing Surveys. 6(1): 1-55. [Syzmanski85] Syzmanski, T. G. and C. J. V. Wyk. (1985). "GOALIE: A Space Efficient System for VLSI Artwork...this. Essentially we initialize a stack with the root. We then pull an element of this stack and if it is a cell we run the occlusion operation on the
Hardware accelerator of convolution with exponential function for image processing applications

NASA Astrophysics Data System (ADS)

Panchenko, Ivan; Bucha, Victor

2015-12-01

In this paper we describe a Hardware Accelerator (HWA) for fast recursive approximation of separable convolution with an exponential function. This filter can be used in many Image Processing (IP) applications, e.g. depth-dependent image blur, image enhancement and disparity estimation. We have adapted the RTL implementation of this filter to provide maximum throughput within the constraints of required memory bandwidth and hardware resources, resulting in a power-efficient VLSI implementation.
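The abstract above does not give the recursion used in the accelerator. As a minimal sketch of the general idea, the code below computes convolution with the symmetric kernel a^|n| using one forward and one backward first-order recursion, and compares it with the direct sum. The function names, the zero boundary condition and the value of `a` are assumptions made for illustration.

```python
# Convolution with the kernel a**|n| via two first-order recursions.
# Assumption: zero boundary conditions and 0 <= a < 1; this is a generic
# textbook scheme, not the accelerator's actual datapath.


def exp_filter_recursive(x, a):
    """O(N) evaluation: causal pass + anticausal pass - input sample."""
    n = len(x)
    fwd, bwd = [0.0] * n, [0.0] * n
    acc = 0.0
    for i in range(n):                 # causal pass: sum of a**(i-k) x[k], k <= i
        acc = x[i] + a * acc
        fwd[i] = acc
    acc = 0.0
    for i in range(n - 1, -1, -1):     # anticausal pass: sum over k >= i
        acc = x[i] + a * acc
        bwd[i] = acc
    # x[i] is counted in both passes, so subtract it once.
    return [f + b - v for f, b, v in zip(fwd, bwd, x)]


def exp_filter_direct(x, a):
    """O(N^2) reference: explicit sum over the kernel a**|i-k|."""
    n = len(x)
    return [sum(a ** abs(i - k) * x[k] for k in range(n)) for i in range(n)]


def exp_filter_2d(image, a):
    """Separable use: filter every row, then every column."""
    rows = [exp_filter_recursive(r, a) for r in image]
    cols = [exp_filter_recursive(list(c), a) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, -1.0, 0.0]
fast = exp_filter_recursive(signal, 0.5)
slow = exp_filter_direct(signal, 0.5)
print(max(abs(p - q) for p, q in zip(fast, slow)))
```

The recursive form needs a constant number of operations per sample regardless of how wide the kernel is, which is what makes it attractive for hardware.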
ELIPS: Toward a Sensor Fusion Processor on a Chip

NASA Technical Reports Server (NTRS)

Daud, Taher; Stoica, Adrian; Tyson, Thomas; Li, Wei-te; Fabunmi, James

1998-01-01

The paper presents the concept and initial tests from the hardware implementation of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) processor is developed to seamlessly combine rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor data in compact low power VLSI. The first demonstration of the ELIPS concept targets interceptor functionality; other applications, mainly in robotics and autonomous systems, are considered for the future. The main assumption behind ELIPS is that fuzzy, rule-based and neural forms of computation can serve as the main primitives of an "intelligent" processor. Thus, in the same way classic processors are designed to optimize the hardware implementation of a set of fundamental operations, ELIPS is developed as an efficient implementation of computational intelligence primitives, and relies on a set of fuzzy set, fuzzy inference and neural modules, built in programmable analog hardware. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Following software demonstrations on several interceptor data, three important ELIPS building blocks (a fuzzy set preprocessor, a rule-based fuzzy system and a neural network) have been fabricated in analog VLSI hardware and demonstrated microsecond-processing times.
Artificial immune system algorithm in VLSI circuit configuration

NASA Astrophysics Data System (ADS)

Mansor, Mohd. Asyraf; Sathasivam, Saratha; Kasihmuddin, Mohd Shareduwan Mohd

2017-08-01

In artificial intelligence, the artificial immune system is a robust bio-inspired heuristic method, extensively used in solving many constraint optimization problems, anomaly detection, and pattern recognition. This paper discusses the implementation and performance of an artificial immune system (AIS) algorithm integrated with Hopfield neural networks for VLSI circuit configuration based on 3-Satisfiability problems. Specifically, we emphasize the clonal selection technique in our binary artificial immune system algorithm. We restrict our logic construction to 3-Satisfiability (3-SAT) clauses in order to fit the transistor configuration in the VLSI circuit. The core impetus of this research is to find an ideal hybrid model to assist in the VLSI circuit configuration. In this paper, we compare the artificial immune system (AIS) algorithm (HNN-3SATAIS) with the brute force algorithm incorporated with a Hopfield neural network (HNN-3SATBF). Microsoft Visual C++ 2013 was used as a platform for training, simulating and validating the performance of the proposed network. The results show that HNN-3SATAIS outperformed HNN-3SATBF in terms of circuit accuracy and CPU time. Thus, HNN-3SATAIS can be used to detect errors early in the VLSI circuit design.
Intelligent fuzzy controller for event-driven real time systems

NASA Technical Reports Server (NTRS)

Grantner, Janos; Patyra, Marek; Stachowicz, Marian S.

1992-01-01

Most of the known linguistic models are essentially static, that is, time is not a parameter in describing the behavior of the object's model. In this paper we show a model for synchronous finite state machines based on fuzzy logic. Such finite state machines can be used to build both event-driven, time-varying, rule-based systems and the control unit section of a fuzzy logic computer. The architecture of a pipelined intelligent fuzzy controller is presented, and the linguistic model is represented by an overall fuzzy relation stored in a single rule memory. A VLSI integrated circuit implementation of the fuzzy controller is suggested. At a clock rate of 30 MHz, the controller can perform 3 MFLIPS on multi-dimensional fuzzy data.
Programmable synaptic devices for electronic neural nets

NASA Technical Reports Server (NTRS)

Moopenn, A.; Thakoor, A. P.

1990-01-01

The architecture, design, and operational characteristics of custom VLSI and thin film synaptic devices are described. The devices include CMOS-based synaptic chips containing 1024 reprogrammable synapses with a 6-bit dynamic range, and nonvolatile, write-once, binary synaptic arrays based on memory switching in hydrogenated amorphous silicon films. Their suitability for embodiment of fully parallel and analog neural hardware is discussed. Specifically, a neural network solution to an assignment problem of combinatorial global optimization, implemented in fully parallel hardware using the synaptic chips, is described. The network's ability to provide optimal and near optimal solutions over a time scale of a few neuron time constants has been demonstrated and suggests a speedup improvement of several orders of magnitude over conventional search methods.
Low-power low-noise mixed-mode VLSI ASIC for infinite dynamic range imaging applications

NASA Astrophysics Data System (ADS)

Turchetta, Renato; Hu, Y.; Zinzius, Y.; Colledani, C.; Loge, A.

1998-11-01

Solid state solutions for imaging are mainly represented by CCDs and, more recently, by CMOS imagers. Both devices are based on the integration of the total charge generated by the impinging radiation, with no processing of the single photon information. The dynamic range of these devices is intrinsically limited by the finite value of noise. Here we present the design of an architecture which allows efficient, in-pixel, noise reduction to a practically zero level, thus allowing infinite dynamic range imaging. A detailed calculation of the dynamic range is worked out, showing that noise is efficiently suppressed. This architecture is based on the concept of single-photon counting. In each pixel, we integrate both the front-end, low-noise, low-power analog part and the digital part. The former consists of a charge preamplifier, an active filter for optimal noise bandwidth reduction, a buffer and a threshold comparator, and the latter is simply a counter, which can be programmed to act as a normal shift register for the readout of the counters' contents. Two different ASIC's based on this concept have been designed for different applications. The first one has been optimized for silicon edge-on microstrips detectors, used in a digital mammography R and D project. It is a 32-channel circuit, with a 16-bit binary static counter.It has been optimized for a relatively large detector capacitance of 5 pF. Noise has been measured to be equal to 100 + 7*Cd (pF) electron rms with the digital part, showing no degradation of the noise performances with respect to the design values. The power consumption is 3.8mW/channel for a peaking time of about 1 microsecond(s) . The second circuit is a prototype for pixel imaging. The total active area is about (250 micrometers )**2. 
The main differences of the electronic architecture with respect to the first prototype are: i) different optimization of the analog front-end part for low-capacitance detectors, ii) in-pixel 4-bit comparator-offset compensation, iii) 15-bit pseudo-random counter. The power consumption is 255 µW/channel for a peaking time of 300 ns and an equivalent noise charge of 185 + 97*Cd electrons rms. Simulation and experimental results, as well as imaging results, will be presented.
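The 15-bit pseudo-random counter mentioned above can be illustrated with a linear-feedback shift register (LFSR), which counts through a scrambled but repeatable sequence of states and needs only a shift register plus one XOR gate. The sketch below is a minimal software model under stated assumptions: the feedback taps, the seed, and the function names `lfsr15_step`, `build_decode_table`, and `count_events` are hypothetical and are not taken from the abstract.

```python
def lfsr15_step(state):
    """Advance a 15-bit Fibonacci LFSR by one event.

    Assumption: the two most significant bits are the feedback taps,
    a common maximal-length choice. The abstract does not state which
    polynomial the chip uses.
    """
    feedback = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | feedback) & 0x7FFF


def build_decode_table(seed=1):
    """Map every reachable LFSR state to the number of events since `seed`."""
    table = {}
    state, count = seed, 0
    while state not in table:
        table[state] = count
        state = lfsr15_step(state)
        count += 1
    return table


def count_events(n_events, seed=1):
    """Return the raw counter state after n_events detected photons."""
    state = seed
    for _ in range(n_events):
        state = lfsr15_step(state)
    return state


DECODE = build_decode_table()
raw = count_events(1000)
print(DECODE[raw])  # the readout software recovers the true count
```

In this model the raw register contents are meaningless until they are looked up in the decode table, so the conversion cost moves off-chip to the readout software.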
Testing Methods for Integrated Circuit Chips.

DTIC Science & Technology

1986-03-27

... Figure 2.14 VLSI Tester Block Diagram. registers, memory and test ... general-purpose processor with standard bus interface serves as the test controller and (2) a custom VLSI test controller interfacing direct... ABSTRACT Provision for the functional testing of fabricated VLSI chips frequently involves as much design effort as the original
Comparison between Frame-Constrained Fix-Pixel-Value and Frame-Free Spiking-Dynamic-Pixel ConvNets for Visual Processing

PubMed Central

Farabet, Clément; Paz, Rafael; Pérez-Carrasco, Jose; Zamarreño-Ramos, Carlos; Linares-Barranco, Alejandro; LeCun, Yann; Culurciello, Eugenio; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabe

2012-01-01

Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in a different way. Frame-Based ConvNets process video information frame by frame in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed- and time-multiplexed by fetching data in and out. Thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high-speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable, and expansible. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGA have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented, along with a discussion of their differences, pros and cons. PMID:22518097
On recursive least-squares filtering algorithms and implementations. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Hsieh, Shih-Fu

1990-01-01

In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and make appropriate adjustments to achieve optimum performance. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve a high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in detail, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems.
In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new methods depends crucially on the specific application.
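As an illustration of the Givens-rotation form of QRD-based RLS named in the abstract, the following is a minimal sketch of the textbook update, not the thesis's own algorithm or systolic mapping. It keeps an upper-triangular matrix R and a vector u such that the current least-squares weights solve R w = u, and folds in each new data row with one rotation per column. The function names, the forgetting-factor parameter `lam`, and the sample data are assumptions made for the example.

```python
import math


def qrd_rls_update(R, u, x, d, lam=1.0):
    """Fold one new observation (x, d) into the triangular system R w = u.

    R is an n-by-n upper-triangular list of lists, u a list of length n.
    lam is an exponential forgetting factor (assumed; lam=1 means no forgetting).
    """
    n = len(x)
    root = math.sqrt(lam)
    R = [[root * value for value in row] for row in R]
    u = [root * value for value in u]
    x = [float(value) for value in x]
    d = float(d)
    for i in range(n):
        r = math.hypot(R[i][i], x[i])
        if r == 0.0:
            continue  # nothing to annihilate in this column yet
        c, s = R[i][i] / r, x[i] / r
        # Givens rotation that zeroes x[i] against the diagonal entry R[i][i].
        for k in range(i, n):
            R[i][k], x[k] = c * R[i][k] + s * x[k], -s * R[i][k] + c * x[k]
        u[i], d = c * u[i] + s * d, -s * u[i] + c * d
    return R, u


def back_substitute(R, u):
    """Solve the upper-triangular system R w = u."""
    n = len(u)
    w = [0.0] * n
    for i in reversed(range(n)):
        acc = u[i] - sum(R[i][k] * w[k] for k in range(i + 1, n))
        w[i] = acc / R[i][i]
    return w


# Noise-free data generated from known weights (illustrative values only).
true_w = [2.0, -1.0, 0.5]
rows = [[1, 2, 3], [0, 1, 4], [5, 6, 0], [1, 1, 1]]
R = [[0.0] * 3 for _ in range(3)]
u = [0.0] * 3
for x in rows:
    d = sum(wi * xi for wi, xi in zip(true_w, x))
    R, u = qrd_rls_update(R, u, x, d)
w = back_substitute(R, u)
print(w)
```

Each rotation touches only one row of R and the incoming data row, which is the locality that makes this form attractive for systolic arrays.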
Using multiple-accumulator CMACs to improve efficiency of the X part of an input-buffered FX correlator

NASA Astrophysics Data System (ADS)

Lapshev, Stepan; Hasan, S. M. Rezaul

2017-04-01

This paper presents the approach of using complex multiplier-accumulators (CMACs) with multiple accumulators to reduce the total number of memory operations in an input-buffered architecture for the X part of an FX correlator. A processing unit of this architecture uses an array of CMACs that are reused for different groups of baselines. The disadvantage of processing correlations in this way is that each input data sample has to be read multiple times from the memory because each input signal is used in many of these baseline groups. While a one-accumulator CMAC cannot switch to a different baseline until it is finished integrating the current one, a multiple-accumulator CMAC can. Thus, the array of multiple-accumulator CMACs can switch between processing different baselines that share some input signals at any moment to reuse the current data in the processing buffers. In this way significant reductions in the number of memory read operations are achieved with only a few accumulators per CMAC. For example, for a large number of input signals three-accumulator CMACs reduce the total number of memory operations by more than a third. Simulated energy measurements of four VLSI designs in a high-performance 28 nm CMOS technology are presented in this paper to demonstrate that using multiple accumulators can also lead to reduced power dissipation of the processing array. Using three accumulators as opposed to one has been found to reduce the overall energy of 8-bit CMACs by 1.4% through the reduction of the switching activity within their circuits, which is in addition to a more than 30% reduction in the memory.
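The idea of a CMAC that can switch baselines between samples can be sketched in software as a complex multiplier feeding one of several accumulators. This is a toy model under stated assumptions: the class and function names are hypothetical, the read count is a simple tally for three signals, and it does not reproduce the paper's architecture or its measured savings.

```python
class MultiAccumulatorCMAC:
    """Complex multiplier with several accumulators (toy software model)."""

    def __init__(self, n_accumulators):
        self.acc = [0j] * n_accumulators

    def mac(self, slot, x, y):
        # Correlate one sample pair and integrate it into the chosen baseline slot.
        self.acc[slot] += x * y.conjugate()


def correlate_three(a, b, c):
    """Integrate baselines a-b, a-c and b-c while reading each sample once."""
    cmac = MultiAccumulatorCMAC(3)
    reads = 0
    for xa, xb, xc in zip(a, b, c):
        reads += 3               # one buffer read per input signal per time step
        cmac.mac(0, xa, xb)      # baseline a-b
        cmac.mac(1, xa, xc)      # baseline a-c reuses xa already in the buffer
        cmac.mac(2, xb, xc)      # baseline b-c reuses xb and xc
    return cmac.acc, reads


a = [1 + 1j, 2 + 0j]
b = [1j, 1 - 1j]
c = [3 + 0j, 1j]
acc, reads = correlate_three(a, b, c)
print(acc, reads)
```

In this tally, a one-accumulator unit that integrates the three baselines one after another would fetch two samples per product, that is 6 reads per time step instead of 3. The real saving depends on the buffer organization described in the paper.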
Comparison between Frame-Constrained Fix-Pixel-Value and Frame-Free Spiking-Dynamic-Pixel ConvNets for Visual Processing.

PubMed

Farabet, Clément; Paz, Rafael; Pérez-Carrasco, Jose; Zamarreño-Ramos, Carlos; Linares-Barranco, Alejandro; Lecun, Yann; Culurciello, Eugenio; Serrano-Gotarredona, Teresa; Linares-Barranco, Bernabe

2012-01-01

Most scene segmentation and categorization architectures for the extraction of features in images and patches make exhaustive use of 2D convolution operations for template matching, template search, and denoising. Convolutional Neural Networks (ConvNets) are one example of such architectures that can implement general-purpose bio-inspired vision systems. In standard digital computers 2D convolutions are usually expensive in terms of resource consumption and impose severe limitations for efficient real-time applications. Nevertheless, neuro-cortex inspired solutions, like dedicated Frame-Based or Frame-Free Spiking ConvNet Convolution Processors, are advancing real-time visual processing. These two approaches share the neural inspiration, but each of them solves the problem in a different way. Frame-Based ConvNets process video information frame by frame in a very robust and fast way that requires using and sharing the available hardware resources (such as multipliers and adders). Hardware resources are fixed- and time-multiplexed by fetching data in and out. Thus memory bandwidth and size are important for good performance. On the other hand, spike-based convolution processors are a frame-free alternative that is able to perform convolution of a spike-based source of visual information with very low latency, which makes them ideal for very high-speed applications. However, hardware resources need to be available all the time and cannot be time-multiplexed. Thus, hardware should be modular, reconfigurable, and expansible. Hardware implementations in both VLSI custom integrated circuits (digital and analog) and FPGA have already been used to demonstrate the performance of these systems. In this paper we present a comparison study of these two neuro-inspired solutions. A brief description of both systems is presented, along with a discussion of their differences, pros and cons.

A second generation 50 Mbps VLSI level zero processing system prototype

NASA Technical Reports Server (NTRS)

Harris, Jonathan C.; Shi, Jeff; Speciale, Nick; Bennett, Toby

1994-01-01

Level Zero Processing (LZP) generally refers to telemetry data processing functions performed at ground facilities to remove all communication artifacts from instrument data. These functions typically include frame synchronization, error detection and correction, packet reassembly and sorting, playback reversal, merging, time-ordering, overlap deletion, and production of annotated data sets. The Data Systems Technologies Division (DSTD) at Goddard Space Flight Center (GSFC) has been developing high-performance Very Large Scale Integration Level Zero Processing Systems (VLSI LZPS) since 1989. The first VLSI LZPS prototype demonstrated 20 Megabits per second (Mbps) capability in 1992. With a new generation of high-density Application-specific Integrated Circuits (ASICs) and a Mass Storage System (MSS) based on the High-performance Parallel Peripheral Interface (HiPPI), a second prototype has been built that achieves full 50 Mbps performance. This paper describes the second generation LZPS prototype based upon VLSI technologies.
A novel VLSI processor architecture for supercomputing arrays

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

1993-01-01

Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different classes of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP, IP, OP, CM, R, SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such as to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process have led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.
A simple modern correctness condition for a space-based high-performance multiprocessor

NASA Technical Reports Server (NTRS)

Probst, David K.; Li, Hon F.

1992-01-01

A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Design of Tactile Sensor Using Dynamic Wafer Technology Based on VLSI Technique

DTIC Science & Technology

2001-10-25

Charles Noback, Robert Carola, "Human Anatomy and Physiology", third edition, 1995. [5] M.H. Raibert and John E. Tanner, "Design and Implementation of VLSI Tactile Sensing Computer", Robotics Research, vol. 1, 1983.
A special purpose silicon compiler for designing supercomputing VLSI systems

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.

1991-01-01

Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should get integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate get reduced over the silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.
A VLSI decomposition of the deBruijn graph

NASA Technical Reports Server (NTRS)

Collins, O.; Dolinar, S.; Mceliece, R.; Pollara, F.

1990-01-01

A new Viterbi decoder for convolutional codes with constraint lengths up to 15, called the Big Viterbi Decoder, is under development for the Deep Space Network. It will be demonstrated by decoding data from the Galileo spacecraft, which has a rate 1/4, constraint-length 15 convolutional encoder on board. Here, the mathematical theory underlying the design of the very-large-scale-integrated (VLSI) chips that are being used to build this decoder is explained. The deBruijn graph B sub n describes the topology of a fully parallel, rate 1/v, constraint length n+2 Viterbi decoder, and it is shown that B sub n can be built by appropriately wiring together (i.e., connecting together with extra edges) many isomorphic copies of a fixed graph called a B sub n building block. The efficiency of such a building block is defined as the fraction of the edges in B sub n that are present in the copies of the building block. It is shown, among other things, that for any alpha less than 1, there exists a graph G which is a B sub n building block of efficiency greater than alpha for all sufficiently large n. These results are illustrated by describing a special hierarchical family of deBruijn building blocks, which has led to the design of the gate-array chips being used in the Big Viterbi Decoder.
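The efficiency measure defined above, the fraction of de Bruijn graph edges that fall inside the building-block copies, can be explored numerically. The sketch below is a simplification under stated assumptions: it models the binary de Bruijn graph on n-bit states with shift-left edges (the paper's exact vertex convention for B sub n may differ), and it scores an arbitrary vertex partition without checking that the blocks are isomorphic, which the paper's building blocks must be. The function names are hypothetical.

```python
def de_bruijn_edges(n):
    """Edges of the binary de Bruijn graph on n-bit states: v -> (2v + b) mod 2^n."""
    size = 1 << n
    return [(v, (2 * v + b) % size) for v in range(size) for b in (0, 1)]


def partition_efficiency(n, block_of):
    """Fraction of edges whose two endpoints land in the same block.

    block_of maps a vertex to a block label. Edges that cross blocks are
    the ones that would need external wiring between chips.
    """
    edges = de_bruijn_edges(n)
    internal = sum(1 for a, b in edges if block_of(a) == block_of(b))
    return internal / len(edges)


n = 4
print(partition_efficiency(n, lambda v: 0))        # one big block: every edge is internal
print(partition_efficiency(n, lambda v: v))        # singletons: only the two self-loops
print(partition_efficiency(n, lambda v: v >> 2))   # group states by their top two bits
```

The two extreme partitions bracket the trade-off: a single block needs no external wires but is the whole decoder, while tiny blocks push almost every edge off-chip.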
Least reliable bits coding (LRBC) for high data rate satellite communications

NASA Technical Reports Server (NTRS)

Vanderaar, Mark; Budinger, James; Wagner, Paul

1992-01-01

LRBC, a bandwidth efficient multilevel/multistage block-coded modulation technique, is analyzed. LRBC uses simple multilevel component codes that provide increased error protection on increasingly unreliable modulated bits in order to maintain an overall high code rate that increases spectral efficiency. Soft-decision multistage decoding is used to make decisions on unprotected bits through corrections made on more protected bits. Analytical expressions and tight performance bounds are used to show that LRBC can achieve increased spectral efficiency and maintain equivalent or better power efficiency compared to that of BPSK. The relative simplicity of Galois field algebra vs the Viterbi algorithm and the availability of high-speed commercial VLSI for block codes indicate that LRBC using block codes is a desirable method for high data rate implementations.
An Analogue VLSI Implementation of the Meddis Inner Hair Cell Model

NASA Astrophysics Data System (ADS)

McEwan, Alistair; van Schaik, André

2003-12-01

The Meddis inner hair cell model is a widely accepted, but computationally intensive computer model of mammalian inner hair cell function. We have produced an analogue VLSI implementation of this model that operates in real time in the current domain by using translinear and log-domain circuits. The circuit has been fabricated on a chip and tested against the Meddis model for (a) rate level functions for onset and steady-state response, (b) recovery after masking, (c) additivity, (d) two-component adaptation, (e) phase locking, (f) recovery of spontaneous activity, and (g) computational efficiency. The advantage of this circuit, over other electronic inner hair cell models, is its nearly exact implementation of the Meddis model which can be tuned to behave similarly to the biological inner hair cell. This has important implications on our ability to simulate the auditory system in real time. Furthermore, the technique of mapping a mathematical model of first-order differential equations to a circuit of log-domain filters allows us to implement real-time neuromorphic signal processors for a host of models using the same approach.
VLSI realization of learning vector quantization with hardware/software co-design for different applications

NASA Astrophysics Data System (ADS)

An, Fengwei; Akazawa, Toshinobu; Yamasaki, Shogo; Chen, Lei; Jürgen Mattausch, Hans

2015-04-01

This paper reports a VLSI realization of learning vector quantization (LVQ) with high flexibility for different applications. It is based on a hardware/software (HW/SW) co-design concept for on-chip learning and recognition and designed as a SoC in 180 nm CMOS. The time-consuming nearest Euclidean distance search in the LVQ algorithm's competition layer is efficiently implemented as a pipeline with parallel p-word input. Since the number of neurons in the competition layer, the weight values, and the numbers of inputs and outputs are scalable, the requirements of many different applications can be satisfied without hardware changes. Classification of a d-dimensional input vector is completed in n × ⌈d/p⌉ + R clock cycles, where R is the pipeline depth, and n is the number of reference feature vectors (FVs). Adjustment of stored reference FVs during learning is done by the embedded 32-bit RISC CPU, because this operation is not time critical. The high flexibility is verified by the application of human detection with different numbers for the dimensionality of the FVs.
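The cycle count n × ⌈d/p⌉ + R quoted above is easy to evaluate for a given configuration, and the nearest-distance search it describes can be mirrored in software by consuming p words of each reference vector per step. The following is a minimal sketch; the function names and the example numbers are illustrative assumptions, not values from the paper.

```python
def classification_cycles(n, d, p, pipeline_depth):
    """Clock cycles to classify one d-dimensional vector: n * ceil(d / p) + R."""
    words_per_vector = -(-d // p)  # integer ceiling of d / p
    return n * words_per_vector + pipeline_depth


def nearest_reference(x, references, p):
    """Index of the reference vector closest to x in Euclidean distance.

    The inner loop consumes p words per step to mirror the p-word parallel
    input. Squared distance is enough to rank the candidates.
    """
    best_index, best_dist = None, float("inf")
    for index, ref in enumerate(references):
        dist = 0.0
        for start in range(0, len(x), p):
            chunk_x = x[start:start + p]
            chunk_r = ref[start:start + p]
            dist += sum((xi - ri) ** 2 for xi, ri in zip(chunk_x, chunk_r))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Hypothetical configuration: 10 reference FVs, 100 dimensions, 8 words per cycle, depth 5.
print(classification_cycles(n=10, d=100, p=8, pipeline_depth=5))
print(nearest_reference([1, 1, 1], [[0, 0, 0], [1, 1, 2], [5, 5, 5]], p=2))
```

Because d is rounded up to a multiple of p, widening the input beyond what divides d evenly gives diminishing returns in the first term.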
Optical Interconnections for VLSI Computational Systems Using Computer-Generated Holography.

NASA Astrophysics Data System (ADS)

Feldman, Michael Robert

Optical interconnects for VLSI computational systems using computer generated holograms are evaluated in theory and experiment. It is shown that by replacing particular electronic connections with free-space optical communication paths, connection of devices on a single chip or wafer and between chips or modules can be improved. Optical and electrical interconnects are compared in terms of power dissipation, communication bandwidth, and connection density. Conditions are determined for which optical interconnects are advantageous. Based on this analysis, it is shown that by applying computer generated holographic optical interconnects to wafer scale fine grain parallel processing systems, dramatic increases in system performance can be expected. Some new interconnection networks, designed to take full advantage of optical interconnect technology, have been developed. Experimental Computer Generated Holograms (CGH's) have been designed, fabricated and subsequently tested in prototype optical interconnected computational systems. Several new CGH encoding methods have been developed to provide efficient high performance CGH's. One CGH was used to decrease the access time of a 1 kilobit CMOS RAM chip. Another was produced to implement the inter-processor communication paths in a shared memory SIMD parallel processor array.
SSI/MSI/LSI/VLSI/ULSI.

ERIC Educational Resources Information Center

Alexander, George

1984-01-01

Discusses small-scale integrated (SSI), medium-scale integrated (MSI), large-scale integrated (LSI), very large-scale integrated (VLSI), and ultra large-scale integrated (ULSI) chips. The development and properties of these chips, uses of gallium arsenide, Josephson devices (two superconducting strips sandwiching a thin insulator), and future…
Research in the design of high-performance reconfigurable systems

NASA Technical Reports Server (NTRS)

Mcewan, S. D.; Spry, A. J.

1985-01-01

Computer aided design and computer aided manufacturing have the potential for greatly reducing the cost and lead time in the development of VLSI components. This potential paves the way for the design and fabrication of a wide variety of economically feasible high level functional units. It was observed that current computer systems have only a limited capacity to absorb new VLSI component types other than memory, microprocessors, and a relatively small number of other parts. The first purpose is to explore a system design which is capable of effectively incorporating a considerable number of VLSI part types and will both increase the speed of computation and reduce the attendant programming effort. A second purpose is to explore design techniques for VLSI parts which when incorporated by such a system will result in speeds and costs which are optimal. The proposed work may lay the groundwork for future efforts in the extensive simulation and measurements of the system's cost effectiveness and lead to prototype development.
Implementing neural nets with programmable logic

NASA Technical Reports Server (NTRS)

Vidal, Jacques J.

1988-01-01

Networks of Boolean programmable logic modules are presented as one purely digital class of artificial neural nets. The approach contrasts with the continuous analog framework usually suggested. Programmable logic networks are capable of handling many neural-net applications. They avoid some of the limitations of threshold logic networks and present distinct opportunities. The network nodes are called dynamically programmable logic modules. They can be implemented with digitally controlled demultiplexers. Each node performs a Boolean function of its inputs which can be dynamically assigned. The overall network is therefore a combinational circuit and its outputs are Boolean global functions of the network's input variables. The approach offers definite advantages for VLSI implementation, namely, a regular architecture with limited connectivity, simplicity of the control machinery, natural modularity, and the support of a mature technology.
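A node whose Boolean function can be reassigned at run time can be modeled in software as a small truth table addressed by the input bits. This is only a behavioral sketch: the abstract implements the nodes with digitally controlled demultiplexers, while the lookup-table encoding, the class name `ProgrammableLogicModule`, and the half-adder example below are assumptions made for illustration.

```python
class ProgrammableLogicModule:
    """Boolean node whose function is set by a reprogrammable truth table."""

    def __init__(self, n_inputs, truth_table=0):
        self.n_inputs = n_inputs
        self.program(truth_table)

    def program(self, truth_table):
        # Bit k of truth_table is the output for the input pattern with value k.
        if not 0 <= truth_table < (1 << (1 << self.n_inputs)):
            raise ValueError("truth table does not fit the number of inputs")
        self.truth_table = truth_table

    def __call__(self, *bits):
        if len(bits) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in bits:
            address = (address << 1) | (bit & 1)
        return (self.truth_table >> address) & 1


# A two-node combinational network acting as a half adder.
sum_node = ProgrammableLogicModule(2, 0b0110)    # XOR
carry_node = ProgrammableLogicModule(2, 0b1000)  # AND
print([(a, b, sum_node(a, b), carry_node(a, b)) for a in (0, 1) for b in (0, 1)])

# Reassign the first node dynamically: it now computes OR.
sum_node.program(0b1110)
print(sum_node(0, 1), sum_node(0, 0))
```

Since every node is the same structure with a different table, the network stays regular, which matches the VLSI advantages listed in the abstract.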
Wafer level reliability for high-performance VLSI design

NASA Technical Reports Server (NTRS)

Root, Bryan J.; Seefeldt, James D.

1987-01-01

As very large scale integration architecture requires higher package density, reliability of these devices has approached a critical level. Previous processing techniques allowed a large window for varying reliability. However, as scaling and higher current densities push reliability to its limit, tighter control and instant feedback become critical. Several test structures developed to monitor reliability at the wafer level are described. For example, a test structure was developed to monitor metal integrity in seconds as opposed to weeks or months for conventional testing. Another structure monitors mobile ion contamination at critical steps in the process. Thus the reliability jeopardy can be assessed during fabrication, preventing defective devices from ever being placed in the field. Most importantly, the reliability can be assessed on each wafer as opposed to an occasional sample.
Power feasibility of implantable digital spike-sorting circuits for neural prosthetic systems.

PubMed

Zumsteg, Zachary S; Ahmed, Rizwan E; Santhanam, Gopal; Shenoy, Krishna V; Meng, Teresa H

2004-01-01

A new class of neural prosthetic systems aims to assist disabled patients by translating cortical neural activity into control signals for prosthetic devices. Based on the success of proof-of-concept systems in the laboratory, there is now considerable interest in increasing system performance and creating implantable electronics for use in clinical systems. A critical question that impacts system performance and the overall architecture of these systems is whether it is possible to identify the neural source of each action potential (spike sorting) in real-time and with low power. Low power is essential both for power supply considerations and heat dissipation in the brain. In this paper we report that several state-of-the-art spike sorting algorithms implemented in modern CMOS VLSI processes are expected to be power realistic.
Techniques for computing the discrete Fourier transform using the quadratic residue Fermat number systems

NASA Technical Reports Server (NTRS)

Truong, T. K.; Chang, J. J.; Hsu, I. S.; Pei, D. Y.; Reed, I. S.

1986-01-01

The complex integer multiplier and adder over the direct sum of two copies of a finite field developed by Cozzens and Finkelstein (1985) is specialized to the direct sum of the rings of integers modulo Fermat numbers. Such multiplication over the rings of integers modulo Fermat numbers can be performed by means of two integer multiplications, whereas the complex integer multiplication requires three integer multiplications. Such multiplications and additions can be used in the implementation of a discrete Fourier transform (DFT) of a sequence of complex numbers. The advantage of the present approach is that the number of multiplications needed to compute a systolic array of the DFT can be reduced substantially. The architectural designs using this approach are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.
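The two-multiplication claim can be checked with a small worked example. Modulo a Fermat number F = 2^(2^t) + 1, the value j = 2^(2^(t-1)) satisfies j² ≡ -1, so a complex integer a + ib can be mapped to the pair (a + jb, a - jb) and two such pairs multiply componentwise. The sketch below follows this standard quadratic-residue construction; the function names and the choice t = 3 (F = 257) are assumptions for illustration and are not taken from the report.

```python
def fermat_modulus(t):
    """Fermat number F_t = 2^(2^t) + 1."""
    return (1 << (1 << t)) + 1


def sqrt_minus_one(t):
    """j = 2^(2^(t-1)), which satisfies j*j = -1 modulo F_t (t >= 1)."""
    return 1 << (1 << (t - 1))


def to_pair(a, b, t):
    """Map the complex integer a + ib to the pair (a + j*b, a - j*b) mod F_t."""
    F, j = fermat_modulus(t), sqrt_minus_one(t)
    return ((a + j * b) % F, (a - j * b) % F)


def multiply_pairs(p, q, t):
    """Complex product in pair form: exactly two integer multiplications."""
    F = fermat_modulus(t)
    return (p[0] * q[0] % F, p[1] * q[1] % F)


def from_pair(pair, t):
    """Recover (real, imaginary) residues mod F_t from the pair form."""
    F, j = fermat_modulus(t), sqrt_minus_one(t)
    inv2 = (F + 1) // 2          # inverse of 2, since F is odd
    inv_j = F - j                # inverse of j, because j * (-j) = 1 mod F
    z1, z2 = pair
    real = (z1 + z2) * inv2 % F
    imag = (z1 - z2) * inv2 * inv_j % F
    return real, imag


t = 3  # F = 257
product = multiply_pairs(to_pair(3, 4, t), to_pair(5, 6, t), t)
print(from_pair(product, t))  # (3 + 4i)(5 + 6i) = -9 + 38i, i.e. (248, 38) mod 257
```

The conversions into and out of pair form use only shifts, additions, and fixed constants, so in a DFT they are paid once at the boundaries rather than at every multiply.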
[Radiation Tolerant Electronics]

NASA Technical Reports Server (NTRS)

1996-01-01

Research work in providing radiation-tolerant electronics to NASA and the commercial sector is reported herein. There are four major sections to this report: (1) Special purpose VLSI technology section discusses the status of the VLSI projects as well as the new background technologies that have been developed; (2) Lossless data compression results provide the background and direction of new data compression pursued under this grant; (3) Commercial technology transfer presents an itemization of the commercial technology transfer; and (4) Delivery of VLSI to the Government is a solution and progress report that shows how the Government and Government contractors are gaining access to the technology that has been developed by the MRC.
Universal programmable logic gate and routing method

NASA Technical Reports Server (NTRS)

Vatan, Farrokh (Inventor); Akarvardar, Kerem (Inventor); Mojarradi, Mohammad M. (Inventor); Fijany, Amir (Inventor); Cristoloveanu, Sorin (Inventor); Kolawa, Elzbieta (Inventor); Blalock, Benjamin (Inventor); Chen, Suheng (Inventor); Toomarian, Nikzad (Inventor)

2009-01-01

A universal and programmable logic gate based on G⁴-FET technology is disclosed, leading to the design of more efficient logic circuits. A new full adder design based on the G⁴-FET is also presented. The G⁴-FET can also function as a unique router device offering coplanar crossing of signal paths that are isolated and perpendicular to one another. This has the potential of overcoming major limitations in VLSI design where complex interconnection schemes have become increasingly problematic.
Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

NASA Technical Reports Server (NTRS)

Fijany, Amir; Toomarian, Benny N.

2000-01-01

There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years, and this trend may continue for the near future. However, it is well known that there are major obstacles, i.e., the physical limits of feature size reduction and the ever-increasing cost of foundries, that would prevent the long-term continuation of this trend. This has motivated the exploration of some fundamentally new technologies that are not dependent on the conventional feature size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum dot-based computing, DNA-based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum dot-based computing using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum dot-based computing and memory in general, and QCA specifically, are intriguing to NASA due to their high packing density (10^11 to 10^12 per square cm), low power consumption (no transfer of current), and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for coplanar (i.e., single-layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor.
Building upon these circuits, we have developed novel algorithms and QCA-based architectures for highly parallel and systolic computation of signal/image processing applications, such as the FFT and the Wavelet and Walsh-Hadamard Transforms.
A microarchitecture for resource-limited superscalar microprocessors

NASA Astrophysics Data System (ADS)

Basso, Todd David

1999-11-01

Microelectronic components in space and satellite systems must be resistant to total dose radiation, single-event upset, and latchup in order to accomplish their missions. The demand for inexpensive, high-volume, radiation hardened (rad-hard) integrated circuits (ICs) is expected to increase dramatically as the communication market continues to expand. Motorola's Complementary Gallium Arsenide (CGaAs™) technology offers superior radiation tolerance compared to traditional CMOS processes, while being more economical than dedicated rad-hard CMOS processes. The goals of this dissertation are to optimize a superscalar microarchitecture suitable for CGaAs™ microprocessors, develop circuit techniques for such applications, and evaluate the potential of CGaAs™ for the development of digital VLSI circuits. Motorola's 0.5 µm CGaAs™ process is summarized and circuit techniques applicable to digital CGaAs™ are developed. Direct coupled FET, complementary, and domino logic circuits are compared based on speed, power, area, and noise margins. These circuit techniques are employed in the design of a 600 MHz PowerPC™ arithmetic logic unit. The dissertation emphasizes CGaAs™-specific design considerations, specifically, low integration level. A baseline superscalar microarchitecture is defined and SPEC95 integer benchmark simulations are used to evaluate the applicability of advanced architectural features to microprocessors having low integration levels. The performance simulations center around the optimization of a simple superscalar core, small-scale branch prediction, instruction prefetching, and an off-chip primary data cache. The simulation results are used to develop a superscalar microarchitecture capable of outperforming a comparable sequential pipeline, while using only 500,000 transistors.
The architecture, running at 200 MHz, is capable of achieving an estimated 153 MIPS, translating to a 27% performance increase over a comparable traditional pipelined microprocessor. The proposed microarchitecture is process independent and can be applied to low-cost or transistor-limited applications. The proposed microarchitecture is implemented in the design of a 0.35 µm CMOS microprocessor and in the design of a 0.5 µm CGaAs™ microprocessor. The two technologies and designs are compared to ascertain the state of CGaAs™ for digital VLSI applications.

Development Of A Three-Dimensional Circuit Integration Technology And Computer Architecture

NASA Astrophysics Data System (ADS)

Etchells, R. D.; Grinberg, J.; Nudd, G. R.

1981-12-01

This paper is the first of a series 1,2,3 describing a range of efforts at Hughes Research Laboratories, which are collectively referred to as "Three-Dimensional Microelectronics." The technology being developed is a combination of a unique circuit fabrication/packaging technology and a novel processing architecture. The packaging technology greatly reduces the parasitic impedances associated with signal-routing in complex VLSI structures, while simultaneously allowing circuit densities orders of magnitude higher than the current state-of-the-art. When combined with the 3-D processor architecture, the resulting machine exhibits a one- to two-order of magnitude simultaneous improvement over current state-of-the-art machines in the three areas of processing speed, power consumption, and physical volume. The 3-D architecture is essentially that commonly referred to as a "cellular array", with the ultimate implementation having as many as 512 x 512 processors working in parallel. The three-dimensional nature of the assembled machine arises from the fact that the chips containing the active circuitry of the processor are stacked on top of each other. In this structure, electrical signals are passed vertically through the chips via thermomigrated aluminum feedthroughs. Signals are passed between adjacent chips by micro-interconnects. This discussion presents a broad view of the total effort, as well as a more detailed treatment of the fabrication and packaging technologies themselves. The results of performance simulations of the completed 3-D processor executing a variety of algorithms are also presented. Of particular pertinence to the interests of the focal-plane array community is the simulation of the UNICORNS nonuniformity correction algorithms as executed by the 3-D architecture.
Emerging Applications for High K Materials in VLSI Technology

PubMed Central

Clark, Robert D.

2014-01-01

The current status of High K dielectrics in Very Large Scale Integrated circuit (VLSI) manufacturing for leading edge Dynamic Random Access Memory (DRAM) and Complementary Metal Oxide Semiconductor (CMOS) applications is summarized along with the deposition methods and general equipment types employed. Emerging applications for High K dielectrics in future CMOS are described as well for implementations in 10 nm and beyond nodes. Additional emerging applications for High K dielectrics include Resistive RAM memories, Metal-Insulator-Metal (MIM) diodes, Ferroelectric logic and memory devices, and as mask layers for patterning. Atomic Layer Deposition (ALD) is a common and proven deposition method for all of the applications discussed for use in future VLSI manufacturing. PMID:28788599
Parallel-Processing Equalizers for Multi-Gbps Communications

NASA Technical Reports Server (NTRS)

Gray, Andrew; Ghuman, Parminder; Hoy, Scott; Satorius, Edgar H.

2004-01-01

Architectures have been proposed for the design of frequency-domain least-mean-square complex equalizers that would be integral parts of parallel-processing digital receivers of multi-gigahertz radio signals and other quadrature-phase-shift-keying (QPSK) or 16-quadrature-amplitude-modulation (16-QAM) data signals at rates of multiple gigabits per second. "Equalizers," as used here, denotes receiver subsystems that compensate for distortions in the phase and frequency responses of the broad-band radio-frequency channels typically used to convey such signals. The proposed architectures are suitable for realization in very-large-scale integrated (VLSI) circuitry and, in particular, complementary metal oxide semiconductor (CMOS) application-specific integrated circuits (ASICs) operating at frequencies lower than modulation symbol rates. A digital receiver of the type to which the proposed architecture applies (see Figure 1) would include an analog-to-digital converter (A/D) operating at a rate, fs, of 4 samples per symbol period. To obtain the high speed necessary for sampling, the A/D and a 1:16 demultiplexer immediately following it would be constructed as GaAs integrated circuits. The parallel-processing circuitry downstream of the demultiplexer, including a demodulator followed by an equalizer, would operate at a rate of only fs/16 (in other words, at 1/4 of the symbol rate). The output from the equalizer would be four parallel streams of in-phase (I) and quadrature (Q) samples.
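The rate relationship stated in this abstract (4 samples per symbol, a 1:16 demultiplexer, and downstream logic at 1/4 of the symbol rate) can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names and the 1 Gsymbol/s figure are assumptions introduced here, not values from the record.

```python
from fractions import Fraction


def parallel_clock_fraction(samples_per_symbol: int, demux_ways: int) -> Fraction:
    """Clock rate of the demultiplexed circuitry as a fraction of the symbol rate."""
    return Fraction(samples_per_symbol, demux_ways)


def downstream_clock_hz(symbol_rate_hz: float,
                        samples_per_symbol: int = 4,
                        demux_ways: int = 16) -> float:
    """Clock rate (Hz) seen by the circuitry after the demultiplexer."""
    sample_rate_hz = symbol_rate_hz * samples_per_symbol  # this is fs
    return sample_rate_hz / demux_ways                    # this is fs / 16


# fs = 4 samples/symbol and a 1:16 demultiplexer give 4/16 = 1/4 of the symbol rate.
fraction = parallel_clock_fraction(4, 16)

# Hypothetical example: a 1 Gsymbol/s signal would need only a 250 MHz CMOS clock.
example_clock = downstream_clock_hz(1e9)
```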
VLSI chip-set for data compression using the Rice algorithm

NASA Technical Reports Server (NTRS)

Venbrux, J.; Liu, N.

1990-01-01

A full custom VLSI implementation of a data compression encoder and decoder which implements the lossless Rice data compression algorithm is discussed in this paper. The encoder and decoder reside on single chips. The data rates are to be 5 and 10 Mega-samples-per-second for the decoder and encoder respectively.
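For readers unfamiliar with Rice coding, the sketch below shows the basic code word construction that such an encoder and decoder build on: a unary quotient followed by a k-bit remainder. It is a minimal illustration under stated assumptions, not the chip set's actual algorithm, which the abstract does not detail. The unary convention (ones terminated by a zero), the bit-string representation, and the function names are all choices made here for clarity.

```python
def rice_encode(value: int, k: int) -> str:
    """Encode a non-negative integer as a Golomb-Rice code word with parameter k.

    Assumed convention: the quotient is sent in unary as q ones followed by a
    zero, then the k low-order bits of the value follow in binary.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    bits = "1" * quotient + "0"
    if k:
        bits += format(value & ((1 << k) - 1), f"0{k}b")
    return bits


def rice_decode(bits: str, k: int) -> int:
    """Decode a single code word produced by rice_encode."""
    quotient = bits.index("0")  # number of leading ones
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k else 0
    return (quotient << k) | remainder
```

Small values map to short code words when k matches the typical sample magnitude, which is why practical Rice coders choose k adaptively per block of samples.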
An Interactive Multimedia Learning Environment for VLSI Built with COSMOS

ERIC Educational Resources Information Center

Angelides, Marios C.; Agius, Harry W.

2002-01-01

This paper presents Bigger Bits, an interactive multimedia learning environment that teaches students about VLSI within the context of computer electronics. The system was built with COSMOS (Content Oriented semantic Modelling Overlay Scheme), which is a modelling scheme that we developed for enabling the semantic content of multimedia to be used…
Dynamically-allocated multi-queue buffers for VLSI communication switches

NASA Technical Reports Server (NTRS)

Tamir, Yuval; Frazier, Gregory L.

1992-01-01

Several buffer structures are discussed and compared in terms of implementation complexity, interswitch handshaking requirements, and their ability to deal with variations in traffic patterns and message lengths. A new buffer design is presented that provides non-FIFO message handling and efficient storage allocation for variable size packets using linked lists managed by a simple on-chip controller. The new buffer design is evaluated by comparing it to several alternative designs in the context of a multistage interconnection network. The present modeling and simulations show that the new buffer outperforms alternative buffers and can thus be used to improve the performance of a wide variety of systems currently using less efficient buffers.
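The idea of several queues sharing one storage pool through linked lists can be sketched in software as follows. This is a simplified model under assumptions made here: storage is divided into fixed-size slots, one linked list is kept per output queue, and the class and method names are hypothetical. The hardware controller in the paper is not described in enough detail in the abstract to reproduce.

```python
class MultiQueueBuffer:
    """Several FIFO queues sharing one pool of slots via linked lists."""

    def __init__(self, num_slots: int, num_queues: int):
        if num_slots < 1 or num_queues < 1:
            raise ValueError("need at least one slot and one queue")
        self.data = [None] * num_slots
        # next_slot[i] links slot i to its successor; -1 ends a list.
        self.next_slot = list(range(1, num_slots)) + [-1]
        self.free_head = 0                  # head of the free list
        self.head = [-1] * num_queues       # first slot of each queue
        self.tail = [-1] * num_queues       # last slot of each queue

    def enqueue(self, queue: int, item) -> bool:
        """Store item at the tail of the given queue; False if the pool is full."""
        slot = self.free_head
        if slot == -1:
            return False
        self.free_head = self.next_slot[slot]
        self.data[slot] = item
        self.next_slot[slot] = -1
        if self.head[queue] == -1:
            self.head[queue] = slot
        else:
            self.next_slot[self.tail[queue]] = slot
        self.tail[queue] = slot
        return True

    def dequeue(self, queue: int):
        """Remove and return the oldest item of the given queue, or None if empty."""
        slot = self.head[queue]
        if slot == -1:
            return None
        item = self.data[slot]
        self.head[queue] = self.next_slot[slot]
        if self.head[queue] == -1:
            self.tail[queue] = -1
        self.data[slot] = None
        self.next_slot[slot] = self.free_head  # return the slot to the free list
        self.free_head = slot
        return item
```

Because every queue draws from the same free list, a message bound for an idle output can leave ahead of older messages waiting on a busy output, and no queue reserves storage it is not using.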
High performance VLSI telemetry data systems

NASA Technical Reports Server (NTRS)

Chesney, J.; Speciale, N.; Horner, W.; Sabia, S.

1990-01-01

NASA's deployment of major space complexes such as Space Station Freedom (SSF) and the Earth Observing System (EOS) will demand increased functionality and performance from ground based telemetry acquisition systems well above current system capabilities. Adaptation of space telemetry data transport and processing standards such as those specified by the Consultative Committee for Space Data Systems (CCSDS) standards and those required for commercial ground distribution of telemetry data, will drive these functional and performance requirements. In addition, budget limitations will force the requirement for higher modularity, flexibility, and interchangeability at lower cost in new ground telemetry data system elements. At NASA's Goddard Space Flight Center (GSFC), the design and development of generic ground telemetry data system elements over the last five years has resulted in a significant solution to these problems. This solution, referred to as the functional components approach, includes both hardware and software components ready for end user application. The hardware functional components consist of modern data flow architectures utilizing Application Specific Integrated Circuits (ASICs) developed specifically to support NASA's telemetry data systems needs and designed to meet a range of data rate requirements up to 300 Mbps. Real-time operating system software components support both embedded local software intelligence, and overall system control, status, processing, and interface requirements. These components, hardware and software, form the superstructure upon which project specific elements are added to complete a telemetry ground data system installation. This paper describes the functional components approach, some specific component examples, and a project example of the evolution from VLSI component, to basic board level functional component, to integrated telemetry data system.
Circuit Design Approaches for Implementation of a Subtrellis IC for a Reed-Muller Subcode

NASA Technical Reports Server (NTRS)

Lin, Shu; Uehara, Gregory T.; Nakamura, Eric B.; Chu, Cecilia W. P.

1996-01-01

In this research, we have proposed the (64, 40, 8) subcode of the third-order Reed-Muller (RM) code to NASA for high-speed satellite communications. This RM subcode can be used either alone or as an inner code of a concatenated coding system with the NASA standard (255, 223, 33) Reed-Solomon (RS) code as the outer code to achieve high performance (or low bit-error rate) with reduced decoding complexity. It can also be used as a component code in a multilevel bandwidth efficient coded modulation system to achieve reliable bandwidth efficient data transmission. This report will summarize the key progress we have made toward achieving our eventual goal of implementing a decoder system based upon this code. In the first phase of study, we investigated the complexities of various sectionalized trellis diagrams for the proposed (64, 40, 8) RM subcode. We found a specific 8-section trellis diagram for this code which requires the least decoding complexity with a high possibility of achieving a decoding speed of 600 M bits per second (Mbps). The combination of a large number of states and a high data rate will be made possible due to the utilization of a high degree of parallelism throughout the architecture. This trellis diagram will be presented and briefly described. In the second phase of study, which was carried out through the past year, we investigated circuit architectures to determine the feasibility of VLSI implementation of a high-speed Viterbi decoder based on this 8-section trellis diagram. We began to examine specific design and implementation approaches to implement a fully custom integrated circuit (IC) which will be a key building block for a decoder system implementation. The key results will be presented in this report. This report will be divided into three primary sections.
First, we will briefly describe the system block diagram in which the proposed decoder is assumed to be operating and present some of the key architectural approaches being used to implement the system at high speed. Second, we will describe details of the 8-section trellis diagram we found to best meet the trade-offs between chip and overall system complexity. The chosen approach implements the trellis for the (64, 40, 8) RM subcode with 32 independent sub-trellises. And third, we will describe results of our feasibility study on the implementation of such an IC chip in CMOS technology to implement one of these sub-trellises.
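As background for the code parameters quoted in this record, the standard formulas for an r-th order Reed-Muller code of length 2^m give the parent code of which the (64, 40, 8) code is a subcode. The sketch below is illustrative; the function name is an assumption made here, and the subcode construction itself is not given in the abstract.

```python
from math import comb


def reed_muller_params(r: int, m: int) -> tuple[int, int, int]:
    """Return (length, dimension, minimum distance) of the Reed-Muller code RM(r, m)."""
    if not 0 <= r <= m:
        raise ValueError("require 0 <= r <= m")
    length = 2 ** m
    dimension = sum(comb(m, i) for i in range(r + 1))
    min_distance = 2 ** (m - r)
    return length, dimension, min_distance


# Third-order RM code of length 64 (r = 3, m = 6): dimension 1 + 6 + 15 + 20 = 42.
parent = reed_muller_params(3, 6)

# The proposed subcode keeps length 64 and distance 8 but has dimension 40,
# so its code rate is 40 / 64.
subcode_rate = 40 / 64
```

The subcode therefore gives up two information bits relative to the full third-order code while retaining the same minimum distance of 8.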
High-Speed Soft-Decision Decoding of Two Reed-Muller Codes

NASA Technical Reports Server (NTRS)

Lin, Shu; Uehara, Gregory T.

1996-01-01

In this research, we have proposed the (64, 40, 8) subcode of the third-order Reed-Muller (RM) code to NASA for high-speed satellite communications. This RM subcode can be used either alone or as an inner code of a concatenated coding system with the NASA standard (255, 223, 33) Reed-Solomon (RS) code as the outer code to achieve high performance (or low bit-error rate) with reduced decoding complexity. It can also be used as a component code in a multilevel bandwidth efficient coded modulation system to achieve reliable bandwidth efficient data transmission. This report will summarize the key progress we have made toward achieving our eventual goal of implementing a decoder system based upon this code. In the first phase of study, we investigated the complexities of various sectionalized trellis diagrams for the proposed (64, 40, 8) RM subcode. We found a specific 8-section trellis diagram for this code which requires the least decoding complexity with a high possibility of achieving a decoding speed of 600 M bits per second (Mbps). The combination of a large number of states and a high data rate will be made possible due to the utilization of a high degree of parallelism throughout the architecture. This trellis diagram will be presented and briefly described. In the second phase of study, which was carried out through the past year, we investigated circuit architectures to determine the feasibility of VLSI implementation of a high-speed Viterbi decoder based on this 8-section trellis diagram. We began to examine specific design and implementation approaches to implement a fully custom integrated circuit (IC) which will be a key building block for a decoder system implementation. The key results will be presented in this report. This report will be divided into three primary sections.
First, we will briefly describe the system block diagram in which the proposed decoder is assumed to be operating and present some of the key architectural approaches being used to implement the system at high speed. Second, we will describe details of the 8-section trellis diagram we found to best meet the trade-offs between chip and overall system complexity. The chosen approach implements the trellis for the (64, 40, 8) RM subcode with 32 independent sub-trellises. And third, we will describe results of our feasibility study on the implementation of such an IC chip in CMOS technology to implement one of these sub-trellises.
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Integrated Vertical Bloch Line (VBL) memory

NASA Technical Reports Server (NTRS)

Katti, R. R.; Wu, J. C.; Stadler, H. L.

1991-01-01

Vertical Bloch Line (VBL) Memory is a recently conceived, integrated, solid state, block access, VLSI memory which offers the potential of 1 Gbit/sq cm areal storage density, data rates of hundreds of megabits/sec, and submillisecond average access time simultaneously at relatively low mass, volume, and power values when compared to alternative technologies. VBLs are micromagnetic structures within magnetic domain walls which can be manipulated using magnetic fields from integrated conductors. The presence or absence of VBL pairs is used to store binary information. At present, efforts are being directed at developing a single chip memory using 25 Mbit/sq cm technology in magnetic garnet material which integrates, at a single operating point, the writing, storage, reading, and amplification functions needed in a memory. The current design architecture, the functional elements, and the supercomputer simulation results used to assist the design process are described.
VLSI Technology: Impact and Promise. Identifying Emerging Issues and Trends in Technology for Special Education.

ERIC Educational Resources Information Center

Bayoumi, Magdy

As part of a 3-year study to identify emerging issues and trends in technology for special education, this paper addresses the implications of very large scale integrated (VLSI) technology. The first section reviews the development of educational technology, particularly microelectronics technology, from the 1950s to the present. The implications…
A Knowledge Based Approach to VLSI CAD

DTIC Science & Technology

1983-09-01

A Knowledge Based Approach to VLSI CAD, Louis L Steinberg and...major issues lies in building up and managing the knowledge base of design expertise. We expect that, as with many recent expert systems, in order to
CMOS VLSI Layout and Verification of a SIMD Computer

NASA Technical Reports Server (NTRS)

Zheng, Jianqing

1996-01-01

A CMOS VLSI layout and verification of a 3 x 3 processor parallel computer has been completed. The layout was done using the MAGIC tool and the verification using HSPICE. Suggestions for expanding the computer into a million processor network are presented. Many problems that might be encountered when implementing a massively parallel computer are discussed.
Functional Abstraction from Structure in VLSI Simulation Models,

DTIC Science & Technology

1987-05-01

wide variety of powerful tools, designed around the Y model proposed by Gajski and Kuhn [11]. The heart of the system is the data representation..."Functional Models for VLSI Design", 20th IEEE Design Automation Conference (DAC '83), 1983, paper 32.2, pp. 506-514. [11] Gajski, Daniel D., Kuhn, Robert H
Towards an Analogue Neuromorphic VLSI Instrument for the Sensing of Complex Odours

NASA Astrophysics Data System (ADS)

Ab Aziz, Muhammad Fazli; Harun, Fauzan Khairi Che; Covington, James A.; Gardner, Julian W.

2011-09-01

Almost all electronic nose instruments reported today employ pattern recognition algorithms written in software and run on digital processors, e.g. microprocessors, microcontrollers or FPGAs. Conversely, in this paper we describe the analogue VLSI implementation of an electronic nose through the design of a neuromorphic olfactory chip. The modelling, design and fabrication of the chip have already been reported. Here a smart interface has been designed and characterised for this neuromorphic chip. Thus we can demonstrate the functionality of the aVLSI neuromorphic chip, producing differing principal neuron firing patterns to real sensor response data. Further work is directed towards integrating 9 separate neuromorphic chips to create a large neuronal network to solve more complex olfactory problems.
Gallium arsenide processing elements for motion estimation full-search algorithm

NASA Astrophysics Data System (ADS)

Lopez, Jose F.; Cortes, P.; Lopez, S.; Sarmiento, Roberto

2001-11-01

The Block-Matching motion estimation algorithm (BMA) is the most popular method for motion-compensated coding of image sequences. Among the several possible searching methods to compute this algorithm, the full-search BMA (FBMA) has attracted great interest from the scientific community due to its regularity, optimal solution and low control overhead, which simplify its VLSI realization. On the other hand, its main drawback is its enormous computational demand. There are different ways of overcoming this drawback; the one adopted in this article is the use of advanced technologies, such as Gallium Arsenide (GaAs), together with different techniques to reduce area overhead. By exploiting GaAs properties, improvements can be obtained in the implementation of feasible systems for real time video compression architectures. Different primitives used in the implementation of processing elements (PEs) for an FBMA scheme are presented. As a result, PEs running at 270 MHz have been developed in order to study their functionality and performance. From these results, an implementation for MPEG applications is proposed, leading to an architecture running at 145 MHz with a power dissipation of 3.48 W and an area of 11.5 mm².
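To make the regularity and the computational cost of the full-search method concrete, the sketch below evaluates every candidate displacement in a search window and keeps the best match. It is a software illustration only. The sum of absolute differences (SAD) is assumed here as the matching criterion because the abstract does not name one, and the function names and frame layout (lists of rows) are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current frame
    at (bx, by) and the reference-frame block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Exhaustively test all displacements in [-p, p] x [-p, p].

    Returns (best_cost, dx, dy), or None if no candidate fits inside the frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Each block requires up to (2p + 1)^2 candidate evaluations of n^2 pixel differences, all with the same fixed data flow, which is the workload that arrays of processing elements are built to absorb.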
NASA Tech Briefs, June 2007

NASA Technical Reports Server (NTRS)

2007-01-01

Topics covered include: High-Accuracy, High-Dynamic-Range Phase-Measurement System; Simple, Compact, Safe Impact Tester; Multi-Antenna Radar Systems for Doppler Rain Measurements; 600-GHz Electronically Tunable Vector Measurement System; Modular Architecture for the Measurement of Space Radiation; VLSI Design of a Turbo Decoder; Architecture of an Autonomous Radio Receiver; Improved On-Chip Measurement of Delay in an FPGA or ASIC; Resource Selection and Ranking; Accident/Mishap Investigation System; Simplified Identification of mRNA or DNA in Whole Cells; Printed Multi-Turn Loop Antennas for RF Biotelemetry; Making Ternary Quantum Dots From Single-Source Precursors; Improved Single-Source Precursors for Solar-Cell Absorbers; Spray CVD for Making Solar-Cell Absorber Layers; Glass/BNNT Composite for Sealing Solid Oxide Fuel Cells; A Method of Assembling Compact Coherent Fiber-Optic Bundles; Manufacturing Diamond Under Very High Pressure; Ring-Resonator/Sol-Gel Interferometric Immunosensor; Compact Fuel-Cell System Would Consume Neat Methanol; Algorithm Would Enable Robots to Solve Problems Creatively; Hypothetical Scenario Generator for Fault-Tolerant Diagnosis; Smart Data Node in the Sky; Pseudo-Waypoint Guidance for Proximity Spacecraft Maneuvers; Update on Controlling Herds of Cooperative Robots; and Simulation and Testing of Maneuvering of a Planetary Rover.

High performance genetic algorithm for VLSI circuit partitioning

NASA Astrophysics Data System (ADS)

Dinu, Simona

2016-12-01

Partitioning is one of the biggest challenges in computer-aided design for VLSI circuits (very large-scale integrated circuits). This work addresses the min-cut balanced circuit partitioning problem: dividing the graph that models the circuit into k sub-graphs of almost equal size while minimizing the number of edges cut, i.e., the number of edges connecting the sub-graphs. The problem may be formulated as a combinatorial optimization problem. Studies in the literature have shown the problem to be NP-hard, and thus it is important to design an efficient heuristic algorithm to solve it. The approach proposed in this study is a parallel implementation of a genetic algorithm, namely an island model. The information exchange between the evolving subpopulations is modeled using a fuzzy controller, which determines an optimal balance between exploration and exploitation of the solution space. The results of simulations show that the proposed algorithm outperforms the standard sequential genetic algorithm both in terms of solution quality and convergence speed. As a direction for future study, this research can be further extended to incorporate local search operators which should include problem-specific knowledge. In addition, the adaptive configuration of mutation and crossover rates is another direction for future research.
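The objective and constraint described in this abstract can be written down directly. The sketch below evaluates a candidate k-way partition by counting cut edges and checking balance; it is the kind of fitness evaluation a genetic algorithm could call, but it is not the paper's implementation. The function names, the dictionary-based partition encoding, and the size tolerance of one node are assumptions made here.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints lie in different sub-graphs.

    edges: iterable of (u, v) node pairs; part: dict mapping node -> block index.
    """
    return sum(1 for u, v in edges if part[u] != part[v])


def is_balanced(part, k, tolerance=1):
    """True if the k block sizes differ from each other by at most `tolerance`."""
    sizes = [0] * k
    for block in part.values():
        sizes[block] += 1
    return max(sizes) - min(sizes) <= tolerance


# Hypothetical example graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # separates the two triangles
poor = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}   # alternates nodes between blocks
```

Both partitions are balanced, but the first cuts a single edge while the second cuts five, so a min-cut search would prefer the first.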
Verification of VLSI designs

NASA Technical Reports Server (NTRS)

Windley, P. J.

1991-01-01

In this paper we explore the specification and verification of VLSI designs. The paper focuses on abstract specification and verification of functionality using mathematical logic as opposed to low-level Boolean equivalence verification such as that done using BDDs and Model Checking. Specification and verification, sometimes called formal methods, is one tool for increasing computer dependability in the face of an exponentially increasing testing effort.
Fault Tolerance for VLSI Multicomputers

DTIC Science & Technology

1985-08-01

that consists of hundreds or thousands of VLSI computation nodes interconnected by dedicated links. Some important applications of high-end computers...technology, and intended applications. A proposed fault tolerance scheme combines hardware that performs error detection and system-level protocols for...order to recover from the error and resume correct operation, a valid system state must be restored. A low-overhead, application-transparent error
VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

NASA Technical Reports Server (NTRS)

Fang, Wai-Chi; Lue, Jaw-Chyng

2009-01-01

A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).
Area-Efficient VLSI Computation.

DTIC Science & Technology

1981-10-01

to the bus of a computer system. 5 Table 1-2: Definition of the three-sorter. 7 Figure 1-3: A real-time systolic priority queue. 7 Figure 1-4: The...capable of sorting three elements. The three-sorter has three inputs X, Y, and Z and produces three outputs X', Y', and Z' which are the minimum, median, and...in Section 1.4. Figure 1-3 shows how three-sorters are interconnected to make a systolic priority queue. In the figure, the outputs from the top, middle
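The three-sorter mentioned in this excerpt can be sketched as a small comparator network. This is a minimal illustration, assuming the three outputs are ordered minimum, median, maximum (the excerpt is truncated before the third output is named); the function name is hypothetical and the wiring of the systolic priority queue itself is not reproduced here.

```python
def three_sorter(x, y, z):
    """Return (minimum, median, maximum) of three inputs using only min/max comparators."""
    lo, hi = min(x, y), max(x, y)       # first compare-exchange on X and Y
    smallest = min(lo, z)               # Z against the smaller of X, Y
    median = max(lo, min(hi, z))        # middle element
    largest = max(hi, z)                # Z against the larger of X, Y
    return smallest, median, largest

print(three_sorter(7, 2, 5))
```

Each output depends on a fixed pattern of comparisons, which is what makes such a cell a natural building block for a regular hardware array.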
Computationally Efficient Modeling and Simulation of Large Scale Systems

NASA Technical Reports Server (NTRS)

Jain, Jitesh (Inventor); Koh, Cheng-Kok (Inventor); Balakrishnan, Vankataramanan (Inventor); Cauley, Stephen F (Inventor); Li, Hong (Inventor)

2014-01-01

A system for simulating operation of a VLSI interconnect structure having capacitive and inductive coupling between nodes thereof, including a processor, and a memory, the processor configured to perform obtaining a matrix X and a matrix Y containing different combinations of passive circuit element values for the interconnect structure, the element values for each matrix including inductance L and inverse capacitance P, obtaining an adjacency matrix A associated with the interconnect structure, storing the matrices X, Y, and A in the memory, and performing numerical integration to solve first and second equations.
Noise-margin limitations on gallium-arsenide VLSI

NASA Technical Reports Server (NTRS)

Long, Stephen I.; Sundaram, Mani

1988-01-01

Two factors which limit the complexity of GaAs MESFET VLSI circuits are considered. Power dissipation sets an upper complexity limit for a given logic circuit implementation and thermal design. Uniformity of device characteristics and the circuit configuration determines the electrical functional yield. Projection of VLSI complexity based on these factors indicates that logic chips of 15,000 gates are feasible with the most promising static circuits if a maximum power dissipation of 5 W per chip is assumed. While lower power per gate and therefore more gates per chip can be obtained by using a popular E/D FET circuit, yields are shown to be small when practical device parameter tolerances are applied. Further improvements in materials, devices, and circuits will be needed to extend circuit complexity to the range currently dominated by silicon.
VLSI 'smart' I/O module development

NASA Astrophysics Data System (ADS)

Kirk, Dan

The developmental history, design, and operation of the MIL-STD-1553A/B discrete and serial module (DSM) for the U.S. Navy AN/AYK-14(V) avionics computer are described and illustrated with diagrams. The ongoing preplanned product improvement for the AN/AYK-14(V) includes five dual-redundant MIL-STD-1553 channels based on DSMs. The DSM is a front-end processor for transferring data to and from a common memory, sharing memory with a host processor to provide improved 'smart' input/output performance. Each DSM comprises three hardware sections: three VLSI-6000 semicustomized CMOS arrays, memory units to support the arrays, and buffers and resynchronization circuits. The DSM hardware module design, VLSI-6000 design tools, controlware and test software, and checkout procedures (using a hardware simulator) are characterized in detail.
Implementation of a VLSI Level Zero Processing system utilizing the functional component approach

NASA Technical Reports Server (NTRS)

Shi, Jianfei; Horner, Ward P.; Grebowsky, Gerald J.; Chesney, James R.

1991-01-01

A high rate Level Zero Processing system is currently being prototyped at NASA/Goddard Space Flight Center (GSFC). Based on state-of-the-art VLSI technology and the functional component approach, the new system promises capabilities of handling multiple Virtual Channels and Applications with a combined data rate of up to 20 Megabits per second (Mbps) at low cost.
A Systolic VLSI Design of a Pipeline Reed-Solomon Decoder

NASA Technical Reports Server (NTRS)

Shao, H. M.; Truong, T. K.; Deutsch, L. J.; Yuen, J. H.; Reed, I. S.

1984-01-01

A pipeline structure of a transform decoder similar to a systolic array was developed to decode Reed-Solomon (RS) codes. An important ingredient of this design is a modified Euclidean algorithm for computing the error locator polynomial. The computation of inverse field elements is completely avoided in this modification of Euclid's algorithm. The new decoder is regular and simple, and naturally suitable for VLSI implementation.
A VLSI design of a pipeline Reed-Solomon decoder

NASA Technical Reports Server (NTRS)

Shao, H. M.; Truong, T. K.; Deutsch, L. J.; Yuen, J. H.; Reed, I. S.

1985-01-01

A pipeline structure of a transform decoder similar to a systolic array was developed to decode Reed-Solomon (RS) codes. An important ingredient of this design is a modified Euclidean algorithm for computing the error locator polynomial. The computation of inverse field elements is completely avoided in this modification of Euclid's algorithm. The new decoder is regular and simple, and naturally suitable for VLSI implementation.
Periodically Self Restoring Redundant Systems for VLSI Based Highly Reliable Design,

DTIC Science & Technology

1984-01-01

fault tolerance technique for realizing highly reliable computer systems for critical control applications. However, VLSI technology has imposed a...operating correctly; failed critical real time control applications. n modules are discarded from the vote. the classical "static" voted redundancy...redundant modules are failure number of interconnections. This results in f aree. However, for applications requiring high modular complexity because
WNN 92; Proceedings of the 3rd Workshop on Neural Networks: Academic/Industrial/NASA/Defense, Auburn Univ., AL, Feb. 10-12, 1992 and South Shore Harbour, TX, Nov. 4-6, 1992

NASA Technical Reports Server (NTRS)

Padgett, Mary L. (Editor)

1993-01-01

The present conference discusses such neural networks (NN) related topics as their current development status, NN architectures, NN learning rules, NN optimization methods, NN temporal models, NN control methods, NN pattern recognition systems and applications, biological and biomedical applications of NNs, VLSI design techniques for NNs, NN systems simulation, fuzzy logic, and genetic algorithms. Attention is given to missileborne integrated NNs, adaptive-mixture NNs, implementable learning rules, an NN simulator for travelling salesman problem solutions, similarity-based forecasting, NN control of hypersonic aircraft takeoff, NN control of the Space Shuttle Arm, an adaptive NN robot manipulator controller, a synthetic approach to digital filtering, NNs for speech analysis, adaptive spline networks, an anticipatory fuzzy logic controller, and encoding operations for fuzzy associative memories.
The effect of structural design parameters on FPGA-based feed-forward space-time trellis coding-orthogonal frequency division multiplexing channel encoders

NASA Astrophysics Data System (ADS)

Passas, Georgios; Freear, Steven; Fawcett, Darren

2010-08-01

Orthogonal frequency division multiplexing (OFDM)-based feed-forward space-time trellis code (FFSTTC) encoders can be synthesised as very high speed integrated circuit hardware description language (VHDL) designs. Evaluation of their FPGA implementation can lead to conclusions that help a designer to decide the optimum implementation, given the encoder structural parameters. VLSI architectures based on 1-bit multipliers and look-up tables (LUTs) are compared in terms of FPGA slices and block RAMs (area), as well as in terms of minimum clock period (speed). Area and speed graphs versus encoder memory order are provided for quadrature phase shift keying (QPSK) and 8 phase shift keying (8-PSK) modulation and two transmit antennas, revealing best implementation under these conditions. The effect of number of modulation bits and transmit antennas on the encoder implementation complexity is also investigated.
NASA Tech Briefs, April 2003

NASA Technical Reports Server (NTRS)

2003-01-01

Topics include: Tool for Bending a Metal Tube Precisely in a Confined Space; Multiple-Use Mechanisms for Attachment to Seat Tracks; Force-Measuring Clamps; Cellular Pressure-Actuated Joint; Block QCA Fault-Tolerant Logic Gates; Hybrid VLSI/QCA Architecture for Computing FFTs; Arrays of Carbon Nanotubes as RF Filters in Waveguides; Carbon Nanotubes as Resonators for RF Spectrum Analyzers; Software for Viewing Landsat Mosaic Images; Updated Integrated Mission Program; Software for Sharing and Management of Information; Optical-Quality Thin Polymer Membranes; Rollable Thin Shell Composite-Material Paraboloidal Mirrors; Folded Resonant Horns for Power Ultrasonic Applications; Touchdown Ball-Bearing System for Magnetic Bearings; Flux-Based Deadbeat Control of Induction-Motor Torque; Block Copolymers as Templates for Arrays of Carbon Nanotubes; Throttling Cryogen Boiloff To Control Cryostat Temperature; Collaborative Software Development Approach Used to Deliver the New Shuttle Telemetry Ground Station; Turbulence in Supercritical O2/H2 and C7H16/N2 Mixing Layers; and Time-Resolved Measurements in Optoelectronic Microbioanal.
A method for validating Rent's rule for technological and biological networks.

PubMed

Alcalde Cuesta, Fernando; González Sequeiros, Pablo; Lozano Rojo, Álvaro

2017-07-14

Rent's rule is an empirical power law introduced in an effort to describe and optimize the wiring complexity of computer logic graphs. It is known that brain and neuronal networks also obey Rent's rule, which is consistent with the idea that wiring costs play a fundamental role in brain evolution and development. Here we propose a method to validate this power law for a certain range of network partitions. This method is based on the bifurcation phenomenon that appears when the network is subjected to random alterations preserving its degree distribution. It has been tested on a set of VLSI circuits and real networks, including biological and technological ones. We also analyzed the effect of different types of random alterations on the Rentian scaling in order to test the influence of the degree distribution. There are network architectures quite sensitive to these randomization procedures with significant increases in the values of the Rent exponents.
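For readers unfamiliar with the power law, Rent's rule is usually written T = t · G^p, where G is the number of logic blocks in a partition, T the number of external terminals, and p the Rent exponent. The sketch below shows the classical way to estimate p, a least-squares line through log-log data; it is not the randomization-based validation method of this paper, and the sample data and function name are made up for illustration.

```python
import math

def fit_rent(blocks):
    """Fit T = t * G**p to (G, T) pairs by least squares on log T = log t + p * log G."""
    xs = [math.log(g) for g, _ in blocks]
    ys = [math.log(t) for _, t in blocks]
    n = len(blocks)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                          # slope of the log-log line: Rent exponent
    t = math.exp(mean_y - p * mean_x)      # intercept: Rent coefficient
    return t, p

# Synthetic partitions generated from T = 3 * G**0.6 (hypothetical values).
samples = [(g, 3.0 * g ** 0.6) for g in (4, 16, 64, 256, 1024)]
t, p = fit_rent(samples)
print(round(t, 3), round(p, 3))
```

On real partition data the points scatter around the line, and the fit is only meaningful over the range of partition sizes where the scaling actually holds, which is the range the abstract's method aims to identify.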
Design of a high-speed digital processing element for parallel simulation

NASA Technical Reports Server (NTRS)

Milner, E. J.; Cwynar, D. S.

1983-01-01

A prototype of a custom designed computer to be used as a processing element in a multiprocessor based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real time simulations are needed for closed loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by: prefetching the next instruction while the current one is executing, transporting data using high speed data busses, and using state of the art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
VLSI circuits implementing computational models of neocortical circuits.

PubMed

Wijekoon, Jayawan H B; Dudek, Piotr

2012-09-15

This paper overviews the design and implementation of three neuromorphic integrated circuits developed for the COLAMN ("Novel Computing Architecture for Cognitive Systems based on the Laminar Microcircuitry of the Neocortex") project. The circuits are implemented in a standard 0.35 μm CMOS technology and include spiking and bursting neuron models, and synapses with short-term (facilitating/depressing) and long-term (STDP and dopamine-modulated STDP) dynamics. They enable execution of complex nonlinear models in accelerated-time, as compared with biology, and with low power consumption. The neural dynamics are implemented using analogue circuit techniques, with digital asynchronous event-based input and output. The circuits provide configurable hardware blocks that can be used to simulate a variety of neural networks. The paper presents experimental results obtained from the fabricated devices, and discusses the advantages and disadvantages of the analogue circuit approach to computational neural modelling. Copyright © 2012 Elsevier B.V. All rights reserved.
Possible Circuit Architectures for Molecular Nanoelectronics

NASA Astrophysics Data System (ADS)

Likharev, Konstantin

2003-03-01

Chemically-directed self-assembly of molecular devices is apparently the only feasible way to continue the fast progress of microelectronics after its Moore's-Law-based development runs into the wall of physical and economic limitations [1]. The architectures of VLSI circuits using such devices should be substantially fault-tolerant and accommodate their other features, including low transconductance. The most significant feature of all promising suggested architectures is the hybridization of three technologies: advanced CMOS, simple nanowire arrays, and molecular devices self-assembling on these wires. Molecular memory arrays may have a simple structure, and their simple prototypes have already been implemented experimentally [2]. In contrast, the logic circuit development is just starting. I will describe a family of neuromorphic networks based on so-called CrossNet arrays [3] that look promising for advanced information processing, starting from fast image recognition and beyond. This architecture may combine very high density (above 10^12 functions per cm^2) and relatively high speed (100-ns-scale latency of cell-to-cell communications) at acceptable power consumption. In the future, these features may make it possible to put an artificial analog of the human cerebral cortex, capable of processing information and (hopefully) self-evolution 4 to 5 orders of magnitude faster than its biological prototype, on a 20x20 cm^2 silicon wafer. [1] K. Likharev, "Electronics Below 20-nm", see http://rsfq1.physics.sunysb.edu/ likharev/nano/ForMorkoc.pdf. [2] See, e.g., http://nanotechweb.org/articles/news/1/9/8/1. [3] O. Turel and K. Likharev, Int. J. of Circuit Theory and Applications 31, No.1 (2003); see http://rsfq1.physics.sunysb.edu/ likharev/nano/Preprint070102.pdf.
From neural-based object recognition toward microelectronic eyes

NASA Technical Reports Server (NTRS)

Sheu, Bing J.; Bang, Sa Hyun

1994-01-01

Engineering neural network systems are best known for their abilities to adapt to the changing characteristics of the surrounding environment by adjusting system parameter values during the learning process. Rapid advances in analog current-mode design techniques have made possible the implementation of major neural network functions in custom VLSI chips. An electrically programmable analog synapse cell with large dynamic range can be realized in a compact silicon area. New designs of the synapse cells, neurons, and analog processor are presented. A synapse cell based on Gilbert multiplier structure can perform the linear multiplication for back-propagation networks. A double differential-pair synapse cell can perform the Gaussian function for radial-basis network. The synapse cells can be biased in the strong inversion region for high-speed operation or biased in the subthreshold region for low-power operation. The voltage gain of the sigmoid-function neurons is externally adjustable which greatly facilitates the search of optimal solutions in certain networks. Various building blocks can be intelligently connected to form useful industrial applications. Efficient data communication is a key system-level design issue for large-scale networks. We also present analog neural processors based on perceptron architecture and Hopfield network for communication applications. Biologically inspired neural networks have played an important role towards the creation of powerful intelligent machines. Accuracy, limitations, and prospects of analog current-mode design of the biologically inspired vision processing chips and cellular neural network chips are key design issues.

Partial Wave Analysis of Coupled Photonic Structures

NASA Technical Reports Server (NTRS)

Fuller, Kirk A.; Smith, David D.; Curreri, Peter A. (Technical Monitor)

2002-01-01

The very high quality factors sustained by microcavity optical resonators are relevant to applications in wavelength filtering, routing, switching, modulation, and multiplexing/demultiplexing. Increases in the density of photonic elements require that attention be paid to how electromagnetic (EM) coupling modifies their optical properties. This is especially true when cavity resonances are involved, in which case, their characteristics may be fundamentally altered. Understanding the optical properties of microcavities that are near or in contact with photonic elements---such as other microcavities, nanostructures, couplers, and substrates---can be expected to advance our understanding of the roles that these structures may play in VLSI photonics, biosensors and similar device technologies. We present results from recent theoretical studies of the effects of inter- and intracavity coupling on optical resonances in compound spherical particles. Concentrically stratified spheres and bispheres constituted from homogeneous and stratified spheres are subjects of this investigation. A new formulation is introduced for the absorption of light in an arbitrary layer of a multilayered sphere, which is based on multiple reflections of the spherical partial waves of the Lorenz-Mie solution for scattering by a sphere. Absorption efficiencies, which can be used to profile cavity resonances and to infer fluorescence yields or the onset of nonlinear optical processes in the microcavities, are presented. Splitting of resonances in these multisphere systems is paid particular attention, and consequences for photonic device development and possible performance enhancements through carefully designed architectures that exploit EM coupling are considered.
Compilation of Abstracts of Theses Submitted by Candidates for Degrees.

DTIC Science & Technology

1986-09-30

Musitano, J.R. Fin-line Horn Antennas 118 LCDR, USNR Muth, L.R. VLSI Tutorials Through the 119 LT, USN Video-computer Courseware Implementation...Engineer Allocation 432 CPT, USA Model Kiziltan, M. Cognitive Performance Degrada- 433 LTJG, Turkish Navy tion on Sonar Operator and Tor- pedo Data...and Computer Engineering 118 VLSI TUTORIALS THROUGH THE VIDEO-COMPUTER COURSEWARE IMPLEMENTATION SYSTEM Liesel R. Muth Lieutenant, United States Navy
Performance, Resources, and Complexity: A Systematic Approach to Microarchitectural Design

DTIC Science & Technology

1989-05-01

VLSI design in general -- microprocessor design in particular...has been treated more like an art than a science in the past. The goal of this thesis is to explain the science of VLSI design to someone who wants
VLSI Based Multiprocessor Communications Networks.

DTIC Science & Technology

1982-09-01

year of the contract. Research plans for year three are also presented. Need for a research effort in the area of VLSI based communication networks...plans for year three of the contract. Section 4 concludes with a summary discussion of the research thus far. A number of appendices follow the main...pin constraints. We plan to investigate some of these issues during the coming year in addition to developing similar models and bandwidth
Using Ant Colony Optimization for Routing in VLSI Chips

NASA Astrophysics Data System (ADS)

Arora, Tamanna; Moses, Melanie

2009-04-01

Rapid advances in VLSI technology have increased the number of transistors that fit on a single chip to about two billion. A frequent problem in the design of such high performance and high density VLSI layouts is that of routing wires that connect such large numbers of components. Most wire-routing problems are computationally hard. The quality of any routing algorithm is judged by the extent to which it satisfies routing constraints and design objectives. Some of the broader design objectives include minimizing total routed wire length, and minimizing total capacitance induced in the chip, both of which serve to minimize power consumed by the chip. Ant Colony Optimization algorithms (ACO) provide a multi-agent framework for combinatorial optimization by combining memory, stochastic decision and strategies of collective and distributed learning by ant-like agents. This paper applies ACO to the NP-hard problem of finding optimal routes for interconnect routing on VLSI chips. The constraints on interconnect routing are used by ants as heuristics which guide their search process. We found that ACO algorithms were able to successfully incorporate multiple constraints and route interconnects on a suite of benchmark chips. On average, the algorithm routed with total wire length 5.5% less than other established routing algorithms.
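The way constraints act as heuristics in ACO can be illustrated with the textbook Ant System rules. This is a minimal sketch of the generic transition and pheromone-update steps, not the routing algorithm of this paper; the parameter names `alpha`, `beta`, `rho`, and `deposit`, and the example values, are assumptions.

```python
def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Probability of an ant taking each candidate move.

    pheromone[i] is the learned trail strength of move i; heuristic[i] is a
    problem-specific desirability (for routing, e.g. larger for moves that
    shorten the wire or respect a constraint).
    """
    weights = [(tau ** alpha) * (eta ** beta) for tau, eta in zip(pheromone, heuristic)]
    total = sum(weights)
    return [w / total for w in weights]

def evaporate_and_deposit(pheromone, chosen, rho=0.1, deposit=1.0):
    """Evaporate all trails by a factor (1 - rho), then reinforce the chosen move."""
    updated = [(1.0 - rho) * tau for tau in pheromone]
    updated[chosen] += deposit
    return updated

# Three candidate grid moves with equal pheromone; the middle one looks best heuristically.
probs = transition_probabilities([1.0, 1.0, 1.0], [1.0, 2.0, 1.0])
print(probs)
```

Raising `beta` makes ants follow the heuristic (the constraints) more greedily, while raising `alpha` makes them trust accumulated pheromone, which is the collective memory the abstract refers to.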
System considerations for efficient communication and storage of MSTI image data

NASA Technical Reports Server (NTRS)

Rice, Robert F.

1994-01-01

The Ballistic Missile Defense Organization has been developing the capability to evaluate one or more high-rate sensor/hardware combinations by incorporating them as payloads on a series of Miniature Seeker Technology Insertion (MSTI) flights. This publication represents the final report of a 1993 study to analyze the potential impact of data compression and of related communication system technologies on post-MSTI 3 flights. Lossless compression is considered alone and in conjunction with various spatial editing modes. Additionally, JPEG and Fractal algorithms are examined in order to bound the potential gains from the use of lossy compression, but lossless compression is clearly shown to better fit the goals of the MSTI investigations. Lossless compression factors of between 2:1 and 6:1 would provide significant benefits to both on-board mass memory and the downlink. For on-board mass memory, the savings could range from $5 million to $9 million. Such benefits should be possible by direct application of recently developed NASA VLSI microcircuits. It is shown that further downlink enhancements of 2:1 to 3:1 should be feasible through use of practical modifications to the existing modulation system and incorporation of Reed-Solomon channel coding. The latter enhancement could also be achieved by applying recently developed VLSI microcircuits.
Exact Algorithms for Output Encoding, State Assignment and Four-Level Boolean Minimization

DTIC Science & Technology

1989-10-01

APPROVED FOR PUBLIC DISTRIBUTION • DTIC MASSACHUSETTS INSTITUTE OF TECHNOLOGY M VLSI PUBLICATIONS JAN 17 1990 VLSI Memo No. 89-569 JN. 9October 1989...minimize large functions exactly within reasonable amount. of CPU targeting two-level logic implementations involve finding ap- time. However, the ,, m ...0(NV!) minimizations . n5 10 The input encoding problem can be exactly solved using multiple-valued Boolean minimization. We present an exact (a) (b
The VLSI design of a Reed-Solomon encoder using Berlekamp's bit-serial multiplier algorithm

NASA Technical Reports Server (NTRS)

Truong, T. K.; Deutsch, L. J.; Reed, I. S.; Hsu, I. S.; Wang, K.; Yeh, C. S.

1982-01-01

Realization of a bit-serial multiplication algorithm for the encoding of Reed-Solomon (RS) codes on a single VLSI chip using NMOS technology is demonstrated to be feasible. A dual basis (255, 223) over a Galois field is used. The conventional RS encoder for long codes often requires look-up tables to perform the multiplication of two field elements. Berlekamp's algorithm requires only shifting and exclusive-OR operations.
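To show why shifts and exclusive-ORs are enough for finite-field multiplication, here is a minimal sketch of the ordinary shift-and-XOR multiplier in GF(2^8). It is not Berlekamp's dual-basis bit-serial algorithm; it works in the standard polynomial basis, and the field polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption chosen for illustration, not necessarily the one used in the chip.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1

def gf256_mul(a, b, poly=PRIM_POLY):
    """Multiply two GF(2^8) elements using only shifts and XORs (no look-up tables)."""
    result = 0
    while b:
        if b & 1:            # current bit of b set: add (XOR) the shifted copy of a
            result ^= a
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce modulo the field polynomial
            a ^= poly
        b >>= 1
    return result

print(hex(gf256_mul(0x80, 0x02)))
```

A table-based multiplier trades this short loop for log/antilog memories; the appeal of a shift-and-XOR formulation in hardware is that it maps onto a shift register and a handful of XOR gates.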
Leak detection utilizing analog binaural (VLSI) techniques

NASA Technical Reports Server (NTRS)

Hartley, Frank T. (Inventor)

1995-01-01

A detection method and system utilizing silicon models of the traveling wave structure of the human cochlea to spatially and temporally locate a specific sound source in the presence of high noise pandemonium. The detection system combines two-dimensional stereausis representations, which are output by at least three VLSI binaural hearing chips, to generate a three-dimensional stereausis representation including both binaural and spectral information which is then used to locate the sound source.
Devices and Systems for Nonlinear Optical Information Processing

DTIC Science & Technology

1988-11-01

in the VLSI literature [7, 8, 9], in which basic physical principles have been invoked to both understand current VLSI performance and to project...the first time, that in fact accounts for a very wide range of observed but previously unexplained phenomena [Appendix 4; AFOSR Jour. Publ. 7, AFOSR...the variable grating mode liquid crystal device A. R. Tongay. Jr. Abstract. The physical principles of operation of the variable grating mode C. S. Wu
Tunable multi-wavelength fiber lasers based on an Opto-VLSI processor and optical amplifiers.

PubMed

Xiao, Feng; Alameh, Kamal; Lee, Yong Tak

2009-12-07

A multi-wavelength tunable fiber laser based on the use of an Opto-VLSI processor in conjunction with different optical amplifiers is proposed and experimentally demonstrated. The Opto-VLSI processor can simultaneously select any part of the gain spectrum from each optical amplifier into its associated fiber ring, leading to a multiport tunable fiber laser source. We experimentally demonstrate a 3-port tunable fiber laser source, where each output wavelength of each port can independently be tuned within the C-band with a wavelength step of about 0.05 nm. Experimental results demonstrate a laser linewidth as narrow as 0.05 nm and an optical side-mode-suppression-ratio (SMSR) of about 35 dB. The demonstrated three fiber lasers have excellent stability at room temperature and output power uniformity less than 0.5 dB over the whole C-band.
A cost-effective methodology for the design of massively-parallel VLSI functional units

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Sriram, G.; Desouza, J.

1993-01-01

In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.
Random noise effects in pulse-mode digital multilayer neural networks.

PubMed

Kim, Y C; Shanblatt, M A

1995-01-01

A pulse-mode digital multilayer neural network (DMNN) based on stochastic computing techniques is implemented with simple logic gates as basic computing elements. The pulse-mode signal representation and the use of simple logic gates for neural operations lead to a massively parallel yet compact and flexible network architecture, well suited for VLSI implementation. Algebraic neural operations are replaced by stochastic processes using pseudorandom pulse sequences. The distributions of the results from the stochastic processes are approximated using the hypergeometric distribution. Synaptic weights and neuron states are represented as probabilities and estimated as average pulse occurrence rates in corresponding pulse sequences. A statistical model of the noise (error) is developed to estimate the relative accuracy associated with stochastic computing in terms of mean and variance. Computational differences are then explained by comparison to deterministic neural computations. DMNN feedforward architectures are modeled in VHDL using character recognition problems as testbeds. Computational accuracy is analyzed, and the results of the statistical model are compared with the actual simulation results. Experiments show that the calculations performed in the DMNN are more accurate than those anticipated when Bernoulli sequences are assumed, as is common in the literature. Furthermore, the statistical model successfully predicts the accuracy of the operations performed in the DMNN.
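The idea of replacing algebraic operations with logic gates on pulse streams can be sketched in a few lines. The example below multiplies two probabilities with a bitwise AND of two independent pulse sequences and estimates the result as the average pulse rate. It assumes idealized independent Bernoulli streams from Python's `random` module rather than the pseudorandom hardware generators analyzed in the paper, and all names are hypothetical.

```python
import random

def pulse_stream(p, n, rng):
    """Generate n pulses, each 1 with probability p (an idealized Bernoulli stream)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_multiply(p_a, p_b, n=100_000, seed=0):
    """Estimate p_a * p_b as the pulse rate of the AND of two independent streams."""
    rng = random.Random(seed)
    stream_a = pulse_stream(p_a, n, rng)
    stream_b = pulse_stream(p_b, n, rng)
    return sum(x & y for x, y in zip(stream_a, stream_b)) / n

estimate = stochastic_multiply(0.5, 0.4)
print(estimate)
```

The estimate is noisy, with a variance that shrinks as the stream length grows; quantifying that error for realistic pulse generators is the subject of the statistical model described in the abstract.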
Right-Brain/Left-Brain Integrated Associative Processor Employing Convertible Multiple-Instruction-Stream Multiple-Data-Stream Elements

NASA Astrophysics Data System (ADS)

Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi

2005-04-01

A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm², showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm × 5 mm.
Efficient Interconnection Schemes for VLSI and Parallel Computation

DTIC Science & Technology

1989-08-01

Definition: Let R be a routing network. A set S of wires in R is a (directed) cut if it partitions the network into two sets of processors A and B...such that every path from a processor in A to a processor in B contains a wire in S. The capacity cap(S) is the number of wires in the cut. For a set of...messages M, define the load load(M, S) of M on a cut S to be the number of messages in M from a processor in A to a processor in B. The load factor
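The definitions in this excerpt translate directly into code. The sketch below computes load(M, S) for a cut described by its two processor sets. The excerpt breaks off before defining the load factor, so the ratio load(M, S) / cap(S) used here is an assumption based on the usual definition in the fat-tree routing literature; the processor labels, messages, and capacity are hypothetical.

```python
def load(messages, side_a, side_b):
    """Number of messages sent from a processor in A to a processor in B."""
    return sum(1 for src, dst in messages if src in side_a and dst in side_b)

def load_factor(messages, side_a, side_b, capacity):
    """Assumed definition: load of the message set on the cut divided by the cut capacity."""
    return load(messages, side_a, side_b) / capacity

# Hypothetical 4-processor network split into A = {0, 1} and B = {2, 3},
# with a directed cut of 4 wires from A to B.
messages = [(0, 2), (1, 3), (2, 0), (0, 1)]
side_a, side_b = {0, 1}, {2, 3}
print(load(messages, side_a, side_b), load_factor(messages, side_a, side_b, capacity=4))
```

Only messages crossing from A to B count toward the load, so the message (2, 0), which travels in the opposite direction, and the message (0, 1), which stays inside A, are ignored.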
Molecular implementation of molecular shift register memories

NASA Technical Reports Server (NTRS)

Beratan, David N. (Inventor); Onuchic, Jose N. (Inventor)

1991-01-01

An electronic shift register memory (20) at the molecular level is described. The memory elements are based on a chain of electron transfer molecules (22) and the information is shifted by photoinduced (26) electron transfer reactions. Thus, multi-step sequences of charge transfer reactions are used to move charge with high efficiency down a molecular chain. The device integrates compositions of the invention onto a VLSI substrate (36), providing an example of a molecular electronic device which may be fabricated. Three energy level schemes, molecular implementation of these schemes, optical excitation strategies, charge amplification strategies, and error correction strategies are described.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

DTIC Science & Technology

2017-10-04

Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Critical Problems in Very Large Scale Computer Systems

DTIC Science & Technology

1989-03-31

253-6043 Srinivas Devadas (617) 253-0454 Thomas F. Knight, Jr. (617) 253-7807 F. Thomson Leighton (617) 253-3662 Charles E. Leiserson (617) 253-5833...VLSI Memo No. 88-477, October 1988. S. Devadas , "General Decomposition of Sequential Machines: Relationships to State Assignment," to appear in...Perspective, C. Hewitt and G. Agha editors, MIT Press, 1989. Also MIT VLSI Memo No. 88-491, December 1988. * T. Leighton, B . Maggs, and S. Rao, "Universal
Princeton VLSI Project: Semi-Annual Report.

DTIC Science & Technology

1982-11-01

already fully defined the new language and implementation is now under way o [7]. AMl differs from AU in two essential ways. First, it is based on...Our main thesis is that the VLSI design task can be profitably thought of as a programming task, as opposed to a geometric editing task. We believe...S. Thesis , MIT, EECS Department, June, 1980. [4] Batali, J., Mayle, N., Shrobe, H., Sussman, G., Weise, D., "The DPL/Daedalus Design Environment
New dynamic FET logic and serial memory circuits for VLSI GaAs technology

NASA Technical Reports Server (NTRS)

Eldin, A. G.

1991-01-01

The complexity of GaAs field effect transistor (FET) very large scale integration (VLSI) circuits is limited by the maximum power dissipation while the uniformity of the device parameters determines the functional yield. In this work, digital GaAs FET circuits are presented that eliminate the DC power dissipation and reduce the area to 50% of that of the conventional static circuits. Its larger tolerance to device parameter variations results in higher functional yield.

VLSI Research

DTIC Science & Technology

1984-04-01

Ousterhout, G.T. Hamachi, R.N. Mayo, W.S. Scott, and G.S. Taylor , "A Collection of Papers on Magic," Technical Report No. UCB/CSD 83/154, Computer Science...Division, University of California, Berkeley, December 1983. (3) J.K Ousterhout, G.T. Hamachi, R.N. Mayo, W.S. Scott, and G.S. Taylor , "Magic: A...VLSI Layout System." to appear, 21st Design Automation Conference, June 1984. (4) G.S. Taylor and J.K Ousterhout, "Magic’s Incremental Design-Rule
High density circuit technology, part 3

NASA Technical Reports Server (NTRS)

Wade, T. E.

1982-01-01

Dry processing - both etching and deposition - and present/future trends in semiconductor technology are discussed. In addition to a description of the basic apparatus, terminology, advantages, glow discharge phenomena, gas-surface chemistries, and key operational parameters for both dry etching and plasma deposition processes, a comprehensive survey of dry processing equipment (via vendor listing) is also included. The following topics are also discussed: fine-line photolithography, low-temperature processing, packaging for dense VLSI die, the role of integrated optics, and VLSI and technology innovations.
Recent patents on Cu/low-k dielectrics interconnects in integrated circuits.

PubMed

Jiang, Qing; Zhu, Yong F; Zhao, Ming

2007-01-01

In past decades, the development of microelectronics has proceeded through steady scaling to maximize transistor density, driven by the need for electrical and functional performance. For further development, the propagation velocity of electromagnetic waves becomes increasingly important because of its unyielding constraint on interconnect delay. Minimizing this delay forced the introduction of Cu/low-k dielectric interconnects into very large scale integrated (VLSI) circuits, where k denotes the dielectric constant. In addition, reliable barrier structures, which are the thinnest of the device parts so as to maximize the space available for the actual Cu IWs, are required to prevent penetration of different materials. In light of the above, this review focuses on recent patents and some studies on Cu interconnects, including Cu interconnect wires, low-k dielectrics, and related barrier materials, as well as manufacturing techniques in VLSI, which are among the most essential concerns in the microelectronics industry and determine the further development of VLSI. In addition, possible future developments in this field are considered.
Extended Logic Intelligent Processing System for a Sensor Fusion Processor Hardware

NASA Technical Reports Server (NTRS)

Stoica, Adrian; Thomas, Tyson; Li, Wei-Te; Daud, Taher; Fabunmi, James

2000-01-01

The paper presents the hardware implementation and initial tests from a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) is described, which combines rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor signals in compact low power VLSI. The development of the ELIPS concept is being done to demonstrate the interceptor functionality which particularly underlines the high speed and low power requirements. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Processing speeds of microseconds have been demonstrated using our test hardware.
Transistor analogs of emergent iono-neuronal dynamics.

PubMed

Rachmuth, Guy; Poon, Chi-Sang

2008-06-01

Neuromorphic analog metal-oxide-silicon (MOS) transistor circuits promise compact, low-power, and high-speed emulations of iono-neuronal dynamics orders-of-magnitude faster than digital simulation. However, their inherently limited input voltage dynamic range vs power consumption and silicon die area tradeoffs make them highly sensitive to transistor mismatch due to fabrication inaccuracy, device noise, and other nonidealities. This limitation precludes robust analog very-large-scale-integration (aVLSI) circuit implementation of emergent iono-neuronal dynamics computations beyond simple spiking with limited ion channel dynamics. Here we present versatile neuromorphic analog building-block circuits that afford near-maximum voltage dynamic range operating within the low-power MOS transistor weak-inversion regime which is ideal for aVLSI implementation or implantable biomimetic device applications. The fabricated microchip allowed robust realization of dynamic iono-neuronal computations such as coincidence detection of presynaptic spikes or pre- and postsynaptic activities. As a critical performance benchmark, the high-speed and highly interactive iono-neuronal simulation capability on-chip enabled our prompt discovery of a minimal model of chaotic pacemaker bursting, an emergent iono-neuronal behavior of fundamental biological significance which has hitherto defied experimental testing or computational exploration via conventional digital or analog simulations. These compact and power-efficient transistor analogs of emergent iono-neuronal dynamics open new avenues for next-generation neuromorphic, neuroprosthetic, and brain-machine interface applications.
A Review of Current Neuromorphic Approaches for Vision, Auditory, and Olfactory Sensors.

PubMed

Vanarse, Anup; Osseiran, Adam; Rassau, Alexander

2016-01-01

Conventional vision, auditory, and olfactory sensors generate large volumes of redundant data and as a result tend to consume excessive power. To address these shortcomings, neuromorphic sensors have been developed. These sensors mimic the neuro-biological architecture of sensory organs using aVLSI (analog Very Large Scale Integration) and generate asynchronous spiking output that represents sensing information in ways that are similar to neural signals. This allows for much lower power consumption due to an ability to extract useful sensory information from sparse captured data. The foundation for research in neuromorphic sensors was laid more than two decades ago, but recent developments in the understanding of biological sensing and in advanced electronics have stimulated research on sophisticated neuromorphic sensors that provide numerous advantages over conventional sensors. In this paper, we review the current state-of-the-art in neuromorphic implementation of vision, auditory, and olfactory sensors and identify key contributions across these fields. Bringing together these key contributions, we suggest a future research direction for further development of the neuromorphic sensing field.
NASA Tech Briefs, March 2003

NASA Technical Reports Server (NTRS)

2003-01-01

Topics covered include: Tool for Bending a Metal Tube Precisely in a Confined Space; Multiple-Use Mechanisms for Attachment to Seat Tracks; Force-Measuring Clamps; Cellular Pressure-Actuated Joint; Block QCA Fault-Tolerant Logic Gates; Hybrid VLSI/QCA Architecture for Computing FFTs; Arrays of Carbon Nanotubes as RF Filters in Waveguides; Carbon Nanotubes as Resonators for RF Spectrum Analyzers; Software for Viewing Landsat Mosaic Images; Updated Integrated Mission Program; Software for Sharing and Management of Information; Update on Integrated Optical Design Analyzer; Optical-Quality Thin Polymer Membranes; Rollable Thin Shell Composite-Material Paraboloidal Mirrors; Folded Resonant Horns for Power Ultrasonic Applications; Touchdown Ball-Bearing System for Magnetic Bearings; Flux-Based Deadbeat Control of Induction-Motor Torque; Block Copolymers as Templates for Arrays of Carbon Nanotubes; Throttling Cryogen Boiloff To Control Cryostat Temperature; Collaborative Software Development Approach Used to Deliver the New Shuttle Telemetry Ground Station; Turbulence in Supercritical O2/H2 and C7H16/N2 Mixing Layers; and Time-Resolved Measurements in Optoelectronic Microbioanal.
Neural networks for data compression and invariant image recognition

NASA Technical Reports Server (NTRS)

Gardner, Sheldon

1989-01-01

An approach to invariant image recognition (I2R), based upon a model of biological vision in the mammalian visual system (MVS), is described. The complete I2R model incorporates several biologically inspired features: exponential mapping of retinal images, Gabor spatial filtering, and a neural network associative memory. In the I2R model, exponentially mapped retinal images are filtered by a hierarchical set of Gabor spatial filters (GSF) which provide compression of the information contained within a pixel-based image. A neural network associative memory (AM) is used to process the GSF coded images. We describe a 1-D shape function method for coding of scale and rotationally invariant shape information. This method reduces image shape information to a periodic waveform suitable for coding as an input vector to a neural network AM. The shape function method is suitable for near term applications on conventional computing architectures equipped with VLSI FFT chips to provide a rapid image search capability.
Fault tolerant, radiation hard, high performance digital signal processor

NASA Technical Reports Server (NTRS)

Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke

1990-01-01

An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.
Test aspects of the JPL Viterbi decoder

NASA Technical Reports Server (NTRS)

Breuer, M. A.

1989-01-01

The generation of test vectors and design-for-test aspects of the Jet Propulsion Laboratory (JPL) Very Large Scale Integration (VLSI) Viterbi decoder chip is discussed. Each processor integrated circuit (IC) contains over 20,000 gates. To achieve a high degree of testability, a scan architecture is employed. The logic has been partitioned so that very few test vectors are required to test the entire chip. In addition, since several blocks of logic are replicated numerous times on this chip, test vectors need only be generated for each block, rather than for the entire circuit. These unique blocks of logic have been identified and test sets generated for them. The approach employed for testing was to use pseudo-exhaustive test vectors whenever feasible. That is, each cone of logic is tested exhaustively. Using this approach, no detailed logic design or fault model is required. All faults which modify the function of a block of combinational logic are detected, such as all irredundant single and multiple stuck-at faults.
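The idea of testing each cone of logic exhaustively can be sketched as follows. The three-input cone and the stuck-at fault below are hypothetical examples, not blocks from the JPL chip; the sketch only shows that applying all 2^k input patterns to a k-input cone exposes any fault that changes the cone's function.

```python
from itertools import product


def exhaustive_patterns(num_inputs):
    """All 2**num_inputs binary input patterns for one cone of logic."""
    return list(product((0, 1), repeat=num_inputs))


def cone_detects_fault(good, faulty, num_inputs):
    """True if some exhaustive pattern distinguishes the faulty cone from the good one."""
    return any(good(*p) != faulty(*p) for p in exhaustive_patterns(num_inputs))


# Hypothetical 3-input cone: out = (a AND b) OR c
def good_cone(a, b, c):
    return (a & b) | c


# Same cone with input a stuck at 1 (illustrative fault only).
def stuck_cone(a, b, c):
    return (1 & b) | c


patterns = exhaustive_patterns(3)
detected = cone_detects_fault(good_cone, stuck_cone, 3)
```

Because the comparison is purely functional, no fault model is needed: any fault that alters the truth table of the cone differs from the good cone on at least one of the enumerated patterns.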
FPGA design of correlation-based pattern recognition

NASA Astrophysics Data System (ADS)

Jridi, Maher; Alfalou, Ayman

2017-05-01

Optical/digital pattern recognition and tracking based on optical/digital correlation are well-known techniques to detect, identify and localize a target object in a scene. Despite the limited number of treatments required by the correlation scheme, computational time and resources are relatively high. The most computationally intensive treatment required by the correlation is the transformation from the spatial to the spectral domain and then from the spectral back to the spatial domain. Furthermore, these transformations are used in optical/digital encryption schemes like the double random phase encryption (DRPE). In this paper, we present a VLSI architecture for the correlation scheme based on the fast Fourier transform (FFT). One interesting feature of the proposed scheme is its ability to stream image processing in order to perform correlation for video sequences. A trade-off between the hardware consumption and the robustness of the correlation can be made in order to understand the limitations of the correlation implementation in reconfigurable and portable platforms. Experimental results obtained from HDL simulations and an FPGA prototype have demonstrated the advantages of the proposed scheme.
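The spatial-to-spectral-to-spatial flow described above can be sketched in software. The following is a minimal one-dimensional illustration in pure Python, assuming power-of-two signal lengths and a textbook recursive radix-2 FFT; it is not the paper's architecture, which operates on images in hardware, and all names are illustrative.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_correlation(scene, target):
    """Correlate via the spectral domain: IFFT(FFT(scene) * conj(FFT(target)))."""
    n = len(scene)
    spectrum = [s * t.conjugate() for s, t in zip(fft(scene), fft(target))]
    return [v.real / n for v in fft(spectrum, inverse=True)]


# Hypothetical data: the target pattern appears in the scene shifted by 3 samples.
target = [1, 2, 3, 0, 0, 0, 0, 0]
scene = [0, 0, 0, 1, 2, 3, 0, 0]
corr = circular_correlation(scene, target)
peak = max(range(len(corr)), key=lambda i: corr[i])
```

The index of the correlation peak gives the displacement at which the target best matches the scene, which is the localization step the abstract refers to.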
A new VLSI complex integer multiplier which uses a quadratic-polynomial residue system with Fermat numbers

NASA Technical Reports Server (NTRS)

Truong, T. K.; Hsu, I. S.; Chang, J. J.; Shyu, H. C.; Reed, I. S.

1986-01-01

A quadratic-polynomial Fermat residue number system (QFNS) has been used to compute complex integer multiplications. The advantage of such a QFNS is that a complex integer multiplication requires only two integer multiplications. In this article, a new type Fermat number multiplier is developed which eliminates the initialization condition of the previous method. It is shown that the new complex multiplier can be implemented on a single VLSI chip. Such a chip is designed and fabricated in CMOS-pw technology.
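The claim that a complex integer multiplication needs only two integer multiplications can be illustrated with a small sketch. It assumes the modulus F_4 = 2^16 + 1 and uses the fact that 2^8 squared is congruent to -1 modulo F_4, so 2^8 can stand in for the imaginary unit; the function names are hypothetical, and the sketch says nothing about the multiplier circuit itself. Results are valid only while the real and imaginary parts of the product stay within (-F/2, F/2). The modular inverse via `pow(x, -1, F)` requires Python 3.8 or later.

```python
F = 2**16 + 1   # Fermat number F_4 = 65537 (assumed modulus for this sketch)
J = 2**8        # J*J = 2**16, which is congruent to -1 mod F, so J acts as sqrt(-1)


def to_qfns(a, b):
    """Map the complex integer a + ib to the residue pair (a + J*b, a - J*b) mod F."""
    return ((a + J * b) % F, (a - J * b) % F)


def qfns_mul(x, y):
    """Complex product in residue form: exactly two integer multiplications mod F."""
    return ((x[0] * y[0]) % F, (x[1] * y[1]) % F)


def from_qfns(z):
    """Recover (real, imag) from a residue pair, mapped to the signed range."""
    inv2 = pow(2, -1, F)
    inv2j = pow(2 * J, -1, F)
    a = (z[0] + z[1]) * inv2 % F
    b = (z[0] - z[1]) * inv2j % F

    def signed(v):
        return v - F if v > F // 2 else v

    return signed(a), signed(b)


# (3 + 4i) * (5 - 2i) = 23 + 14i
result = from_qfns(qfns_mul(to_qfns(3, 4), to_qfns(5, -2)))
```

The conversion into and out of residue form costs only additions, shifts by powers of two, and reductions modulo F, which is what makes the two-multiplication form attractive in hardware.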
A new VLSI complex integer multiplier which uses a quadratic-polynomial residue system with Fermat numbers

NASA Technical Reports Server (NTRS)

Shyu, H. C.; Reed, I. S.; Truong, T. K.; Hsu, I. S.; Chang, J. J.

1987-01-01

A quadratic-polynomial Fermat residue number system (QFNS) has been used to compute complex integer multiplications. The advantage of such a QFNS is that a complex integer multiplication requires only two integer multiplications. In this article, a new type Fermat number multiplier is developed which eliminates the initialization condition of the previous method. It is shown that the new complex multiplier can be implemented on a single VLSI chip. Such a chip is designed and fabricated in CMOS-Pw technology.
Real-Time Reed-Solomon Decoder

NASA Technical Reports Server (NTRS)

Maki, Gary K.; Cameron, Kelly B.; Owsley, Patrick A.

1994-01-01

Generic Reed-Solomon decoder fast enough to correct errors in real time in practical applications designed to be implemented in fewer and smaller very-large-scale integrated, VLSI, circuit chips. Configured to operate in pipelined manner. One outstanding aspect of decoder design is that Euclid multiplier and divider modules contain Galois-field multipliers configured as combinational-logic cells. Operates at speeds greater than older multipliers. Cellular configuration highly regular and requires little interconnection area, making it ideal for implementation in extraordinarily dense VLSI circuitry. Flight electronics single chip version of this technology implemented and available.
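A Galois-field multiplication of the kind such cells perform can be sketched in software as a shift-and-XOR loop. The field GF(2^8) and the polynomial 0x11D below are assumptions made for illustration; the brief does not state which field or polynomial the decoder uses, and a combinational cell would unroll this loop into XOR/AND logic rather than iterate.

```python
# Assumed field polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x11D). Illustrative only.
FIELD_POLY = 0x11D


def gf256_mul(a, b):
    """Multiply two elements of GF(2^8) by shift-and-XOR with modular reduction."""
    result = 0
    for _ in range(8):
        if b & 1:            # add (XOR) the current multiple of a
            result ^= a
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # reduce when degree reaches 8
            a ^= FIELD_POLY
    return result


product_example = gf256_mul(0x02, 0x80)   # x * x^7 = x^8, reduced by the polynomial
```

Each loop iteration corresponds to one row of AND gates feeding an XOR tree, which is why the structure is regular and needs little interconnection area.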
A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses.

PubMed

Qiao, Ning; Mostafa, Hesham; Corradi, Federico; Osswald, Marc; Stefanini, Fabio; Sumislawska, Dora; Indiveri, Giacomo

2015-01-01

Implementing compact, low-power artificial neural processing systems with real-time on-line learning abilities is still an open challenge. In this paper we present a full-custom mixed-signal VLSI device with neuromorphic learning circuits that emulate the biophysics of real spiking neurons and dynamic synapses for exploring the properties of computational neuroscience models and for building brain-inspired computing systems. The proposed architecture allows the on-chip configuration of a wide range of network connectivities, including recurrent and deep networks, with short-term and long-term plasticity. The device comprises 128 K analog synapse and 256 neuron circuits with biologically plausible dynamics and bi-stable spike-based plasticity mechanisms that endow it with on-line learning abilities. In addition to the analog circuits, the device also comprises asynchronous digital logic circuits for setting different synapse and neuron properties as well as different network configurations. This prototype device, fabricated using a 180 nm 1P6M CMOS process, occupies an area of 51.4 mm², and consumes approximately 4 mW for typical experiments, for example involving attractor networks. Here we describe the details of the overall architecture and of the individual circuits and present experimental results that showcase its potential. By supporting a wide range of cortical-like computational modules comprising plasticity mechanisms, this device will enable the realization of intelligent autonomous systems with on-line learning capabilities.
Connecting the Brain to Itself through an Emulation

PubMed Central

Serruya, Mijail D.

2017-01-01

Pilot clinical trials of human patients implanted with devices that can chronically record and stimulate ensembles of hundreds to thousands of individual neurons offer the possibility of expanding the substrate of cognition. Parallel trains of firing rate activity can be delivered in real-time to an array of intermediate external modules that in turn can trigger parallel trains of stimulation back into the brain. These modules may be built in software, VLSI firmware, or biological tissue as in vitro culture preparations or in vivo ectopic construct organoids. Arrays of modules can be constructed as early stage whole brain emulators, following canonical intra- and inter-regional circuits. By using machine learning algorithms and classic tasks known to activate quasi-orthogonal functional connectivity patterns, bedside testing can rapidly identify ensemble tuning properties and in turn cycle through a sequence of external module architectures to explore which can causatively alter perception and behavior. Whole brain emulation both (1) serves to augment human neural function, compensating for disease and injury as an auxiliary parallel system, and (2) has its independent operation bootstrapped by a human-in-the-loop to identify optimal micro- and macro-architectures, update synaptic weights, and entrain behaviors. In this manner, closed-loop brain-computer interface pilot clinical trials can advance strong artificial intelligence development and forge new therapies to restore independence in children and adults with neurological conditions. PMID:28713235
Testing interconnected VLSI circuits in the Big Viterbi Decoder

NASA Technical Reports Server (NTRS)

Onyszchuk, I. M.

1991-01-01

The Big Viterbi Decoder (BVD) is a powerful error-correcting hardware device for the Deep Space Network (DSN), in support of the Galileo and Comet Rendezvous Asteroid Flyby (CRAF)/Cassini Missions. Recently, a prototype was completed and run successfully at 400,000 or more decoded bits per second. This prototype is a complex digital system whose core arithmetic unit consists of 256 identical very large scale integration (VLSI) gate-array chips, 16 on each of 16 identical boards which are connected through a 28-layer, printed-circuit backplane using 4416 wires. Special techniques were developed for debugging, testing, and locating faults inside individual chips, on boards, and within the entire decoder. The methods are based upon hierarchical structure in the decoder, and require that chips or boards be wired themselves as Viterbi decoders. The basic procedure consists of sending a small set of known, very noisy channel symbols through a decoder, and matching observables against values computed by a software simulation. Also, tests were devised for finding open and short-circuited wires which connect VLSI chips on the boards and through the backplane.
GaAs VLSI for aerospace electronics

NASA Technical Reports Server (NTRS)

Larue, G.; Chan, P.

1990-01-01

Advanced aerospace electronics systems require high-speed, low-power, radiation-hard, digital components for signal processing, control, and communication applications. GaAs VLSI devices provide a number of advantages over silicon devices including higher carrier velocities, ability to integrate with high performance optical devices, and high-resistivity substrates that provide very short gate delays, good isolation, and tolerance to many forms of radiation. However, III-V technologies also have disadvantages, such as lower yield compared to silicon MOS technology. Achieving very large scale integration (VLSI) is particularly important for fast complex systems. At very short gate delays (less than 100 ps), chip-to-chip interconnects severely degrade circuit clock rates. Complex systems, therefore, benefit greatly when as many gates as possible are placed on a single chip. To fully exploit the advantages of GaAs circuits, attention must be focused on achieving high integration levels by reducing power dissipation, reducing the number of devices per logic function, and providing circuit designs that are more tolerant to process and environmental variations. In addition, adequate noise margin must be maintained to ensure a practical yield.
Controlling state explosion during automatic verification of delay-insensitive and delay-constrained VLSI systems using the POM verifier

NASA Technical Reports Server (NTRS)

Probst, D.; Jensen, L.

1991-01-01

Delay-insensitive VLSI systems have a certain appeal on the ground due to difficulties with clocks; they are even more attractive in space. We answer the question, is it possible to control state explosion arising from various sources during automatic verification (model checking) of delay-insensitive systems? State explosion due to concurrency is handled by introducing a partial-order representation for systems, and defining system correctness as a simple relation between two partial orders on the same set of system events (a graph problem). State explosion due to nondeterminism (chiefly arbitration) is handled when the system to be verified has a clean, finite recurrence structure. Backwards branching is a further optimization. The heart of this approach is the ability, during model checking, to discover a compact finite presentation of the verified system without prior composition of system components. The fully-implemented POM verification system has polynomial space and time performance on traditional asynchronous-circuit benchmarks that are exponential in space and time for other verification systems. We also sketch the generalization of this approach to handle delay-constrained VLSI systems.
The test of VLSI circuits

NASA Astrophysics Data System (ADS)

Baviere, Ph.

Tests which have proven effective for evaluating VLSI circuits for space applications are described. It is recommended that circuits be examined after each manufacturing step to gain fast feedback on inadequacies in the production system. Data from failure modes which occur during operational lifetimes of circuits also permit redefinition of the manufacturing and quality control process to eliminate the defects identified. Other tests include determination of the operational envelope of the circuits, examination of the circuit response to controlled inputs, and the performance and functional speeds of ROM and RAM memories. Finally, it is desirable that all new circuits be designed with testing in mind.

A bioinspired collision detection algorithm for VLSI implementation

NASA Astrophysics Data System (ADS)

Cuadri, J.; Linan, G.; Stafford, R.; Keil, M. S.; Roca, E.

2005-06-01

In this paper a bioinspired algorithm for collision detection is proposed, based on previous models of the locust (Locusta migratoria) visual system reported by F.C. Rind and her group, in the University of Newcastle-upon-Tyne. The algorithm is suitable for VLSI implementation in standard CMOS technologies as a system-on-chip for automotive applications. The working principle of the algorithm is to process a video stream that represents the current scenario, and to fire an alarm whenever an object approaches on a collision course. Moreover, it establishes a scale of warning states, from no danger to collision alarm, depending on the activity detected in the current scenario. In the worst case, the minimum time before collision at which the model fires the collision alarm is 40 msec (1 frame before, at 25 frames per second). Since the average time to successfully fire an airbag system is 2 msec, even in the worst case, this algorithm would be very helpful to more efficiently arm the airbag system, or even take some kind of collision avoidance countermeasures. Furthermore, two additional modules have been included: a "Topological Feature Estimator" and an "Attention Focusing Algorithm". The former takes into account the shape of the approaching object to decide whether it is a person, a road line or a car. This helps to take more adequate countermeasures and to filter false alarms. The latter centres the processing power into the most active zones of the input frame, thus saving memory and processing time resources.
Mixed-mode VLSI optic flow sensors for micro air vehicles

NASA Astrophysics Data System (ADS)

Barrows, Geoffrey Louis

We develop practical, compact optic flow sensors. To achieve the desired weight of 1-2 grams, mixed-mode and mixed-signal VLSI techniques are used to develop compact circuits that directly perform computations necessary to measure optic flow. We discuss several implementations, including a version fully integrated in VLSI, and several "hybrid sensors" in which the front end processing is performed with an analog chip and the back end processing is performed with a microcontroller. We extensively discuss one-dimensional optic flow sensors based on the linear competitive feature tracker (LCFT) algorithm. Hardware implementations of this algorithm are shown able to measure visual motion with contrast levels on the order of several percent. We argue that the development of one-dimensional optic flow sensors is therefore reduced to a problem of engineering. We also introduce two related two-dimensional optic flow algorithms that are amenable to implementation in VLSI. This includes the planar competitive feature tracker (PCFT) algorithm and the trajectory method. These sensors are being developed to solve small-scale navigation problems in micro air vehicles, which are autonomous aircraft whose maximum dimension is on the order of 15 cm. We obtain a proof-of-principle of small-scale navigation by mounting a prototype sensor onto a toy glider and programming the sensor to control a rudder or an elevator to affect the glider's path during flight. We demonstrate the determination of altitude by measuring optic flow in the downward direction. We also demonstrate steering to avoid a collision with a wall, when the glider is tossed towards the wall at a shallow angle, by measuring the optic flow in the direction of the glider's left and right side.
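The link between downward optic flow and altitude can be sketched with the standard geometric relation for a sensor looking straight down while translating over level ground: the flow rate equals ground speed divided by height. This relation and the function below are assumptions added for illustration; the abstract does not state how the sensor's output was converted to altitude.

```python
def altitude_from_optic_flow(ground_speed_m_s, optic_flow_rad_s):
    """Estimate height above level ground from downward optic flow.

    Assumes pure translation parallel to the ground and a sensor pointing
    straight down, so that optic_flow = ground_speed / altitude.
    """
    if optic_flow_rad_s <= 0:
        raise ValueError("optic flow must be positive to estimate altitude")
    return ground_speed_m_s / optic_flow_rad_s


# Hypothetical numbers: 10 m/s ground speed and 2 rad/s measured flow.
estimated_altitude_m = altitude_from_optic_flow(10.0, 2.0)
```

Under these assumptions, holding the measured flow constant while speed is fixed is equivalent to holding altitude constant, which is why a single downward flow measurement is useful for elevator control.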
Learning Methods for Efficient Adoption of Contemporary Technologies in Architectural Design

ERIC Educational Resources Information Center

Mahdavinejad, Mohammadjavad; Dehghani, Sohaib; Shahsavari, Fatemeh

2013-01-01

The interaction between technology and history is one of the most significant issues in achieving an efficient and progressive architecture in any era. This is a concept which stems from lesson of traditional architecture of Iran. Architecture as a part of art, has permanently been transforming just like a living organism. In fact, it has been…
Next generation information communication infrastructure and case studies for future power systems

NASA Astrophysics Data System (ADS)

Qiu, Bin

As the power industry enters the new century, powerful driving forces, uncertainties and new functions are compelling electric utilities to make dramatic changes in their information communication infrastructure. Expanding network services such as real time measurement and monitoring are also driving the need for more bandwidth in the communication network. These needs will grow further as new remote real-time protection and control applications become more feasible and pervasive. This dissertation addresses two main issues for the future power system information infrastructure: communication network infrastructure and associated power system applications. Optical networks no doubt will become the predominant data transmission media for next generation power system communication. The rapid development of fiber optic network technology poses new challenges in the areas of topology design, network management and real time applications. Based on advanced fiber optic technologies, an all-fiber network is investigated and proposed. The study will cover the system architecture and data exchange protocol aspects. High bandwidth, robust optical networks could provide great opportunities to the power system for better service and efficient operation. In the dissertation, different applications are investigated. One of the typical applications is the SCADA information accessing system. An Internet-based application for the substation automation system will be presented. VLSI (Very Large Scale Integration) technology is also used for auto-generation of one-line diagrams. A high transition rate, low latency optical network is especially suitable for power system real time control. In the dissertation, a new local area network based Load Shedding Controller (LSC) for isolated power systems will be presented. By using PMUs (Phasor Measurement Units) and a fiber optic network, an AGE (Area Generation Error) based accurate wide area load shedding scheme will also be proposed. The objective is to shed the load in the limited area with minimum disturbance.
An efficient micro control unit with a reconfigurable filter design for wireless body sensor networks (WBSNs).

PubMed

Chen, Chiung-An; Chen, Shih-Lun; Huang, Hong-Yi; Luo, Ching-Hsing

2012-11-22

In this paper, a low-cost, low-power and high-performance micro control unit (MCU) core is proposed for wireless body sensor networks (WBSNs). It consists of an asynchronous interface, a register bank, a reconfigurable filter, a slop-feature forecast, a lossless data encoder, an error correction coding (ECC) encoder, a UART interface, a power management (PWM) unit, and a multi-sensor controller. To improve system performance and expandability, the asynchronous interface handles signal exchanges between different clock domains. To remove noise from various bio-signals, the reconfigurable filter provides average, binomial and sharpen filter functions. The slop-feature forecast and the lossless data encoder are proposed to reduce the amount of biomedical signal data for transmission. Furthermore, the ECC encoder is added to improve the reliability of wireless transmission, and the UART interface makes the proposed design compatible with wireless devices. For long-term healthcare monitoring applications, a power management technique is developed to reduce the power consumption of the WBSN system. In addition, the proposed design can operate with four different bio-sensors simultaneously. The proposed design was successfully tested on an FPGA verification board. The VLSI architecture of this work contains 7.67-K gate counts and consumes 5.8 mW or 1.9 mW at a 100 MHz or 133 MHz processing rate using a TSMC 0.18 μm or 0.13 μm CMOS process, respectively. Compared with previous techniques, this design achieves higher performance, more functions, more flexibility and higher compatibility than other micro controller designs.
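As a rough illustration of the reconfigurable filter mentioned in the abstract above, the following Python sketch selects one of three 3-tap kernels at run time. The kernel coefficients, the 3-tap length, the edge handling and the name `reconfigurable_filter` are all illustrative assumptions; the abstract does not give the coefficients used in the hardware.

```python
# Hypothetical 3-tap kernels (assumed for illustration; not taken from the paper).
KERNELS = {
    "average":  (1 / 3, 1 / 3, 1 / 3),   # moving average
    "binomial": (1 / 4, 1 / 2, 1 / 4),   # binomial (1, 2, 1) / 4 smoothing
    "sharpen":  (-1 / 2, 2.0, -1 / 2),   # emphasises the centre sample
}


def reconfigurable_filter(samples, mode):
    """Apply the kernel selected by `mode`; edge samples are replicated."""
    k = KERNELS[mode]
    padded = [samples[0]] + list(samples) + [samples[-1]]
    return [
        k[0] * padded[i] + k[1] * padded[i + 1] + k[2] * padded[i + 2]
        for i in range(len(samples))
    ]
```

Each kernel sums to one, so a constant input passes through unchanged in every mode; only the response to noise and edges differs.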
MEDIPIX: a VLSI chip for a GaAs pixel detector for digital radiology

NASA Astrophysics Data System (ADS)

Amendolia, S. R.; Bertolucci, E.; Bisogni, M. G.; Bottigli, U.; Ceccopieri, A.; Ciocci, M. A.; Conti, M.; Delogu, P.; Fantacci, M. E.; Maestro, P.; Marzulli, V.; Pernigotti, E.; Romeo, N.; Rosso, V.; Rosso, P.; Stefanini, A.; Stumbo, S.

1999-02-01

A GaAs pixel detector designed for digital mammography, equipped with a 36-channel single photon counting discrete read-out electronics, was tested using a test object developed for quality control purposes in mammography. Each pixel was 200×200 μm² large, and 200 μm deep. The choice of GaAs with respect to silicon (largely used in other applications and with a more established technique) has been made because of the much better detection efficiency at mammographic energies, combined with a very good charge collection efficiency achieved thanks to new ohmic contacts. This GaAs detector is able to perform a measurement of low-contrast details, with minimum contrast lower (nearly a factor two) than that typically achievable with standard mammographic film+screen systems in the same conditions of clinical routine. This should allow for an earlier diagnosis of breast tumour masses. Due to these encouraging results, the next step in the evolution of our imaging system based on GaAs detectors has been the development of a VLSI front-end prototype chip (MEDIPIX) in order to cover a much larger diagnostic area. The chip reads 64×64 channels in single photon counting mode, each one 170 μm wide. Each channel contains also a test input where a signal can be simulated, injecting a known charge through a 16 fF capacitor. Fake signals have been injected via the test input measuring and equalizing minimum thresholds for all the channels. On an average, in most of the performing chips available up to now, we have found that it is possible to set a threshold as low as 1800 electrons with an RMS of 150 electrons (10 standard deviations lower than the 20 keV photon signal roughly equivalent to 4500 electrons). The detector, bump-bonded to the chip, will be tested and a ladder of detectors will be prepared to be able to scan large surface objects.
Off-line, built-in test techniques for VLSI circuits

NASA Technical Reports Server (NTRS)

Buehler, M. G.; Sievers, M. W.

1982-01-01

It is shown that the use of redundant on-chip circuitry improves the testability of an entire VLSI circuit. In the study described here, five techniques applied to a two-bit ripple carry adder are compared. The techniques considered are self-oscillation, self-comparison, partition, scan path, and built-in logic block observer. It is noted that both classical stuck-at faults and nonclassical faults, such as bridging faults (shorts), stuck-on x faults where x may be 0, 1, or vary between the two, and parasitic flip-flop faults occur in IC structures. To simplify the analysis of the testing techniques, however, a stuck-at fault model is assumed.
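The stuck-at fault model assumed in the abstract above can be sketched in a few lines of Python. This is a minimal simulation of a two-bit ripple carry adder in which one named net can be forced to 0 or 1; the net names, the fault list and the `coverage` helper are illustrative assumptions, and the sketch does not reproduce any of the five built-in test techniques compared in the study.

```python
from itertools import product


def ripple_adder(a, b, cin, fault=None):
    """Two-bit ripple carry adder; `fault` is (net_name, stuck_value) or None."""
    def net(name, value):
        # Force the net to its stuck value if it is the faulty one.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a0, a1 = net("a0", a & 1), net("a1", (a >> 1) & 1)
    b0, b1 = net("b0", b & 1), net("b1", (b >> 1) & 1)
    c0 = net("cin", cin)
    s0 = net("s0", a0 ^ b0 ^ c0)
    c1 = net("c1", (a0 & b0) | (c0 & (a0 ^ b0)))
    s1 = net("s1", a1 ^ b1 ^ c1)
    c2 = net("cout", (a1 & b1) | (c1 & (a1 ^ b1)))
    return s0 | (s1 << 1) | (c2 << 2)


NETS = ["a0", "a1", "b0", "b1", "cin", "s0", "c1", "s1", "cout"]
FAULTS = [(name, value) for name in NETS for value in (0, 1)]
ALL_VECTORS = list(product(range(4), range(4), range(2)))


def detected(fault, vectors):
    """A fault is detected if some vector makes the faulty output differ."""
    return any(
        ripple_adder(a, b, c, fault) != ripple_adder(a, b, c)
        for a, b, c in vectors
    )


def coverage(vectors):
    """Fraction of the modelled stuck-at faults detected by `vectors`."""
    return sum(detected(f, vectors) for f in FAULTS) / len(FAULTS)
```

Calling `coverage(ALL_VECTORS)` evaluates the exhaustive test set, while passing a shorter list shows how much coverage a reduced test retains.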
Modulation and coding for satellite and space communications

NASA Technical Reports Server (NTRS)

Yuen, Joseph H.; Simon, Marvin K.; Pollara, Fabrizio; Divsalar, Dariush; Miller, Warner H.; Morakis, James C.; Ryan, Carl R.

1990-01-01

Several modulation and coding advances supported by NASA are summarized. To support long-constraint-length convolutional code, a VLSI maximum-likelihood decoder, utilizing parallel processing techniques, which is being developed to decode convolutional codes of constraint length 15 and a code rate as low as 1/6 is discussed. A VLSI high-speed 8-b Reed-Solomon decoder which is being developed for advanced tracking and data relay satellite (ATDRS) applications is discussed. A 300-Mb/s modem with continuous phase modulation (CPM) and codings which is being developed for ATDRS is discussed. Trellis-coded modulation (TCM) techniques are discussed for satellite-based mobile communication applications.
A VLSI pipeline design of a fast prime factor DFT on a finite field

NASA Technical Reports Server (NTRS)

Truong, T. K.; Hsu, I. S.; Shao, H. M.; Reed, I. S.; Shyu, H. C.

1986-01-01

A conventional prime factor discrete Fourier transform (DFT) algorithm is used to realize a discrete Fourier-like transform on the finite field, GF(q sub n). A pipeline structure is used to implement this prime factor DFT over GF(q sub n). This algorithm is developed to compute cyclic convolutions of complex numbers and to decode Reed-Solomon codes. Such a pipeline fast prime factor DFT algorithm over GF(q sub n) is regular, simple, expandable, and naturally suitable for VLSI implementation. An example illustrating the pipeline aspect of a 30-point transform over GF(q sub n) is presented.
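To make the prime factor idea above concrete, here is a small Python sketch of a 30-point transform split into coprime lengths 5 and 6 using the Good-Thomas index mapping. It works over the prime field GF(31) with the primitive root 3, which is an assumption chosen for simplicity; the abstract's field, factorisation and pipeline structure are not specified here, and the function names are hypothetical.

```python
P = 31            # field modulus (assumed; GF(31) has elements of order 30)
N1, N2 = 5, 6     # coprime factors of the transform length
N = N1 * N2       # 30-point transform
W = 3             # 3 is a primitive root mod 31, so it has order 30


def dft_direct(x):
    """Reference transform: X[k] = sum_n x[n] * W^(n*k) mod P."""
    return [sum(x[n] * pow(W, n * k, P) for n in range(N)) % P for k in range(N)]


def dft_prime_factor(x):
    """Good-Thomas mapping: a 5 x 6 two-dimensional transform, no twiddles."""
    w1 = pow(W, N2, P)   # element of order N1
    w2 = pow(W, N1, P)   # element of order N2
    # Input index map n = (n1*N2 + n2*N1) mod N.
    grid = [[x[(n1 * N2 + n2 * N1) % N] for n2 in range(N2)] for n1 in range(N1)]
    # Length-N2 transforms along each row.
    rows = [
        [sum(row[n2] * pow(w2, n2 * k2, P) for n2 in range(N2)) % P
         for k2 in range(N2)]
        for row in grid
    ]
    # Length-N1 transforms along each column, with the CRT output index map.
    inv_n2 = pow(N2, -1, N1)   # modular inverses (Python 3.8+)
    inv_n1 = pow(N1, -1, N2)
    out = [0] * N
    for k1 in range(N1):
        for k2 in range(N2):
            value = sum(rows[n1][k2] * pow(w1, n1 * k1, P) for n1 in range(N1)) % P
            k = (k1 * N2 * inv_n2 + k2 * N1 * inv_n1) % N
            out[k] = value
    return out
```

Because the two short transforms are independent and need no twiddle factors between them, each stage maps naturally onto a separate pipeline stage, which is the regularity the abstract refers to.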
UW VLSI chip tester

NASA Astrophysics Data System (ADS)

McKenzie, Neil

1989-12-01

We present a design for a low-cost, functional VLSI chip tester. It is based on the Apple Macintosh II personal computer. It tests chips that have up to 128 pins. All pin drivers of the tester are bidirectional; each pin is programmed independently as an input or an output. The tester can test both static and dynamic chips. Rudimentary speed testing is provided. Chips are tested by executing C programs written by the user. A software library is provided for program development. Tests run under both the Mac Operating System and A/UX. The design is implemented using Xilinx Logic Cell Arrays. Price/performance tradeoffs are discussed.
Parallel algorithms for placement and routing in VLSI design. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Brouwer, Randall Jay

1991-01-01

The computational requirements for high quality synthesis, analysis, and verification of very large scale integration (VLSI) designs have rapidly increased with the fast growing complexity of these designs. Research in the past has focused on the development of heuristic algorithms, special purpose hardware accelerators, or parallel algorithms for the numerous design tasks to decrease the time required for solution. Two new parallel algorithms are proposed for two VLSI synthesis tasks, standard cell placement and global routing. The first algorithm, a parallel algorithm for global routing, uses hierarchical techniques to decompose the routing problem into independent routing subproblems that are solved in parallel. Results are then presented which compare the routing quality to the results of other published global routers and which evaluate the speedups attained. The second algorithm, a parallel algorithm for cell placement and global routing, hierarchically integrates a quadrisection placement algorithm, a bisection placement algorithm, and the previous global routing algorithm. Unique partitioning techniques are used to decompose the various stages of the algorithm into independent tasks which can be evaluated in parallel. Finally, results are presented which evaluate the various algorithm alternatives and compare the algorithm performance to other placement programs. Measurements are presented on the parallel speedups available.
An engineering methodology for implementing and testing VLSI (Very Large Scale Integrated) circuits

NASA Astrophysics Data System (ADS)

Corliss, Walter F., II

1989-03-01

The engineering methodology for producing a fully tested VLSI chip from a design layout is presented. A previously designed 16-bit correlator, NPS CORN88, was used as a vehicle to demonstrate this methodology. The study of the design and simulation tools, MAGIC and MOSSIM II, was the focus of the design and validation process. The design was then implemented and the chip was fabricated by MOSIS. This fabricated chip was then used to develop a testing methodology for using the digital test facilities at NPS. NPS CORN88 was the first full custom VLSI chip designed at NPS to be tested with the NPS digital analysis system, a Tektronix DAS 9100 series tester. The capabilities and limitations of these test facilities are examined. NPS CORN88 test results are included to demonstrate the capabilities of the digital test system. A translator, MOS2DAS, was developed to convert the MOSSIM II simulation program to the input files required by the DAS 9100 device verification software, 91DVS. Finally, a tutorial for using the digital test facilities, including the DAS 9100 and associated support equipment, is included as an appendix.
A technique for evaluating the application of the pin-level stuck-at fault model to VLSI circuits

NASA Technical Reports Server (NTRS)

Palumbo, Daniel L.; Finelli, George B.

1987-01-01

Accurate fault models are required to conduct the experiments defined in validation methodologies for highly reliable fault-tolerant computers (e.g., computers with a probability of failure of 10 to the -9 for a 10-hour mission). Described is a technique by which a researcher can evaluate the capability of the pin-level stuck-at fault model to simulate true error behavior symptoms in very large scale integrated (VLSI) digital circuits. The technique is based on a statistical comparison of the error behavior resulting from faults applied at the pin-level of and internal to a VLSI circuit. As an example of an application of the technique, the error behavior of a microprocessor simulation subjected to internal stuck-at faults is compared with the error behavior which results from pin-level stuck-at faults. The error behavior is characterized by the time between errors and the duration of errors. Based on this example data, the pin-level stuck-at fault model is found to deliver less than ideal performance. However, with respect to the class of faults which cause a system crash, the pin-level, stuck-at fault model is found to provide a good modeling capability.
A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses

PubMed Central

Qiao, Ning; Mostafa, Hesham; Corradi, Federico; Osswald, Marc; Stefanini, Fabio; Sumislawska, Dora; Indiveri, Giacomo

2015-01-01

Implementing compact, low-power artificial neural processing systems with real-time on-line learning abilities is still an open challenge. In this paper we present a full-custom mixed-signal VLSI device with neuromorphic learning circuits that emulate the biophysics of real spiking neurons and dynamic synapses for exploring the properties of computational neuroscience models and for building brain-inspired computing systems. The proposed architecture allows the on-chip configuration of a wide range of network connectivities, including recurrent and deep networks, with short-term and long-term plasticity. The device comprises 128 K analog synapse and 256 neuron circuits with biologically plausible dynamics and bi-stable spike-based plasticity mechanisms that endow it with on-line learning abilities. In addition to the analog circuits, the device comprises also asynchronous digital logic circuits for setting different synapse and neuron properties as well as different network configurations. This prototype device, fabricated using a 180 nm 1P6M CMOS process, occupies an area of 51.4 mm², and consumes approximately 4 mW for typical experiments, for example involving attractor networks. Here we describe the details of the overall architecture and of the individual circuits and present experimental results that showcase its potential. By supporting a wide range of cortical-like computational modules comprising plasticity mechanisms, this device will enable the realization of intelligent autonomous systems with on-line learning capabilities. PMID:25972778
A robust and scalable neuromorphic communication system by combining synaptic time multiplexing and MIMO-OFDM.

PubMed

Srinivasa, Narayan; Zhang, Deying; Grigorian, Beayna

2014-03-01

This paper describes a novel architecture for enabling robust and efficient neuromorphic communication. The architecture combines two concepts: 1) synaptic time multiplexing (STM) that trades space for speed of processing to create an intragroup communication approach that is firing rate independent and offers more flexibility in connectivity than cross-bar architectures and 2) a wired multiple input multiple output (MIMO) communication with orthogonal frequency division multiplexing (OFDM) techniques to enable a robust and efficient intergroup communication for neuromorphic systems. The MIMO-OFDM concept for the proposed architecture was analyzed by simulating large-scale spiking neural network architecture. Analysis shows that the neuromorphic system with MIMO-OFDM exhibits robust and efficient communication while operating in real time with a high bit rate. Through combining STM with MIMO-OFDM techniques, the resulting system offers a flexible and scalable connectivity as well as a power and area efficient solution for the implementation of very large-scale spiking neural architectures in hardware.
GNC Architecture Design for ARES Simulation. Revision 3.0

NASA Technical Reports Server (NTRS)

Gay, Robert

2006-01-01

The purpose of this document is to describe the GNC architecture and associated interfaces for all ARES simulations. Establishing a common architecture facilitates development across the ARES simulations and provides an efficient mechanism for creating an end-to-end simulation capability. In general, the GNC architecture is the framework in which all GNC development takes place, including sensor and effector models. All GNC software applications have a standard location within the architecture, making integration easier and thus more efficient.
GLOBECOM '87 - Global Telecommunications Conference, Tokyo, Japan, Nov. 15-18, 1987, Conference Record. Volumes 1, 2, & 3

NASA Astrophysics Data System (ADS)

The present conference on global telecommunications discusses topics in the fields of Integrated Services Digital Network (ISDN) technology field trial planning and results to date, motion video coding, ISDN networking, future network communications security, flexible and intelligent voice/data networks, Asian and Pacific lightwave and radio systems, subscriber radio systems, the performance of distributed systems, signal processing theory, satellite communications modulation and coding, and terminals for the handicapped. Also discussed are knowledge-based technologies for communications systems, future satellite transmissions, high quality image services, novel digital signal processors, broadband network access interface, traffic engineering for ISDN design and planning, telecommunications software, coherent optical communications, multimedia terminal systems, advanced speed coding, portable and mobile radio communications, multi-Gbit/second lightwave transmission systems, enhanced capability digital terminals, communications network reliability, advanced antimultipath fading techniques, undersea lightwave transmission, image coding, modulation and synchronization, adaptive signal processing, integrated optical devices, VLSI technologies for ISDN, field performance of packet switching, CSMA protocols, optical transport system architectures for broadband ISDN, mobile satellite communications, indoor wireless communication, echo cancellation in communications, and distributed network algorithms.
A Review of Current Neuromorphic Approaches for Vision, Auditory, and Olfactory Sensors

PubMed Central

Vanarse, Anup; Osseiran, Adam; Rassau, Alexander

2016-01-01

Conventional vision, auditory, and olfactory sensors generate large volumes of redundant data and as a result tend to consume excessive power. To address these shortcomings, neuromorphic sensors have been developed. These sensors mimic the neuro-biological architecture of sensory organs using aVLSI (analog Very Large Scale Integration) and generate asynchronous spiking output that represents sensing information in ways that are similar to neural signals. This allows for much lower power consumption due to an ability to extract useful sensory information from sparse captured data. The foundation for research in neuromorphic sensors was laid more than two decades ago, but recent developments in understanding of biological sensing and advanced electronics, have stimulated research on sophisticated neuromorphic sensors that provide numerous advantages over conventional sensors. In this paper, we review the current state-of-the-art in neuromorphic implementation of vision, auditory, and olfactory sensors and identify key contributions across these fields. Bringing together these key contributions we suggest a future research direction for further development of the neuromorphic sensing field. PMID:27065784
VLSI Implementation of Fault Tolerance Multiplier based on Reversible Logic Gate

NASA Astrophysics Data System (ADS)

Ahmad, Nabihah; Hakimi Mokhtar, Ahmad; Othman, Nurmiza binti; Fhong Soon, Chin; Rahman, Ab Al Hadi Ab

2017-08-01

The multiplier is an essential component of digital systems such as digital signal processors, microprocessors and quantum computers, and is widely used in arithmetic units. Because of the multiplier's complexity, it is highly prone to errors. This paper aims to design a 2×2-bit fault-tolerant multiplier based on reversible logic gates with low power consumption and high performance. The design has been implemented using 90 nm Complementary Metal Oxide Semiconductor (CMOS) technology in Synopsys Electronic Design Automation (EDA) tools. The multiplier architecture is implemented using reversible logic gates. The fault-tolerant multiplier uses a combination of three reversible logic gates, namely the Double Feynman gate (F2G), the New Fault Tolerance (NFT) gate and the Islam Gate (IG), and occupies an area of 160 μm × 420.3 μm (about 67,250 μm², i.e. roughly 0.067 mm²). The design achieves a power consumption of 122.85 μW and a propagation delay of 16.99 ns. With its low power consumption, high performance and fault tolerance capabilities, the proposed multiplier is suitable for modern computing applications.
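The properties that make a gate such as the Double Feynman gate useful for fault-tolerant reversible logic can be checked exhaustively. The Python sketch below uses the commonly cited F2G definition (P = A, Q = A XOR B, R = A XOR C); this definition is an assumption here, since the abstract does not spell it out, and the helper names are hypothetical. The NFT and Islam gates are not modelled.

```python
from itertools import product


def f2g(a, b, c):
    """Double Feynman gate, commonly defined as (A, A xor B, A xor C)."""
    return a, a ^ b, a ^ c


def is_reversible(gate, n_inputs):
    """Reversible means the input-to-output mapping is one-to-one."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs


def preserves_parity(gate, n_inputs):
    """Parity preserving: XOR of the inputs equals XOR of the outputs."""
    return all(
        sum(bits) % 2 == sum(gate(*bits)) % 2
        for bits in product((0, 1), repeat=n_inputs)
    )
```

Parity preservation is what allows a single-bit error to be detected by comparing input and output parity, which is the sense in which such gates are called fault tolerant.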
Modelling short channel mosfets for use in VLSI

NASA Technical Reports Server (NTRS)

Klafter, Alex; Pilorz, Stuart; Polosa, Rosa Loguercio; Ruddock, Guy; Smith, Andrew

1986-01-01

In an investigation of metal oxide semiconductor field effect transistor (MOSFET) devices, a one-dimensional mathematical model of device dynamics was prepared, from which an accurate and computationally efficient drain current expression could be derived for subsequent parameter extraction. While a critical review revealed weaknesses in existing 1-D models (Pao-Sah, Pierret-Shields, Brews, and Van de Wiele), this new model in contrast was found to allow all the charge distributions to be continuous, to retain the inversion layer structure, and to include the contribution of current from the pinched-off part of the device. The model allows the source and drain to operate in different regimes. Numerical algorithms used for the evaluation of surface potentials in the various models are presented.

Selective attention in multi-chip address-event systems.

PubMed

Bartolozzi, Chiara; Indiveri, Giacomo

2009-01-01

Selective attention is the strategy used by biological systems to cope with the inherent limits in their available computational resources, in order to efficiently process sensory information. The same strategy can be used in artificial systems that have to process vast amounts of sensory data with limited resources. In this paper we present a neuromorphic VLSI device, the "Selective Attention Chip" (SAC), which can be used to implement these models in multi-chip address-event systems. We also describe a real-time sensory-motor system, which integrates the SAC with a dynamic vision sensor and a robotic actuator. We present experimental results from each component in the system, and demonstrate how the complete system implements a real-time stimulus-driven selective attention model.
Computationally efficient modeling and simulation of large scale systems

NASA Technical Reports Server (NTRS)

Jain, Jitesh (Inventor); Cauley, Stephen F. (Inventor); Li, Hong (Inventor); Koh, Cheng-Kok (Inventor); Balakrishnan, Venkataramanan (Inventor)

2010-01-01

A method of simulating operation of a VLSI interconnect structure having capacitive and inductive coupling between nodes thereof. A matrix X and a matrix Y containing different combinations of passive circuit element values for the interconnect structure are obtained, where the element values for each matrix include inductance L and inverse capacitance P. An adjacency matrix A associated with the interconnect structure is obtained. Numerical integration is used to solve first and second equations, each including as a factor the product of the inverse matrix X⁻¹ and at least one other matrix, with the first equation including X⁻¹Y, X⁻¹A, and X⁻¹P, and the second equation including X⁻¹A and X⁻¹P.
Si photonics technology for future optical interconnection

NASA Astrophysics Data System (ADS)

Zheng, Xuezhe; Krishnamoorthy, Ashok V.

2011-12-01

Scaling of computing systems requires ultra-efficient interconnects with large bandwidth density. Silicon photonics offers a disruptive solution with advantages in reach, energy efficiency and bandwidth density. We review our progress in developing building blocks for ultra-efficient WDM silicon photonic links. Employing microsolder-based hybrid integration with low parasitics and high density, we independently optimize photonic devices on SOI platforms and VLSI circuits on more advanced bulk CMOS technology nodes. We have progressively demonstrated single-channel hybrid silicon photonic transceivers at 5 Gbps and 10 Gbps, and an 80 Gbps arrayed WDM silicon photonic transceiver, using reverse-biased depletion ring modulators and Ge waveguide photodetectors. Record-high energy efficiencies of less than 100 fJ/bit and 385 fJ/bit were achieved for the hybrid integrated transmitter and receiver, respectively. Waveguide-grating-based optical proximity couplers were developed with low loss and large optical bandwidth to enable multi-layer intra/inter-chip optical interconnects. Through thermal engineering of WDM devices by selective substrate removal, together with a WDM link using a synthetic wavelength comb, we significantly improved the device tuning efficiency and reduced the tuning range. Using these innovative techniques, a two-orders-of-magnitude reduction in tuning power was achieved, and a tuning cost of only a few tens of fJ/bit is expected for high-data-rate WDM silicon photonic links.
A Digital Liquid State Machine With Biologically Inspired Learning and Its Application to Speech Recognition.

PubMed

Zhang, Yong; Li, Peng; Jin, Yingyezhe; Choe, Yoonsuck

2015-11-01

This paper presents a bioinspired digital liquid-state machine (LSM) for low-power very-large-scale-integration (VLSI)-based machine learning applications. To the best of the authors' knowledge, this is the first work that employs a bioinspired spike-based learning algorithm for the LSM. With the proposed online learning, the LSM extracts information from input patterns on the fly without needing intermediate data storage as required in offline learning methods such as ridge regression. The proposed learning rule is local such that each synaptic weight update is based only upon the firing activities of the corresponding presynaptic and postsynaptic neurons without incurring global communications across the neural network. Compared with the backpropagation-based learning, the locality of computation in the proposed approach lends itself to efficient parallel VLSI implementation. We use subsets of the TI46 speech corpus to benchmark the bioinspired digital LSM. To reduce the complexity of the spiking neural network model without performance degradation for speech recognition, we study the impacts of synaptic models on the fading memory of the reservoir and hence the network performance. Moreover, we examine the tradeoffs between synaptic weight resolution, reservoir size, and recognition performance and present techniques to further reduce the overhead of hardware implementation. Our simulation results show that in terms of isolated word recognition evaluated using the TI46 speech corpus, the proposed digital LSM rivals the state-of-the-art hidden Markov-model-based recognizer Sphinx-4 and outperforms all other reported recognizers including the ones that are based upon the LSM or neural networks.
From Smart-Eco Building to High-Performance Architecture: Optimization of Energy Consumption in Architecture of Developing Countries

NASA Astrophysics Data System (ADS)

Mahdavinejad, M.; Bitaab, N.

2017-08-01

The search for high-performance architecture and visions of future architecture have led to attempts at energy-efficient architecture and planning in different aspects. Recent trends, as a means of meeting the future legacy of architecture, are based on the idea of innovative technologies for resource-efficient buildings, performative design, bio-inspired technologies, etc., while there are meaningful differences between the architecture of developed and developing countries. The significance of the issue becomes clear when emerging cities are found to be interested in Dubaization and other related booming development doctrines. This paper analyzes the level of developing countries' success in achieving the goals and objectives of smart-eco buildings. Emerging cities of West Asia are selected as the case studies. The results show that the concepts of high-performance architecture and smart-eco buildings differ between developing and developed countries. The paper identifies five essential issues for improving the future architecture of developing countries: 1- Integrated Strategies for Energy Efficiency, 2- Contextual Solutions, 3- Embedded and Initial Energy Assessment, 4- Staff and Occupancy Wellbeing, 5- Life-Cycle Monitoring.
Multi-petascale highly efficient parallel supercomputer

DOEpatents

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

2015-07-14

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis, and preferably enabling adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.
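The five-dimensional torus interconnect mentioned above can be pictured with a short Python sketch that lists a node's direct neighbours and computes the minimum hop count between two nodes. The torus shape used in the usage note, the coordinate convention and the function names are illustrative assumptions; the abstract does not give the machine's actual dimensions or routing scheme.

```python
def torus_neighbors(coord, shape):
    """Direct neighbours of `coord`: one step in either direction per axis."""
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size   # wrap around the ring
            neighbors.append(tuple(n))
    # Drop duplicates that arise when an axis has only two nodes.
    return list(dict.fromkeys(neighbors))


def torus_distance(a, b, shape):
    """Minimum hop count, taking the shorter way around each ring."""
    hops = 0
    for x, y, size in zip(a, b, shape):
        d = abs(x - y)
        hops += min(d, size - d)
    return hops
```

For a hypothetical 4 × 4 × 4 × 4 × 4 torus, `torus_neighbors((0, 0, 0, 0, 0), (4, 4, 4, 4, 4))` returns ten neighbours, and the wrap-around links keep the worst-case distance at two hops per dimension.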
Ion implantation enhanced metal-Si-metal photodetectors

NASA Astrophysics Data System (ADS)

Sharma, A. K.; Scott, K. A. M.; Brueck, S. R. J.; Zolper, J. C.; Myers, D. R.

1994-05-01

The quantum efficiency and frequency response of simple Ni-Si-Ni metal-semiconductor-metal (MSM) photodetectors at long wavelengths are significantly enhanced with a simple, ion-implantation step to create a highly absorbing region approx. 1 micron below the Si surface. The internal quantum efficiency is improved by a factor of approx. 3 at 860 nm (to 64%) and a full factor of ten at 1.06 microns (to 23%) as compared with otherwise identical unimplanted devices. Dark currents are only slightly affected by the implantation process and are as low as 630 pA for a 4.5-micron gap device at 10-V bias. Dramatic improvement in the impulse response is observed, 100 ps vs. 600 ps, also at 10-V bias and 4.5-micron gap, due to the elimination of carrier diffusion tails in the implanted devices. Due to its planar structure, this device is fully VLSI compatible. Potential applications include optical interconnections for local area networks and multi-chip modules.
Leaky Integrate and Fire Neuron by Charge-Discharge Dynamics in Floating-Body MOSFET.

PubMed

Dutta, Sangya; Kumar, Vinay; Shukla, Aditya; Mohapatra, Nihar R; Ganguly, Udayan

2017-08-15

Neuro-biology inspired Spiking Neural Network (SNN) enables efficient learning and recognition tasks. To achieve a large scale network akin to biology, a power and area efficient electronic neuron is essential. Earlier, we had demonstrated an LIF neuron by a novel 4-terminal impact ionization based n+/p/n+ with an extended gate (gated-INPN) device by physics simulation. Excellent improvement in area and power compared to conventional analog circuit implementations was observed. In this paper, we propose and experimentally demonstrate a compact conventional 3-terminal partially depleted (PD) SOI-MOSFET (100 nm gate length) to replace the 4-terminal gated-INPN device. Impact ionization (II) induced floating body effect in SOI-MOSFET is used to capture LIF neuron behavior to demonstrate spiking frequency dependence on input. MHz operation enables attractive hardware acceleration compared to biology. Overall, conventional PD-SOI-CMOS technology enables very-large-scale-integration (VLSI) which is essential for biology scale (~10¹¹ neuron based) large neural networks.
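The leaky integrate-and-fire (LIF) behaviour referred to above, in which spiking frequency rises with input, can be sketched behaviourally in Python. This is a generic textbook LIF model with forward-Euler integration, not a model of the floating-body MOSFET; the time constant, threshold, time step and units are arbitrary assumptions chosen for illustration.

```python
def lif_spike_count(input_current, steps=1000, dt=1e-3, tau=20e-3,
                    threshold=1.0, v_reset=0.0):
    """Count spikes of a leaky integrate-and-fire neuron for a constant input.

    The state v leaks towards zero with time constant `tau` and integrates
    the input; when v reaches `threshold` a spike is counted and v is reset.
    """
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += dt * (-v / tau + input_current)   # leak plus integration
        if v >= threshold:                     # fire and reset
            spikes += 1
            v = v_reset
    return spikes
```

With these assumed parameters the steady-state value of v is `input_current * tau`, so inputs below 50 never reach the threshold, while larger inputs spike at an increasing rate.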
Adaptive neuro fuzzy inference system-based power estimation method for CMOS VLSI circuits

NASA Astrophysics Data System (ADS)

Vellingiri, Govindaraj; Jayabalan, Ramesh

2018-03-01

Recent advancements in very large scale integration (VLSI) technologies have made it feasible to integrate millions of transistors on a single chip. This greatly increases the circuit complexity and hence there is a growing need for less-tedious and low-cost power estimation techniques. The proposed work employs Back-Propagation Neural Network (BPNN) and Adaptive Neuro Fuzzy Inference System (ANFIS), which are capable of estimating the power precisely for the complementary metal oxide semiconductor (CMOS) VLSI circuits, without requiring any knowledge on circuit structure and interconnections. The application of ANFIS to power estimation is relatively new. Power estimation using ANFIS is carried out by creating initial FIS modes using hybrid optimisation and back-propagation (BP) techniques employing constant and linear methods. It is inferred that ANFIS with the hybrid optimisation technique employing the linear method produces better results in terms of testing error, which varies from 0% to 0.86%, when compared to BPNN, as it takes the initial fuzzy model and tunes it by means of a hybrid technique combining gradient descent BP and mean least-squares optimisation algorithms. ANFIS is best suited for the power estimation application, with a low RMSE of 0.0002075 and a high coefficient of determination (R) of 0.99961.
Laser Microchemistry : A Powerful Tool For VLSI

NASA Astrophysics Data System (ADS)

Tonneau, Didier; Guern, Yves; Pelous, Gerard

1989-01-01

Interconnection direct writing on ICs is possible by localized laser-assisted Chemical Vapor Deposition. Recently we have developed and marketed a new laser microchemistry tool particularly designed for VLSI prototypes rewiring. By dissociating Ni(CO)4 molecules, Ni lines can be written at speeds higher than 5 μm/s under laser induced temperature as low as 400°C. At the same temperature tungsten stripes can be driven from decomposition of WF6-H2 mixtures. However the tungsten deposition rate is about two orders of magnitude lower than the nickel growth rate in the same temperature conditions. The resistivities of the deposits are in both cases around 10 μΩ.cm. Silicon dioxide layers can be promoted from dissociation of a Si2H6-N2O mixture under surface temperature around 500°C. These metal and insulator deposition basic steps have been integrated in a complete metal bridging process suitable for the last interconnection level of a VLSI circuit. This process has first been evaluated from a functional point of view, by electrical characterizations realized on test patterns entirely drawn by laser chemistry. Finally, by measuring the time necessary to perform a metal bridge, the process has been evaluated from an economical point of view.
A neural network device for on-line particle identification in cosmic ray experiments

NASA Astrophysics Data System (ADS)

Scrimaglio, R.; Finetti, N.; D'Altorio, L.; Rantucci, E.; Raso, M.; Segreto, E.; Tassoni, A.; Cardarilli, G. C.

2004-05-01

On-line particle identification is one of the main goals of many experiments in space both for rare event studies and for optimizing measurements along the orbital trajectory. Neural networks can be a useful tool for signal processing and real time data analysis in such experiments. In this document we report on the performances of a programmable neural device which was developed in VLSI analog/digital technology. Neurons and synapses were accomplished by making use of Operational Transconductance Amplifier (OTA) structures. In this paper we report on the results of measurements performed in order to verify the agreement of the characteristic curves of each elementary cell with simulations and on the device performances obtained by implementing simple neural structures on the VLSI chip. A feed-forward neural network (Multi-Layer Perceptron, MLP) was implemented on the VLSI chip and trained to identify particles by processing the signals of two-dimensional position-sensitive Si detectors. The radiation monitoring device consisted of three double-sided silicon strip detectors. From the analysis of a set of simulated data it was found that the MLP implemented on the neural device gave results comparable with those obtained with the standard method of analysis confirming that the implemented neural network could be employed for real time particle identification.
Overlay Tolerances For VLSI Using Wafer Steppers

NASA Astrophysics Data System (ADS)

Levinson, Harry J.; Rice, Rory

1988-01-01

In order for VLSI circuits to function properly, the masking layers used in the fabrication of those devices must overlay each other to within the manufacturing tolerance incorporated in the circuit design. The capabilities of the alignment tools used in the masking process determine the overlay tolerances to which circuits can be designed. It is therefore of considerable importance that these capabilities be well characterized. Underestimation of the overlay accuracy results in unnecessarily large devices, resulting in poor utilization of wafer area and possible degradation of device performance. Overestimation will result in significant yield loss because of the failure to conform to the tolerances of the design rules. The proper methodology for determining the overlay capabilities of wafer steppers, the most commonly used alignment tool for the production of VLSI circuits, is the subject of this paper. Because cost-effective manufacturing process technology has been the driving force of VLSI, the impact on productivity is a primary consideration in all discussions. Manufacturers of alignment tools advertise the capabilities of their equipment. It is notable that no manufacturer currently characterizes his aligners in a manner consistent with the requirements of producing very large integrated circuits, as will be discussed. This has resulted in the situation in which the evaluation and comparison of the capabilities of alignment tools require the attention of a lithography specialist. Unfortunately, lithographic capabilities must be known by many other people, particularly the circuit designers and the managers responsible for the financial consequences of the high prices of modern alignment tools. All too frequently, the designer or manager is confronted with contradictory data, one set coming from his lithography specialist, and the other coming from a sales representative of an equipment manufacturer. Since the latter generally attempts to make his merchandise appear as attractive as possible, the lithographer is frequently placed in the position of having to explain subtle issues in order to justify his decisions. It is the purpose of this paper to provide that explanation.
A new variable-resolution associative memory for high energy physics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Annovi, A.; Amerio, S.; Beretta, M.

2011-07-01

We describe an important advancement for the Associative Memory device (AM). The AM is a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture. The AM is optimized for on-line track finding in high-energy physics experiments. Pattern matching is carried out by finding track candidates in coarse resolution 'roads'. A large AM bank stores all trajectories of interest, called 'patterns', for a given detector resolution. The AM extracts roads compatible with a given event during detector read-out. Two important variables characterize the quality of the AM bank: its 'coverage' and the level of fake roads. The coverage, which describes the geometric efficiency of a bank, is defined as the fraction of tracks that match at least one pattern in the bank. Given a certain road size, the coverage of the bank can be increased just adding patterns to the bank, while the number of fakes unfortunately is roughly proportional to the number of patterns in the bank. Moreover, as the luminosity increases, the fake rate increases rapidly because of the increased silicon occupancy. To counter that, we must reduce the width of our roads. If we decrease the road width using the current technology, the system will become very large and extremely expensive. We propose an elegant solution to this problem: the 'variable resolution patterns'. Each pattern and each detector layer within a pattern will be able to use the optimal width, but we will use a 'don't care' feature (inspired from ternary CAMs) to increase the width when that is more appropriate. In other words we can use patterns of variable shape. As a result we reduce the number of fake roads, while keeping the efficiency high and avoiding excessive bank size due to the reduced width. We describe the idea, the implementation in the new AM design and the implementation of the algorithm in the simulation. Finally we show the effectiveness of the 'variable resolution patterns' idea using simulated high occupancy events in the ATLAS detector. (authors)
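The 'don't care' idea and the coverage definition in this abstract can be sketched in software. The following is a minimal illustration, assuming that each detector layer reports an integer hit address and that a pattern stores, per layer, a value plus a number of low-order bits to ignore; the encoding, function names, and sample data are hypothetical and do not describe the actual AM chip.

```python
def layer_matches(hit, value, dont_care_bits):
    """Compare a hit with a stored value, ignoring the lowest bits.

    Ignoring more low-order bits widens the road in that layer.
    """
    return (hit >> dont_care_bits) == (value >> dont_care_bits)


def pattern_matches(hits, pattern):
    """A track matches a pattern when every layer matches."""
    return all(layer_matches(h, v, dc) for h, (v, dc) in zip(hits, pattern))


def coverage(tracks, bank):
    """Fraction of tracks that match at least one pattern in the bank."""
    matched = sum(any(pattern_matches(t, p) for p in bank) for t in tracks)
    return matched / len(tracks)


# Hypothetical two-layer bank: layer 0 at full resolution,
# layer 1 with one don't-care bit (double width).
bank = [[(0b1010, 0), (0b0110, 1)]]
tracks = [
    [0b1010, 0b0111],  # matches: layer 1 differs only in the ignored bit
    [0b1010, 0b0100],  # fails in layer 1
    [0b1011, 0b0110],  # fails in layer 0 (no don't-care bits there)
]
bank_coverage = coverage(tracks, bank)
```

Setting a don't-care bit in one layer lets a single stored pattern cover two adjacent addresses there, which is the sense in which a pattern can have a variable shape.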
Collective behavior of networks with linear (VLSI) integrate-and-fire neurons.

PubMed

Fusi, S; Mattia, M

1999-04-01

We analyze in detail the statistical properties of the spike emission process of a canonical integrate-and-fire neuron, with a linear integrator and a lower bound for the depolarization, as often used in VLSI implementations (Mead, 1989). The spike statistics of such neurons appear to be qualitatively similar to conventional (exponential) integrate-and-fire neurons, which exhibit a wide variety of characteristics observed in cortical recordings. We also show that, contrary to current opinion, the dynamics of a network composed of such neurons has two stable fixed points, even in the purely excitatory network, corresponding to two different states of reverberating activity. The analytical results are compared with numerical simulations and are found to be in good agreement.
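The neuron analyzed in this abstract has a linear integrator (a constant leak rather than a leak proportional to the depolarization) and a lower bound on the depolarization. A minimal deterministic sketch with constant input follows; the parameter names (`beta` for the constant leak, `theta` for the threshold) and the values are assumptions, and the noisy input studied in the paper is not modelled.

```python
def linear_if_spike_times(current, beta=1.0, theta=1.0, v_reset=0.0,
                          dt=2.0 ** -10, t_end=1.0):
    """Integrate dv/dt = current - beta with a lower bound at v = 0.

    Returns the list of spike times. Values are dimensionless and
    hypothetical.
    """
    v = v_reset
    times = []
    for step in range(int(round(t_end / dt))):
        v += dt * (current - beta)   # constant (linear) leak
        v = max(v, 0.0)              # lower bound on the depolarization
        if v >= theta:
            times.append((step + 1) * dt)
            v = v_reset
    return times


silent = linear_if_spike_times(0.5)   # drive below the leak: pinned at 0
active = linear_if_spike_times(9.0)   # net drift of 8 per unit time
```

With a net drift of 8 and a threshold of 1, the depolarization reaches threshold every 1/8 time unit, so the second call yields 8 evenly spaced spikes in one time unit.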
VLSI technology for smaller, cheaper, faster return link systems

NASA Technical Reports Server (NTRS)

Nanzetta, Kathy; Ghuman, Parminder; Bennett, Toby; Solomon, Jeff; Dowling, Jason; Welling, John

1994-01-01

Very Large Scale Integration (VLSI) Application-specific Integrated Circuit (ASIC) technology has enabled substantially smaller, cheaper, and more capable telemetry data systems. However, the rapid growth in available ASIC fabrication densities has far outpaced the application of this technology to telemetry systems. Available densities have grown by well over an order of magnitude since NASA's Goddard Space Flight Center (GSFC) first began developing ASICs for ground telemetry systems in 1985. To take advantage of these higher integration levels, a new generation of ASICs for return link telemetry processing is under development. These new submicron devices are designed to further reduce the cost and size of NASA return link processing systems while improving performance. This paper describes these highly integrated processing components.
10 K gate I(2)L and 1 K component analog compatible bipolar VLSI technology - HIT-2

NASA Astrophysics Data System (ADS)

Washio, K.; Watanabe, T.; Okabe, T.; Horie, N.

1985-02-01

An advanced analog/digital bipolar VLSI technology that combines on the same chip 2-ns 10 K I(2)L gates with 1 K analog devices is proposed. The new technology, called high-density integration technology-2, is based on a new structure concept that consists of three major techniques: shallow grooved-isolation, I(2)L active layer etching, and I(2)L current gain increase. I(2)L circuits with 80-MHz maximum toggle frequency have been developed compatibly with n-p-n transistors having a BV(CE0) of more than 10 V and an f(T) of 5 GHz, and lateral p-n-p transistors having an f(T) of 150 MHz.
An Integrated Unix-based CAD System for the Design and Testing of Custom VLSI Chips

NASA Technical Reports Server (NTRS)

Deutsch, L. J.

1985-01-01

A computer aided design (CAD) system that is being used at the Jet Propulsion Laboratory for the design of custom and semicustom very large scale integrated (VLSI) chips is described. The system consists of a Digital Equipment Corporation VAX computer with the UNIX operating system and a collection of software tools for the layout, simulation, and verification of microcircuits. Most of these tools were written by the academic community and are, therefore, available to JPL at little or no cost. Some small pieces of software have been written in-house in order to make all the tools interact with each other with a minimal amount of effort on the part of the designer.
Concepts for on-board satellite image registration. Volume 3: Impact of VLSI/VHSIC on satellite on-board signal processing

NASA Technical Reports Server (NTRS)

Aanstoos, J. V.; Snyder, W. E.

1981-01-01

Anticipated major advances in integrated circuit technology in the near future are described as well as their impact on satellite onboard signal processing systems. Dramatic improvements in chip density, speed, power consumption, and system reliability are expected from very large scale integration. These improvements will enable more intelligence to be placed on remote sensing platforms in space, meeting the goals of NASA's information adaptive system concept, a major component of the NASA End-to-End Data System program. A forecast of VLSI technological advances is presented, including a description of the Defense Department's very high speed integrated circuit program, a seven-year research and development effort.
Electronic shift register memory based on molecular electron-transfer reactions

NASA Technical Reports Server (NTRS)

Hopfield, J. J.; Onuchic, Jose Nelson; Beratan, David N.

1989-01-01

The design of a shift register memory at the molecular level is described in detail. The memory elements are based on a chain of electron-transfer molecules incorporated on a very large scale integrated (VLSI) substrate, and the information is shifted by photoinduced electron-transfer reactions. The design requirements for such a system are discussed, and several realistic strategies for synthesizing these systems are presented. The immediate advantage of such a hybrid molecular/VLSI device would arise from the possible information storage density. The prospect of considerable savings of energy per bit processed also exists. This molecular shift register memory element design solves the conceptual problems associated with integrating molecular size components with larger (micron) size features on a chip.
ACE: Automatic Centroid Extractor for real time target tracking

NASA Technical Reports Server (NTRS)

Cameron, K.; Whitaker, S.; Canaris, J.

1990-01-01

A high performance video image processor has been implemented which is capable of grouping contiguous pixels from a raster scan image into groups and then calculating centroid information for each object in a frame. The algorithm employed to group pixels is very efficient and is guaranteed to work properly for all convex shapes as well as most concave shapes. Processing speeds are adequate for real time processing of video images having a pixel rate of up to 20 million pixels per second. Pixels may be up to 8 bits wide. The processor is designed to interface directly to a transputer serial link communications channel with no additional hardware. The full custom VLSI processor was implemented in a 1.6 μm CMOS process and measures 7200 μm on a side.
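The task described in this abstract, grouping contiguous pixels and computing a centroid per object, can be sketched in software. The version below uses a plain breadth-first flood fill with 4-connectivity; this is a swapped-in textbook technique, not the raster-scan grouping algorithm of the ACE chip, and the function name and sample frame are hypothetical.

```python
from collections import deque


def centroids(image):
    """Return (row, col) centroids of 4-connected groups of nonzero pixels.

    Objects are reported in raster-scan order of their first pixel.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    result = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Flood-fill one object, accumulating coordinate sums.
            queue = deque([(r, c)])
            seen[r][c] = True
            count = sum_r = sum_c = 0
            while queue:
                y, x = queue.popleft()
                count += 1
                sum_r += y
                sum_c += x
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            result.append((sum_r / count, sum_c / count))
    return result


frame = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
objects = centroids(frame)
```

Unlike a flood fill, a hardware implementation would process pixels in a single raster pass, which is why the abstract qualifies its guarantee by object shape.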

State-of-the-art Architectures and Technologies of High-Efficiency Solar Cells Based on III-V Heterostructures for Space and Terrestrial Applications

NASA Astrophysics Data System (ADS)

Pakhanov, N. A.; Andreev, V. M.; Shvarts, M. Z.; Pchelyakov, O. P.

2018-03-01

Multi-junction solar cells based on III-V compounds are the most efficient converters of solar energy to electricity and are widely used in space solar arrays and terrestrial photovoltaic modules with sunlight concentrators. All modern high-efficiency III-V solar cells are based on the long-developed triple-junction III-V GaInP/GaInAs/Ge heterostructure and have an almost limiting efficiency for a given architecture — 30 and 41.6% for space and terrestrial concentrated radiations, respectively. Currently, an increase in efficiency is achieved by converting from the 3-junction to the more efficient 4-, 5-, and even 6-junction III-V architectures: growth technologies and methods of post-growth treatment of structures have been developed, new materials with optimal bandgaps have been designed, and crystallographic parameters have been improved. In this review, we consider recent achievements and prospects for the main directions of research and improvement of architectures, technologies, and materials used in laboratories to develop solar cells with the best conversion efficiency: 35.8% for space, 38.8% for terrestrial, and 46.1% for concentrated sunlight. It is supposed that by 2020, the efficiency will approach 40% for direct space radiation and 50% for concentrated terrestrial solar radiation. This review considers the architecture and technologies of solar cells with record-breaking efficiency for terrestrial and space applications. It should be noted that in terrestrial power plants, the use of III-V SCs is economically advantageous in systems with sunlight concentrators.
Autonomous, Decentralized Grid Architecture: Prosumer-Based Distributed Autonomous Cyber-Physical Architecture for Ultra-Reliable Green Electricity Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

None

2012-01-11

GENI Project: Georgia Tech is developing a decentralized, autonomous, internet-like control architecture and control software system for the electric power grid. Georgia Tech’s new architecture is based on the emerging concept of electricity prosumers—economically motivated actors that can produce, consume, or store electricity. Under Georgia Tech’s architecture, all of the actors in an energy system are empowered to offer associated energy services based on their capabilities. The actors achieve their sustainability, efficiency, reliability, and economic objectives, while contributing to system-wide reliability and efficiency goals. This is in marked contrast to the current one-way, centralized control paradigm.
Ka-Band Wide-Bandgap Solid-State Power Amplifier: Hardware Validation

NASA Technical Reports Server (NTRS)

Epp, L.; Khan, P.; Silva, A.

2005-01-01

Motivated by recent advances in wide-bandgap (WBG) gallium nitride (GaN) semiconductor technology, there is considerable interest in developing efficient solid-state power amplifiers (SSPAs) as an alternative to the traveling-wave tube amplifier (TWTA) for space applications. This article documents proof-of-concept hardware used to validate power-combining technologies that may enable a 120-W, 40 percent power-added efficiency (PAE) SSPA. Results in previous articles [1-3] indicate that architectures based on at least three power combiner designs are likely to enable the target SSPA. Previous architecture performance analyses and estimates indicate that the proposed architectures can power combine 16 to 32 individual monolithic microwave integrated circuits (MMICs) with >80 percent combining efficiency. This combining efficiency would correspond to MMIC requirements of 5- to 10-W output power and >48 percent PAE. In order to validate the performance estimates of the three proposed architectures, measurements of proof-of-concept hardware are reported here.
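The relation between the 120-W target, the number of combined MMICs, and the per-MMIC output power quoted in this abstract can be checked with simple arithmetic. The sketch below assumes an idealized model in which the combined output equals the number of MMICs times the per-MMIC power times the combining efficiency; the function name is hypothetical and other losses are ignored.

```python
def required_mmic_power(target_watts, n_mmics, combining_efficiency):
    """Per-MMIC output power needed to reach the target after combining.

    Idealized assumption: P_out = n_mmics * P_mmic * combining_efficiency.
    """
    return target_watts / (n_mmics * combining_efficiency)


# 120-W target at exactly 80 percent combining efficiency.
p_16 = required_mmic_power(120.0, 16, 0.80)   # 16 MMICs
p_32 = required_mmic_power(120.0, 32, 0.80)   # 32 MMICs
```

Under these assumptions the two cases give roughly 9.4 W and 4.7 W per MMIC, consistent with the 5- to 10-W range stated in the abstract.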
Assimilation of Biophysical Neuronal Dynamics in Neuromorphic VLSI.

PubMed

Wang, Jun; Breen, Daniel; Akinin, Abraham; Broccard, Frederic; Abarbanel, Henry D I; Cauwenberghs, Gert

2017-12-01

Representing the biophysics of neuronal dynamics and behavior offers a principled analysis-by-synthesis approach toward understanding mechanisms of nervous system functions. We report on a set of procedures assimilating and emulating neurobiological data on a neuromorphic very large scale integrated (VLSI) circuit. The analog VLSI chip, NeuroDyn, features 384 digitally programmable parameters specifying for 4 generalized Hodgkin-Huxley neurons coupled through 12 conductance-based chemical synapses. The parameters also describe reversal potentials, maximal conductances, and spline regressed kinetic functions for ion channel gating variables. In one set of experiments, we assimilated membrane potential recorded from one of the neurons on the chip to the model structure upon which NeuroDyn was designed using the known current input sequence. We arrived at the programmed parameters except for model errors due to analog imperfections in the chip fabrication. In a related set of experiments, we replicated songbird individual neuron dynamics on NeuroDyn by estimating and configuring parameters extracted using data assimilation from intracellular neural recordings. Faithful emulation of detailed biophysical neural dynamics will enable the use of NeuroDyn as a tool to probe electrical and molecular properties of functional neural circuits. Neuroscience applications include studying the relationship between molecular properties of neurons and the emergence of different spike patterns or different brain behaviors. Clinical applications include studying and predicting effects of neuromodulators or neurodegenerative diseases on ion channel kinetics.
Built-in self-repair of VLSI memories employing neural nets

NASA Astrophysics Data System (ADS)

Mazumder, Pinaki

1998-10-01

The decades of the Eighties and the Nineties have witnessed the spectacular growth of VLSI technology, when the chip size has increased from a few hundred devices to a staggering multi-million transistors. This trend is expected to continue as the CMOS feature size progresses towards the nanometric dimension of 100 nm and less. SIA roadmap projects that, whereas the DRAM chips will integrate over 20 billion devices in the next millennium, the future microprocessors may incorporate over 100 million transistors on a single chip. As the VLSI chip size increases, the limited accessibility of circuit components poses great difficulty for external diagnosis and replacement in the presence of faulty components. For this reason, extensive work has been done in built-in self-test techniques, but little research is known concerning built-in self-repair. Moreover, the extra hardware introduced by conventional fault-tolerance techniques is also likely to become faulty, therefore causing the circuit to be useless. This research demonstrates the feasibility of implementing electronic neural networks as intelligent hardware for memory array repair. Most importantly, we show that the neural network control possesses a robust and degradable computing capability under various fault conditions. Overall, a yield analysis performed on 64K DRAMs shows that the yield can be improved from as low as 20 percent to near 99 percent due to the self-repair design, with overhead no more than 7 percent.
Least Reliable Bits Coding (LRBC) for high data rate satellite communications

NASA Technical Reports Server (NTRS)

Vanderaar, Mark; Wagner, Paul; Budinger, James

1992-01-01

An analysis and discussion of a bandwidth efficient multi-level/multi-stage block coded modulation technique called Least Reliable Bits Coding (LRBC) is presented. LRBC uses simple multi-level component codes that provide increased error protection on increasingly unreliable modulated bits in order to maintain an overall high code rate that increases spectral efficiency. Further, soft-decision multi-stage decoding is used to make decisions on unprotected bits through corrections made on more protected bits. Using analytical expressions and tight performance bounds it is shown that LRBC can achieve increased spectral efficiency and maintain equivalent or better power efficiency compared to that of Binary Phase Shift Keying (BPSK). Bit error rates (BER) vs. channel bit energy with Additive White Gaussian Noise (AWGN) are given for a set of LRB Reed-Solomon (RS) encoded 8PSK modulation formats with an ensemble rate of 8/9. All formats exhibit a spectral efficiency of 2.67 = (log2(8))(8/9) information bps/Hz. Bit by bit coded and uncoded error probabilities with soft-decision information are determined. These are traded with code rate to determine parameters that achieve good performance. The relative simplicity of Galois field algebra vs. the Viterbi algorithm and the availability of high speed commercial Very Large Scale Integration (VLSI) for block codes indicate that LRBC using block codes is a desirable method for high data rate implementations.
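The spectral efficiency figure quoted in this abstract, 2.67 = (log2(8))(8/9), is the number of bits per modulation symbol multiplied by the overall code rate. A short check follows; the function name is hypothetical.

```python
import math


def spectral_efficiency(constellation_size, code_rate):
    """Information bits per symbol: log2(M) times the overall code rate."""
    return math.log2(constellation_size) * code_rate


# 8PSK carries 3 bits per symbol; the ensemble code rate is 8/9.
eta_lrbc = spectral_efficiency(8, 8 / 9)
# Uncoded BPSK, for comparison, carries 1 bit per symbol.
eta_bpsk = spectral_efficiency(2, 1.0)
```

The result for 8PSK at rate 8/9 is 8/3, which rounds to the 2.67 information bps/Hz stated in the abstract.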
Minimizing energy dissipation of matrix multiplication kernel on Virtex-II

NASA Astrophysics Data System (ADS)

Choi, Seonil; Prasanna, Viktor K.; Jang, Ju-wook

2002-07-01

In this paper, we develop energy-efficient designs for matrix multiplication on FPGAs. To analyze the energy dissipation, we develop a high-level model using domain-specific modeling techniques. In this model, we identify architecture parameters that significantly affect the total energy (system-wide energy) dissipation. Then, we explore design trade-offs by varying these parameters to minimize the system-wide energy. For matrix multiplication, we consider a uniprocessor architecture and a linear array architecture to develop energy-efficient designs. For the uniprocessor architecture, the cache size is a parameter that affects the I/O complexity and the system-wide energy. For the linear array architecture, the amount of storage per processing element is a parameter affecting the system-wide energy. By using maximum amount of storage per processing element and minimum number of multipliers, we obtain a design that minimizes the system-wide energy. We develop several energy-efficient designs for matrix multiplication. For example, for 6×6 matrix multiplication, energy savings of up to 52% for the uniprocessor architecture and 36% for the linear array architecture are achieved over an optimized library for Virtex-II FPGA from Xilinx.
Local bipolar-transistor gain measurement for VLSI devices

NASA Astrophysics Data System (ADS)

Bonnaud, O.; Chante, J. P.

1981-08-01

A method is proposed for measuring the gain of a bipolar transistor region as small as possible. The measurement then allows the evaluation particularly of the effect of the emitter-base junction edge and the technology-process influence of VLSI-technology devices. The technique consists in the generation of charge carriers in the transistor base layer by a focused laser beam in order to bias the device in as small a region as possible. To reduce the size of the conducting area, a transversal reverse base current is forced through the base layer resistance in order to pinch in the emitter current in the illuminated region. Transistor gain is deduced from small signal measurements. A model associated with this technique is developed, and this is in agreement with the first experimental results.
The Global Experience of Deployment of Energy-Efficient Technologies in High-Rise Construction

NASA Astrophysics Data System (ADS)

Potienko, Natalia D.; Kuznetsova, Anna A.; Solyakova, Darya N.; Klyueva, Yulia E.

2018-03-01

The objective of this research is to examine issues related to the increasing importance of energy-efficient technologies in high-rise construction. The aim of the paper is to investigate modern approaches to building design that involve implementation of various energy-saving technologies in diverse climates and at different structural levels, including the levels of urban development, functionality, planning, construction and engineering. The research methodology is based on the comprehensive analysis of the advanced global expertise in the design and construction of energy-efficient high-rise buildings, with the examination of their positive and negative features. The research also defines the basic principles of energy-efficient architecture. Besides, it draws parallels between the climate characteristics of countries that lead in the field of energy-efficient high-rise construction, on the one hand, and the climate in Russia, on the other, which makes it possible to use the vast experience of many countries, wholly or partially. The paper also gives an analytical review of the results arrived at by implementing energy efficiency principles into high-rise architecture. The study findings determine the impact of energy-efficient technologies on high-rise architecture and planning solutions. In conclusion, the research states that, apart from aesthetic and compositional interpretation of architectural forms, an architect nowadays has to address the task of finding a synthesis between technological and architectural solutions, which requires knowledge of advanced technologies. The study findings reveal that the implementation of modern energy-efficient technologies into high-rise construction is of immediate interest and is sure to bring long-term benefits.
An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

NASA Astrophysics Data System (ADS)

Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng

2018-04-01

In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture shows better speed-device consumption efficiency compared with the previous kernel storage architecture. Further, we improve the architecture for high-accuracy and low-power computing by utilizing the binary storage and the series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performance, including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) more than 67 times speed boost; 3) 71.4% energy saving.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

NASA Astrophysics Data System (ADS)

Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

2018-03-01

A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Two computer architectures (AMD and Intel) are compared. The Intel architecture is shown to be better than the AMD architecture in CPU time. However, in efficiency, the computer with the AMD architecture scores slightly higher than the Intel one. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, with a similar number of particles, AMD obtains an efficiency of 50.09% and Intel up to 49.42%.
Study of thickness and uniformity of oxide passivation with DI-O3 on silicon substrate for electronic and photonic applications

NASA Astrophysics Data System (ADS)

Sharma, Mamta; Hazra, Purnima; Singh, Satyendra Kumar

2018-05-01

Since the beginning of the evolution of semiconductor fabrication technology, a clean and passivated substrate surface has been one of the prime requirements for the fabrication of electronic and optoelectronic devices. However, as the scale of silicon circuits and device architectures has continuously decreased from micrometer to nanometer (from VLSI to ULSI technology), cleaning methods that achieve better wafer surface quality have attracted research interest. The development of controlled and uniform silicon dioxide is the most effective and reliable way to achieve better wafer surface quality for the fabrication of electronic devices. On the other hand, in order to meet high environmental safety and regulatory standards, innovation in cleaning technology is also in demand. The controlled silicon dioxide layer formed by oxidant de-ionized ozonated water has better uniformity. As the uniformity of the controlled silicon dioxide layer on the substrate improves, the performance of the devices is enhanced. The thickness of the oxide layer can be increased by increasing the ozone treatment time. We report, for the first time, measurement of the thickness of the controlled silicon dioxide layer, and we obtained a uniform layer for the same ozone time.
Method for Viterbi decoding of large constraint length convolutional codes

NASA Technical Reports Server (NTRS)

Hsu, In-Shek (Inventor); Truong, Trieu-Kie (Inventor); Reed, Irving S. (Inventor); Jing, Sun (Inventor)

1988-01-01

A new method of Viterbi decoding of convolutional codes lends itself to a pipeline VLSI architecture using a single sequential processor to compute the path metrics in the Viterbi trellis. An array method is used to store the path information for NK intervals, where N is a number and K is the constraint length. The path at the end of each NK interval is then selected from the last entry in the array. A trace-back method is used to return to the beginning of the selected path, i.e., to the first time unit of the interval NK, to read out the stored branch metrics of the selected path which correspond to the message bits. The decoding decision made in this way is no longer maximum likelihood, but can be almost as good, provided that the constraint length K is not too small. The advantage is that for a long message, it is not necessary to provide a large memory to store the trellis-derived information until the end of the message to select the path that is to be decoded; the selection is made at the end of every NK time units, thus decoding a long message in successive blocks.
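The block-wise selection and trace-back described in this abstract can be illustrated with a small software sketch. Everything specific here is an assumption made for illustration, not the patented design: the toy rate-1/2 code with K = 3 and generators (7, 5) octal, hard-decision Hamming metrics, the function names, and the choice to carry path metrics across block boundaries.

```python
from typing import List, Tuple

K = 3                    # constraint length of the toy code (assumption)
N_STATES = 1 << (K - 1)  # 4 trellis states


def step(state: int, bit: int) -> Tuple[int, Tuple[int, int]]:
    """One encoder transition for generators (7, 5) octal."""
    s1 = (state >> 1) & 1  # most recent previous input bit
    s0 = state & 1         # older input bit
    out = (bit ^ s1 ^ s0, bit ^ s0)
    return (bit << 1) | s1, out


def encode(bits: List[int]) -> List[int]:
    """Convolutionally encode a bit list, starting from state 0."""
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.extend(pair)
    return out


def decode_blocks(received: List[int], n: int = 5) -> List[int]:
    """Viterbi decoding with a path decision every n*K trellis steps."""
    block_len = n * K
    inf = float("inf")
    metrics = [0.0] + [inf] * (N_STATES - 1)  # encoder starts in state 0
    survivors = []  # per step: (previous state, input bit) for each state
    decoded: List[int] = []
    pairs = [tuple(received[i:i + 2]) for i in range(0, len(received), 2)]
    for t, r in enumerate(pairs):
        new_metrics = [inf] * N_STATES
        choice = [(0, 0)] * N_STATES
        for s in range(N_STATES):
            if metrics[s] == inf:
                continue
            for b in (0, 1):
                ns, o = step(s, b)
                m = metrics[s] + (o[0] != r[0]) + (o[1] != r[1])
                if m < new_metrics[ns]:
                    new_metrics[ns] = m
                    choice[ns] = (s, b)
        metrics = new_metrics
        survivors.append(choice)
        if len(survivors) == block_len or t == len(pairs) - 1:
            # End of block: pick the best state, trace back to the block start.
            state = min(range(N_STATES), key=lambda s: metrics[s])
            bits = []
            for ch in reversed(survivors):
                state, b = ch[state]
                bits.append(b)
            decoded.extend(reversed(bits))
            survivors = []  # survivor memory is bounded by one block
    return decoded
```

Because the survivor array is cleared after every block, memory use depends on N and K rather than on the message length, which is the trade-off the abstract describes: the decision is no longer strictly maximum likelihood near block boundaries.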
A low-cost transportable ground station for capture and processing of direct broadcast EOS satellite data

NASA Technical Reports Server (NTRS)

Davis, Don; Bennett, Toby; Short, Nicholas M., Jr.

1994-01-01

The Earth Observing System (EOS), part of a cohesive national effort to study global change, will deploy a constellation of remote sensing spacecraft over a 15 year period. Science data from the EOS spacecraft will be processed and made available to a large community of earth scientists via NASA institutional facilities. A number of these spacecraft are also providing an additional interface to broadcast data directly to users. Direct broadcast of real-time science data from overhead spacecraft has valuable applications including validation of field measurements, planning science campaigns, and science and engineering education. The success and usefulness of EOS direct broadcast depends largely on the end-user cost of receiving the data. To extend this capability to the largest possible user base, the cost of receiving ground stations must be as low as possible. To achieve this goal, NASA Goddard Space Flight Center is developing a prototype low-cost transportable ground station for EOS direct broadcast data based on Very Large Scale Integration (VLSI) components and pipelined, multiprocessing architectures. The targeted reproduction cost of this system is less than $200K. This paper describes a prototype ground station and its constituent components.
Austro-Hungarian Public Building Refurbishment and Energy Efficiency Measures - A Case Study on a Public Building in Sarajevo

NASA Astrophysics Data System (ADS)

Salihbegović, Amira; Čaušević, Amir; Rustempašić, Nerman; Avdić, Dženis; Smajlović, Esad

2017-10-01

Among other pieces of architectural historical heritage in Sarajevo, and Bosnia-Herzegovina in general, the Austro-Hungarian architecture has preserved its original architectural, artistic and engineering characteristics. Both residential and public representative urban blocks, streets and squares are of distinguishable ambience in the architectural and urban image of the city and are testifying about our architectural past. A number of buildings is valorised and protected by law in terms of their architectural, artistic and historical value. In addition, these buildings have a distinct functional, ambiental, historical, and even aesthetical value. To make them last longer, refurbishment of these buildings is challenging and presents potential and multiple benefits for the city, and beyond. Refurbishing built environment through functional reorganizing, redesign and energy efficiency measures applications could result in prolonged longevity, architectural identity preservation and interior comfort improvement. Besides, implemented measures for energy efficiency, through the refurbishment process, should optimize the needs for energy consumption in treated buildings. This paper defines options in comfort improvements and redesign, without implying risks to the building longevity, analyses interventions and energy efficiency measures which would enable potential energy saving assessment in the refurbishment process of masonry buildings. This paper also discusses the different techniques that can be adopted for conservation and preservation of historical masonry buildings from the Austro-Hungarian period dealing with energy efficiency. The works were preceded by historical research and on-site investigations. This paper describes a methodology to quantify their vulnerability. A scheme of structural retrofitting is suggested following the research conducted. 
Revitalization of the building consisted in the reconstruction of the old building structure, creating the inner courtyard and covering it with a glass roof.
Approaching the Ultimate Limits of Communication Efficiency with a Photon-Counting Detector

NASA Technical Reports Server (NTRS)

Erkmen, Baris; Moision, Bruce; Dolinar, Samuel J.; Birnbaum, Kevin M.; Divsalar, Dariush

2012-01-01

Coherent states achieve the Holevo capacity of a pure-loss channel when paired with an optimal measurement, but a physical realization of this measurement is as of yet unknown, and it is also likely to be of high complexity. In this paper, we focus on the photon-counting measurement and study the photon and dimensional efficiencies attainable with modulations over classical- and nonclassical-state alphabets. We first review the state-of-the-art coherent on-off-keying (OOK) with a photon-counting measurement, illustrating its asymptotic inefficiency relative to the Holevo limit. We show that a commonly made Poisson approximation in thermal noise leads to unbounded photon information efficiencies, violating the conjectured Holevo limit. We analyze two binary-modulation architectures that improve upon the dimensional versus photon efficiency tradeoff achievable with conventional OOK. We show that at high photon efficiency these architectures achieve an efficiency tradeoff that differs from the best possible tradeoff--determined by the Holevo capacity--by only a constant factor. The first architecture we analyze is a coherent-state transmitter that relies on feedback from the receiver to control the transmitted energy. The second architecture uses a single-photon number-state source.
On the optimality of code options for a universal noiseless coder

NASA Technical Reports Server (NTRS)

Yeh, Pen-Shu; Rice, Robert F.; Miller, Warner

1991-01-01

A universal noiseless coding structure was developed that provides efficient performance over an extremely broad range of source entropy. This is accomplished by adaptively selecting the best of several easily implemented variable length coding algorithms. Custom VLSI coder and decoder modules capable of processing over 20 million samples per second are currently under development. The first of the code options used in this module development is shown to be equivalent to a class of Huffman code under the Humblet condition; other options are shown to be equivalent to the Huffman codes of a modified Laplacian symbol set, at specified symbol entropy values. Simulation results are obtained on actual aerial imagery, and they confirm the optimality of the scheme. On sources having Gaussian or Poisson distributions, coder performance is also projected through analysis and simulation.
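The idea of adaptively selecting the best of several simple variable-length codes can be sketched as follows. This is a minimal illustration that assumes Golomb-Rice style options (a unary quotient followed by k low-order bits) and a 2-bit option identifier per block; the abstract does not spell out the option set or the block format, so the function names and parameters are hypothetical.

```python
def rice_encode_sample(n, k):
    """Code a non-negative integer as a unary quotient plus k low-order bits."""
    bits = "1" * (n >> k) + "0"
    if k:
        bits += format(n & ((1 << k) - 1), "0{}b".format(k))
    return bits


def encode_block(samples, options=(0, 1, 2, 3)):
    """Pick the option k that gives the shortest block, then code the block."""
    # Coded length of one sample under option k: quotient + terminator + k bits.
    best = min(options, key=lambda k: sum((s >> k) + 1 + k for s in samples))
    id_bits = format(options.index(best), "02b")  # assumes at most 4 options
    return best, id_bits + "".join(rice_encode_sample(s, best) for s in samples)


def decode_block(bits, count, options=(0, 1, 2, 3)):
    """Invert encode_block for a block holding `count` samples."""
    k = options[int(bits[:2], 2)]
    pos, out = 2, []
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating "0"
        r = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append((q << k) | r)
    return out
```

Low-entropy blocks (mostly zeros and ones) end up with k = 0, while blocks with larger values select a larger k, so the coder tracks the source entropy block by block.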
On Approaching the Ultimate Limits of Communication Using a Photon-Counting Detector

NASA Technical Reports Server (NTRS)

Erkmen, Baris I.; Moision, Bruce E.; Dolinar, Samuel J.; Birnbaum, Kevin M.; Divsalar, Dariush

2012-01-01

Coherent states achieve the Holevo capacity of a pure-loss channel when paired with an optimal measurement, but a physical realization of this measurement scheme is as of yet unknown, and it is also likely to be of high complexity. In this paper, we focus on the photon-counting measurement and study the photon and dimensional efficiencies attainable with modulations over classical- and nonclassical-state alphabets. We analyze two binary modulation architectures that improve upon the dimensional versus photon efficiency tradeoff achievable with the state-of-the-art coherent-state on-off keying modulation. We show that at high photon efficiency these architectures achieve an efficiency tradeoff that differs from the best possible tradeoff--determined by the Holevo capacity--by only a constant factor. The first architecture we analyze is a coherent-state transmitter that relies on feedback from the receiver to control the transmitted energy. The second architecture uses a single-photon number-state source.
Achieving Energy Efficiency in Accordance with Bioclimatic Architecture Principles

NASA Astrophysics Data System (ADS)

Bajcinovci, Bujar; Jerliu, Florina

2016-12-01

By using our natural resources, and through inefficient use of energy, we produce much waste that can be recycled as a useful resource, which further contributes to climate change. This study aims to address energy-efficient bioclimatic architecture principles, by which we can achieve potential energy savings, estimated at thirty-three per cent, mainly through environmentally affordable reconstruction, resulting in a low negative impact on the environment. The study presented in this paper investigated the Ulpiana neighbourhood of Prishtina City, focusing on urban design challenges, energy efficiency and air pollution issues. The research methods consist of empirical observations of the urban spatial area using a comparative method. In order to obtain clearer data and information, research was conducted within Ulpiana's urban blocks and the shapes of its architectural structures, with the objective focusing on bioclimatic features in terms of the morphology and microclimate of Ulpiana. Energy supply plays a key role in the economic development of any country; hence, bioclimatic design principles for sustainable architecture and energy efficiency present an evolutive integrated strategy for achieving efficiency and healthier conditions for Kosovar communities. Conceptual findings indicate that, with the integrated design strategy, energy efficiency and passive bioclimatic principles will result in a bond of complex interrelation between nature, architecture, and community. The aim of this study is to promote structured, organized actions to be taken in Prishtina, and Kosovo, which will result in improved energy efficiency in all sectors, and particularly in the residential housing sector.
Research News: Are VLSI Microcircuits Too Hard to Design?

ERIC Educational Resources Information Center

Robinson, Arthur L.

1980-01-01

This research news article on microelectronics discusses the scientific challenge the integrated circuit industry will have in the next decade, for designing the complicated microcircuits made possible by advancing miniaturization technology. (HM)

VLSI-based video event triggering for image data compression

NASA Astrophysics Data System (ADS)

Williams, Glenn L.

1994-02-01

Long-duration, on-orbit microgravity experiments require a combination of high resolution and high frame rate video data acquisition. The digitized high-rate video stream presents a difficult data storage problem. Data produced at rates of several hundred million bytes per second may require a total mission video data storage requirement exceeding one terabyte. A NASA-designed, VLSI-based, highly parallel digital state machine generates a digital trigger signal at the onset of a video event. High capacity random access memory storage coupled with newly available fuzzy logic devices permits the monitoring of a video image stream for long term (DC-like) or short term (AC-like) changes caused by spatial translation, dilation, appearance, disappearance, or color change in a video object. Pre-trigger and post-trigger storage techniques are then adaptable to archiving only the significant video images.
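The pre-trigger and post-trigger storage idea can be sketched in software with a small ring buffer. This is only an illustration under stated assumptions: each frame is reduced to a single number (for example, mean brightness), and the trigger is a plain frame-to-frame difference threshold rather than the fuzzy-logic change detection in the abstract. The function name and parameters are hypothetical.

```python
from collections import deque


def archive_significant(frames, threshold, pre=2, post=2):
    """Return indices of frames kept around each trigger event."""
    ring = deque(maxlen=pre)  # holds only the most recent `pre` frame indices
    kept = []
    post_left = 0
    prev = None
    for idx, value in enumerate(frames):
        triggered = prev is not None and abs(value - prev) > threshold
        prev = value
        if triggered:
            kept.extend(ring)  # flush the pre-trigger history
            ring.clear()
            kept.append(idx)
            post_left = post
        elif post_left > 0:
            kept.append(idx)   # still inside the post-trigger window
            post_left -= 1
        else:
            ring.append(idx)   # quiet frame: retained only temporarily
    return kept
```

Frames that never fall inside a pre- or post-trigger window are overwritten in the ring buffer and never reach the archive, which is how the storage requirement is reduced.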
Asynchronous transfer mode distribution network by use of an optoelectronic VLSI switching chip.

PubMed

Lentine, A L; Reiley, D J; Novotny, R A; Morrison, R L; Sasian, J M; Beckman, M G; Buchholz, D B; Hinterlong, S J; Cloonan, T J; Richards, G W; McCormick, F B

1997-03-10

We describe a new optoelectronic switching system demonstration that implements part of the distribution fabric for a large asynchronous transfer mode (ATM) switch. The system uses a single optoelectronic VLSI modulator-based switching chip with more than 4000 optical input-outputs. The optical system images the input fibers from a two-dimensional fiber bundle onto this chip. A new optomechanical design allows the system to be mounted in a standard electronic equipment frame. A large section of the switch was operated as a 208-Mbits/s time-multiplexed space switch, which can serve as part of an ATM switch by use of an appropriate out-of-band controller. A larger section with 896 input light beams and 256 output beams was operated at 160 Mbits/s as a slowly reconfigurable space switch.
Analog VLSI system for active drag reduction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gupta, B.; Goodman, R.; Jiang, F.

1996-10-01

In today's cost-conscious air transportation industry, fuel costs are a substantial economic concern. Drag reduction is an important way to reduce costs. Even a 5% reduction in drag translates into estimated savings of millions of dollars in fuel costs. Drawing inspiration from the structure of shark skin, the authors are building a system to reduce drag along a surface. Our analog VLSI system interfaces with microfabricated, constant-temperature shear stress sensors. It detects regions of high shear stress and outputs a control signal to activate a microactuator. We are in the process of verifying the actual drag reduction by controlling microactuators in wind tunnel experiments. We are encouraged that an approach similar to one that biology employs provides a very useful contribution to the problem of drag reduction. 9 refs., 21 figs.
VLSI-based Video Event Triggering for Image Data Compression

NASA Technical Reports Server (NTRS)

Williams, Glenn L.

1994-01-01

Long-duration, on-orbit microgravity experiments require a combination of high resolution and high frame rate video data acquisition. The digitized high-rate video stream presents a difficult data storage problem. Data produced at rates of several hundred million bytes per second may require a total mission video data storage requirement exceeding one terabyte. A NASA-designed, VLSI-based, highly parallel digital state machine generates a digital trigger signal at the onset of a video event. High capacity random access memory storage coupled with newly available fuzzy logic devices permits the monitoring of a video image stream for long term (DC-like) or short term (AC-like) changes caused by spatial translation, dilation, appearance, disappearance, or color change in a video object. Pre-trigger and post-trigger storage techniques are then adaptable to archiving only the significant video images.
VLSI Design of Trusted Virtual Sensors.

PubMed

Martínez-Rodríguez, Macarena C; Prada-Delgado, Miguel A; Brox, Piedad; Baturone, Iluminada

2018-01-25

This work presents a Very Large Scale Integration (VLSI) design of trusted virtual sensors providing a minimum unitary cost and very good figures of size, speed and power consumption. The sensed variable is estimated by a virtual sensor based on a configurable and programmable PieceWise-Affine hyper-Rectangular (PWAR) model. An algorithm is presented to find the best values of the programmable parameters given a set of (empirical or simulated) input-output data. The VLSI design of the trusted virtual sensor uses the fast authenticated encryption algorithm, AEGIS, to ensure the integrity of the provided virtual measurement and to encrypt it, and a Physical Unclonable Function (PUF) based on a Static Random Access Memory (SRAM) to ensure the integrity of the sensor itself. Implementation results of a prototype designed in a 90-nm Complementary Metal Oxide Semiconductor (CMOS) technology show that the active silicon area of the trusted virtual sensor is 0.86 mm² and its power consumption when trusted sensing at 50 MHz is 7.12 mW. The maximum operation frequency is 85 MHz, which allows response times lower than 0.25 μs. As an application example, the designed prototype was programmed to estimate the yaw rate in a vehicle, obtaining root mean square errors lower than 1.1%. Experimental results of the employed PUF show the robustness of the trusted sensing against aging and variations of the operation conditions, namely, temperature and power supply voltage (final value as well as ramp-up time).
VLSI Design of Trusted Virtual Sensors

PubMed Central

2018-01-01

This work presents a Very Large Scale Integration (VLSI) design of trusted virtual sensors providing a minimum unitary cost and very good figures of size, speed and power consumption. The sensed variable is estimated by a virtual sensor based on a configurable and programmable PieceWise-Affine hyper-Rectangular (PWAR) model. An algorithm is presented to find the best values of the programmable parameters given a set of (empirical or simulated) input-output data. The VLSI design of the trusted virtual sensor uses the fast authenticated encryption algorithm, AEGIS, to ensure the integrity of the provided virtual measurement and to encrypt it, and a Physical Unclonable Function (PUF) based on a Static Random Access Memory (SRAM) to ensure the integrity of the sensor itself. Implementation results of a prototype designed in a 90-nm Complementary Metal Oxide Semiconductor (CMOS) technology show that the active silicon area of the trusted virtual sensor is 0.86 mm² and its power consumption when trusted sensing at 50 MHz is 7.12 mW. The maximum operation frequency is 85 MHz, which allows response times lower than 0.25 μs. As an application example, the designed prototype was programmed to estimate the yaw rate in a vehicle, obtaining root mean square errors lower than 1.1%. Experimental results of the employed PUF show the robustness of the trusted sensing against aging and variations of the operation conditions, namely, temperature and power supply voltage (final value as well as ramp-up time). PMID:29370141
Energy Efficiency for Architectural Drafting Instructors.

ERIC Educational Resources Information Center

Scharmann, Larry, Ed.

Intended primarily but not solely for use at the postsecondary level, this curriculum guide contains five units on energy efficiency that were designed to be incorporated into an existing program in architectural drafting. The following topics are examined: energy conservation awareness (residential energy use and audit procedures); residential…
Neuromorphic VLSI vision system for real-time texture segregation.

PubMed

Shimonomura, Kazuhiro; Yagi, Tetsuya

2008-10-01

The visual system of the brain can perceive an external scene in real-time with extremely low power dissipation, although the response speed of an individual neuron is considerably lower than that of semiconductor devices. The neurons in the visual pathway generate their receptive fields using a parallel and hierarchical architecture. This architecture of the visual cortex is interesting and important for designing a novel perception system from an engineering perspective. The aim of this study is to develop vision system hardware, inspired by the hierarchical visual processing in V1, for real-time texture segregation. The system consists of a silicon retina, orientation chip, and field programmable gate array (FPGA) circuit. The silicon retina emulates the neural circuits of the vertebrate retina and exhibits a Laplacian-Gaussian-like receptive field. The orientation chip selectively aggregates multiple pixels of the silicon retina in order to produce Gabor-like receptive fields that are tuned to various orientations by mimicking the feed-forward model proposed by Hubel and Wiesel. The FPGA circuit receives the output of the orientation chip and computes the responses of the complex cells. Using this system, the neural images of simple cells were computed in real-time for various orientations and spatial frequencies. Using the orientation-selective outputs obtained from the multi-chip system, a real-time texture segregation was conducted based on a computational model inspired by psychophysics and neurophysiology. The texture image was filtered by the two orthogonally oriented receptive fields of the multi-chip system and the filtered images were combined to segregate the area of different texture orientation with the aid of FPGA. The present system is also useful for the investigation of the functions of the higher-order cells that can be obtained by combining the simple and complex cells.
Hardware architecture design of a fast global motion estimation method

NASA Astrophysics Data System (ADS)

Liang, Chaobing; Sang, Hongshi; Shen, Xubang

2015-12-01

VLSI implementation of gradient-based global motion estimation (GME) faces two main challenges: irregular data access and high off-chip memory bandwidth requirement. We previously proposed a fast GME method that reduces computational complexity by choosing a certain number of small patches containing corners and using them in a gradient-based framework. A hardware architecture is designed to implement this method and further reduce the off-chip memory bandwidth requirement. On-chip memories are used to store coordinates of the corners and template patches, while the Gaussian pyramids of both the template and reference frame are stored in off-chip SDRAMs. By performing the geometric transform only on the coordinates of the center pixel of a 3-by-3 patch in the template image, a 5-by-5 area containing the warped 3-by-3 patch in the reference image is extracted from the SDRAMs by burst read. Patch-based and burst-mode data access helps to keep the off-chip memory bandwidth requirement at the minimum. Although patch size varies at different pyramid levels, all patches are processed in terms of 3x3 patches, so the utilization of the patch-processing circuit reaches 100%. FPGA implementation results show that the design utilizes 24,080 bits of on-chip memory and, for a sequence with a resolution of 352x288 and a frequency of 60Hz, the off-chip bandwidth requirement is only 3.96Mbyte/s, compared with 243.84Mbyte/s for the original gradient-based GME method. This design can be used in applications like video codec, video stabilization, and super-resolution, where real-time GME is a necessity and minimum memory bandwidth requirement is appreciated.
A learnable parallel processing architecture towards unity of memory and computing

NASA Astrophysics Data System (ADS)

Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

2015-08-01

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Efficient architecture for spike sorting in reconfigurable hardware.

PubMed

Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying

2013-11-01

This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of different weight vectors shares the same circuit to lower the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented on a field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining a high rate of correct classification and high speed computation.
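For readers unfamiliar with the generalized Hebbian algorithm, the following is a minimal software sketch of its standard update (Sanger's rule), which extracts principal components one sample at a time. It is not the paper's hardware design: the learning rate, the random initialization, and the function name are assumptions chosen for illustration.

```python
import random


def gha_train(samples, n_components, lr=0.005, epochs=1, seed=0):
    """Estimate leading principal directions of zero-mean samples with GHA."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
         for _ in range(n_components)]
    for _ in range(epochs):
        for x in samples:
            # Output of each neuron: projection of x on its weight vector.
            y = [sum(wi[j] * x[j] for j in range(dim)) for wi in w]
            new_w = []
            for i in range(n_components):
                # Sanger's rule: subtract the part of x already explained
                # by neurons 0..i before applying the Hebbian update.
                recon = [sum(y[k] * w[k][j] for k in range(i + 1))
                         for j in range(dim)]
                new_w.append([w[i][j] + lr * y[i] * (x[j] - recon[j])
                              for j in range(dim)])
            w = new_w
    return w
```

Every weight vector is updated with the same multiply-accumulate pattern, differing only in how many earlier neurons contribute to the reconstruction. That regularity is what makes sharing one circuit across weight vectors plausible.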
A learnable parallel processing architecture towards unity of memory and computing.

PubMed

Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

2015-08-14

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Evaluating the Effectiveness of Reference Models in Federating Enterprise Architectures

ERIC Educational Resources Information Center

Wilson, Jeffery A.

2012-01-01

Agencies need to collaborate with each other to perform missions, improve mission performance, and find efficiencies. The ability of individual government agencies to collaborate with each other for mission and business success and efficiency is complicated by the different techniques used to describe their Enterprise Architectures (EAs).…
Embeddable Reconfigurable Neuroprocessors

NASA Technical Reports Server (NTRS)

Daud, Taher; Duong, Tuan; Langenbacher, Harry; Tran, Mua; Thakoor, Anil

1993-01-01

Reconfigurable and cascadable building block neural network chips, fabricated using analog VLSI design tools, are interfaced to a PC. The building block chip designs, the cascadability and the hardware-in-the-loop supervised learning aspects of these chips are described.
Efficient k-Winner-Take-All Competitive Learning Hardware Architecture for On-Chip Learning

PubMed Central

Ou, Chien-Min; Li, Hui-Ya; Hwang, Wen-Jyi

2012-01-01

A novel k-winners-take-all (k-WTA) competitive learning (CL) hardware architecture is presented for on-chip learning in this paper. The architecture is based on an efficient pipeline allowing k-WTA competition processes associated with different training vectors to be performed concurrently. The pipeline architecture employs a novel codeword swapping scheme so that neurons failing the competition for a training vector are immediately available for the competitions for the subsequent training vectors. The architecture is implemented by the field programmable gate array (FPGA). It is used as a hardware accelerator in a system on programmable chip (SOPC) for realtime on-chip learning. Experimental results show that the SOPC has significantly lower training time than that of other k-WTA CL counterparts operating with or without hardware support.
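As a software reference point for the learning rule only, one k-WTA competitive learning step might look like the sketch below. It assumes squared Euclidean distance and a single learning rate shared by all winners, neither of which the abstract specifies. It does not model the pipeline or the codeword swapping scheme, which are the hardware contributions.

```python
def kwta_update(codewords, x, k, lr):
    """Move the k codewords closest to training vector x toward x, in place."""
    order = sorted(
        range(len(codewords)),
        key=lambda i: sum((c - a) ** 2 for c, a in zip(codewords[i], x)),
    )
    winners = order[:k]
    for i in winners:
        codewords[i] = [c + lr * (a - c) for c, a in zip(codewords[i], x)]
    return winners
```

In this sequential form each training vector must finish its competition before the next one starts. The architecture in the abstract removes that constraint by letting competitions for different training vectors overlap in a pipeline.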
ESPC Common Model Architecture

DTIC Science & Technology

2014-09-30

1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. ESPC Common Model Architecture Earth System Modeling...Operational Prediction Capability (NUOPC) was established between NOAA and Navy to develop common software architecture for easy and efficient...development under a common model architecture and other software-related standards in this project. OBJECTIVES NUOPC proposes to accelerate
The Aeronautical Data Link: Decision Framework for Architecture Analysis

NASA Technical Reports Server (NTRS)

Morris, A. Terry; Goode, Plesent W.

2003-01-01

A decision analytic approach that develops optimal data link architecture configuration and behavior to meet multiple conflicting objectives of concurrent and different airspace operations functions has previously been developed. The approach, premised on a formal taxonomic classification that correlates data link performance with operations requirements, information requirements, and implementing technologies, provides a coherent methodology for data link architectural analysis from top-down and bottom-up perspectives. This paper follows the previous research by providing more specific approaches for mapping and transitioning between the lower levels of the decision framework. The goal of the architectural analysis methodology is to assess the impact of specific architecture configurations and behaviors on the efficiency, capacity, and safety of operations. This necessarily involves understanding the various capabilities, system level performance issues and performance and interface concepts related to the conceptual purpose of the architecture and to the underlying data link technologies. Efficient and goal-directed data link architectural network configuration is conditioned on quantifying the risks and uncertainties associated with complex structural interface decisions. Deterministic and stochastic optimal design approaches will be discussed that maximize the effectiveness of architectural designs.
Approximate, computationally efficient online learning in Bayesian spiking neurons.

PubMed

Kuhlmann, Levin; Hauser-Raspe, Michael; Manton, Jonathan H; Grayden, David B; Tapson, Jonathan; van Schaik, André

2014-03-01

Bayesian spiking neurons (BSNs) provide a probabilistic interpretation of how neurons perform inference and learning. Online learning in BSNs typically involves parameter estimation based on maximum-likelihood expectation-maximization (ML-EM) which is computationally slow and limits the potential of studying networks of BSNs. An online learning algorithm, fast learning (FL), is presented that is more computationally efficient than the benchmark ML-EM for a fixed number of time steps as the number of inputs to a BSN increases (e.g., 16.5 times faster run times for 20 inputs). Although ML-EM appears to converge 2.0 to 3.6 times faster than FL, the computational cost of ML-EM means that ML-EM takes longer to simulate to convergence than FL. FL also provides reasonable convergence performance that is robust to initialization of parameter estimates that are far from the true parameter values. However, parameter estimation depends on the range of true parameter values. Nevertheless, for a physiologically meaningful range of parameter values, FL gives very good average estimation accuracy, despite its approximate nature. The FL algorithm therefore provides an efficient tool, complementary to ML-EM, for exploring BSN networks in more detail in order to better understand their biological relevance. Moreover, the simplicity of the FL algorithm means it can be easily implemented in neuromorphic VLSI such that one can take advantage of the energy-efficient spike coding of BSNs.
Progress in a novel architecture for high performance processing

NASA Astrophysics Data System (ADS)

Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

2018-04-01

High performance processing (HPP) is an innovative architecture that targets high performance computing with excellent power efficiency and computing performance. It is suitable for data-intensive applications such as supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully in the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The architecture shows greater energy efficiency than the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed in the TSMC 16 nm field effect transistor (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
Best of College Architecture: AS&U's Architectural Competition.

ERIC Educational Resources Information Center

American School and University, 1981

1981-01-01

A restoration/addition that preserves traditional New England architecture, a sleek vocational-technical college on the prairie, and two energy efficient masonry buildings were selected as winners in the 1981 American School & University Design Awards competition. (Author/MLF)

Realizing Efficient Energy Harvesting from Organic Photovoltaic Cells

NASA Astrophysics Data System (ADS)

Zou, Yunlong

Organic photovoltaic cells (OPVs) are an emerging field of research in renewable energy. The development of OPVs in recent years has made this technology viable for many niche applications. To realize widespread application, however, the power conversion efficiency requires further improvement. The efficiency of an OPV depends on the short-circuit current density (JSC), open-circuit voltage (VOC) and fill factor (FF). For state-of-the-art devices, JSC is mostly optimized with the application of novel low-bandgap materials and a bulk heterojunction device architecture (internal quantum efficiency approaching 100%). The remaining limiting factors are the low VOC and FF. This work focuses on overcoming these bottlenecks for improved efficiency. Temperature-dependent measurements of device performance are used to examine both the charge transfer and exciton ionization processes in OPVs. The results permit an improved understanding of the intrinsic limit for VOC in various device architectures and provide insight into device operation. Efforts have also been directed at engineering device architecture for optimized FF, realizing a very high efficiency of 8% for vapor-deposited small-molecule OPVs. With collaborators, new molecules with tailored energy levels are being designed for further improvements in efficiency. A new type of hybrid organic-inorganic perovskite material is also included in this study. By addressing processing issues and anomalous hysteresis effects, a very high efficiency of 19.1% is achieved. Moving forward, topics including engineering film crystallinity, exploring tandem architectures and understanding degradation mechanisms will further push OPVs toward broad commercialization.
GaAs VLSI technology and circuit elements for DSP

NASA Astrophysics Data System (ADS)

Mikkelson, James M.

1990-10-01

Recent progress in digital GaAs circuit performance and complexity is presented to demonstrate the current capabilities of GaAs components. High density GaAs process technology and circuit design techniques are described, and critical issues for achieving favorable complexity, speed, power, and cost tradeoffs are reviewed. Some DSP building blocks are described to provide examples of what types of DSP systems could be implemented with present GaAs technology. DIGITAL GaAs CIRCUIT CAPABILITIES: In the past few years the capabilities of digital GaAs circuits have dramatically increased to the VLSI level. Major gains in circuit complexity and power-delay products have been achieved by the use of silicon-like process technologies and simple circuit topologies. The very high speed and low power consumption of digital GaAs VLSI circuits have made GaAs a desirable alternative to high performance silicon in hardware-intensive, high-speed system applications. An example of the performance and integration complexity available with GaAs VLSI circuits is the 64x64 crosspoint switch shown in figure 1. This switch, which is the most complex GaAs circuit currently available, is designed on a 30 gate GaAs gate array. It operates at 200 MHz and dissipates only 8 watts of power. The reasons for increasing the level of integration of GaAs circuits are similar to the reasons for the continued increase of silicon circuit complexity. The market factors driving GaAs VLSI are system design methodology, system cost, power, and reliability. System designers are hesitant or unwilling to go backwards to previous design techniques and lower levels of integration. A more highly integrated system in a lower performance technology can often approach the performance of a system in a higher performance technology at a lower level of integration. Higher levels of integration also lower the system component count, which reduces the system cost, size, and power consumption while improving the system reliability. 
For large gate count circuits, the power per gate must be minimized to prevent reliability and cooling problems. The technical factors which favor increasing GaAs circuit complexity are primarily related to reducing the speed and power penalties incurred when crossing chip boundaries. Because the internal GaAs chip logic levels are not compatible with standard silicon I/O levels, input receivers and output drivers are needed to convert levels. These I/O circuits add significant delay to logic paths, consume large amounts of power, and use an appreciable portion of the die area. The effects of these I/O penalties can be reduced by increasing the ratio of core logic to I/O on a chip. DSP operations which have a large number of logic stages between the input and the output are ideal candidates to take advantage of the performance of GaAs digital circuits. Figure 2 is a schematic representation of the I/O penalties encountered when converting from ECL levels to GaAs
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

NASA Astrophysics Data System (ADS)

Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

2014-09-01

Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). 
Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Genetic architecture of feed efficiency in mid-lactation Holstein dairy cows

USDA-ARS's Scientific Manuscript database

The objective of this study was to explore the genetic architecture and biological basis of feed efficiency in lactating Holstein cows. In total, 4,918 cows with actual or imputed genotypes for 60,671 SNP had individual feed intake, milk yield, milk composition, and body weight records. Cows were ...
Impact of VLSI/VHSIC on satellite on-board signal processing

NASA Astrophysics Data System (ADS)

Aanstoos, J. V.; Ruedger, W. H.; Snyder, W. E.; Kelly, W. L.

Forecasted improvements in IC fabrication techniques, such as the use of X-ray lithography, are expected to yield submicron circuit feature sizes within the decade of the 1980s. As dimensions decrease, reliability, cost, speed, power consumption and density improvements will be realized which have a significant impact on the capabilities of onboard spacecraft signal processing functions. This will in turn result in increases of the intelligence that may be deployed on spaceborne remote sensing platforms. Among programs oriented toward such goals are the silicon-based Very High Speed Integrated Circuit (VHSIC) researches sponsored by the U.S. Department of Defense, and efforts toward the development of GaAs devices which will compete with silicon VLSI technology for future applications. GaAs has an electron mobility which is five to six times that of silicon, and promises commensurate computation speed increases under low field conditions.
A VLSI implementation of DCT using pass transistor technology

NASA Technical Reports Server (NTRS)

Kamath, S.; Lynn, Douglas; Whitaker, Sterling

1992-01-01

A VLSI design for performing the Discrete Cosine Transform (DCT) operation on image blocks of size 16 x 16 in real time, operating at 34 MHz (worst case), is presented. The process used was Hewlett-Packard's CMOS26--A 3 metal CMOS process with a minimum feature size of 0.75 micron. The design is based on Multiply-Accumulate (MAC) cells which make use of a modified Booth recoding algorithm for performing multiplication. The design of these cells is straightforward, and the layouts are regular with no complex routing. Two versions of these MAC cells were designed and their layouts completed. Both versions were simulated using SPICE to estimate their performance. One version is slightly faster at the cost of larger silicon area and higher power consumption. An improvement in speed of almost 20 percent is achieved after several iterations of simulation and re-sizing.
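The abstract above names modified Booth recoding inside a multiply-accumulate cell but gives no detail. As a rough software illustration only, the following sketch shows the usual radix-4 form of the technique; the function names, the 16-bit default width, and the bit-serial loop are assumptions made here and say nothing about the actual cell layout.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digits are returned least
    significant first, so multiplier == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if bits % 2:
        bits += 1  # sign-extend to an even number of bits
    m = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit zero bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_mac(acc: int, multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Return acc + multiplicand * multiplier using one partial product per Booth digit."""
    for i, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        # In hardware each digit selects 0, +/-A or +/-2A; here it is a plain multiply.
        acc += (digit * multiplicand) << (2 * i)
    return acc


print(booth_radix4_digits(7, 4))   # digits for 0111
print(booth_mac(10, 123, -45))     # 10 + 123 * (-45)
```

The recoding halves the number of partial products relative to a bit-by-bit multiplier, which is the usual reason it appears in MAC cells.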
Performance of Trellis Coded 256 QAM super-multicarrier modem VLSI's for SDH interface outage-free digital microwave radio

NASA Astrophysics Data System (ADS)

Aikawa, Satoru; Nakamura, Yasuhisa; Takanashi, Hitoshi

1994-02-01

This paper describes the performance of an outage free SDH (Synchronous Digital Hierarchy) interface 256 QAM modem. An outage free DMR (Digital Microwave Radio) is achieved by a high coding gain trellis coded SPORT QAM and Super Multicarrier modem. A new frame format and its associated circuits connect the outage free modem to the SDH interface. The newly designed VLSI's are key devices for developing the modem. As an overall modem performance, BER (bit error rate) characteristics and equipment signatures are presented. A coding gain of 4.7 dB (at a BER of 10^-4) is obtained using SPORT 256 QAM and Viterbi decoding. This coding gain is realized by trellis coding as well as by increasing the transmission rate. The roll-off factor is decreased to maintain the same frequency occupation and modulation level as an ordinary SDH 256 QAM modem.
Biophysical synaptic dynamics in an analog VLSI network of Hodgkin-Huxley neurons.

PubMed

Yu, Theodore; Cauwenberghs, Gert

2009-01-01

We study synaptic dynamics in a biophysical network of four coupled spiking neurons implemented in an analog VLSI silicon microchip. The four neurons implement a generalized Hodgkin-Huxley model with individually configurable rate-based kinetics of opening and closing of Na+ and K+ ion channels. The twelve synapses implement a rate-based first-order kinetic model of neurotransmitter and receptor dynamics, accounting for NMDA and non-NMDA type chemical synapses. The implemented models on the chip are fully configurable by 384 parameters accounting for conductances, reversal potentials, and pre/post-synaptic voltage-dependence of the channel kinetics. We describe the models and present experimental results from the chip characterizing single neuron dynamics, single synapse dynamics, and multi-neuron network dynamics showing phase-locking behavior as a function of synaptic coupling strength. The 3mm x 3mm microchip consumes 1.29 mW power making it promising for applications including neuromorphic modeling and neural prostheses.
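The rate-based channel kinetics mentioned in this abstract are conventionally written as a first-order equation for a gating variable n, dn/dt = alpha(1 - n) - beta n. The sketch below integrates that equation with forward Euler for constant opening and closing rates; the rate values, step size, and function name are illustrative assumptions and are not parameters of the chip.

```python
def simulate_gate(alpha: float, beta: float, n0: float = 0.0,
                  dt: float = 0.01, steps: int = 5000) -> float:
    """Forward-Euler integration of dn/dt = alpha * (1 - n) - beta * n.

    alpha is the opening rate and beta the closing rate (both held constant
    here; in a Hodgkin-Huxley model they depend on membrane voltage).
    """
    n = n0
    for _ in range(steps):
        n += dt * (alpha * (1.0 - n) - beta * n)
    return n


# Hypothetical rates: the steady state is alpha / (alpha + beta),
# approached with time constant 1 / (alpha + beta).
alpha, beta = 0.5, 0.125
print(simulate_gate(alpha, beta), alpha / (alpha + beta))
```

With voltage-dependent rates the same update is simply evaluated with alpha and beta recomputed at each step.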
Modeling selective attention using a neuromorphic analog VLSI device.

PubMed

Indiveri, G

2000-12-01

Attentional mechanisms are required to overcome the problem of flooding a limited processing capacity system with information. They are present in biological sensory systems and can be a useful engineering tool for artificial visual systems. In this article we present a hardware model of a selective attention mechanism implemented on a very large-scale integration (VLSI) chip, using analog neuromorphic circuits. The chip exploits a spike-based representation to receive, process, and transmit signals. It can be used as a transceiver module for building multichip neuromorphic vision systems. We describe the circuits that carry out the main processing stages of the selective attention mechanism and provide experimental data for each circuit. We demonstrate the expected behavior of the model at the system level by stimulating the chip with both artificially generated control signals and signals obtained from a saliency map, computed from an image containing several salient features.
Mixed-mode VLSI optic flow sensors for in-flight control of a micro air vehicle

NASA Astrophysics Data System (ADS)

Barrows, Geoffrey L.; Neely, C.

2000-11-01

NRL is developing compact optic flow sensors for use in a variety of small-scale navigation and collision avoidance tasks. These sensors are being developed for use in micro air vehicles (MAVs), which are autonomous aircraft whose maximum dimension is on the order of 15 cm. To achieve desired weight specifications of 1 - 2 grams, mixed-signal VLSI circuitry is being used to develop compact focal plane sensors that directly compute optic flow. As an interim proof of principle, we have constructed a sensor comprising a focal plane sensor head with on-chip processing and a back-end PIC microcontroller. This interim sensor weighs approximately 25 grams and is able to measure optic flow with real-world and low-contrast textures. Variations of this sensor have been used to control the flight of a glider in real time to avoid collisions with walls.
Study of molybdenum-aluminum interdiffusion kinetics and contact resistance for VLSI applications

NASA Astrophysics Data System (ADS)

Singh, R. N.; Brown, D. M.; Kim, M. J.; Smith, G. A.

1985-12-01

The interdiffusion barrier characteristics of molybdenum thin films with aluminum-1% Si are studied between 733 and 763 K via sheet and contact resistance measurements, Rutherford backscattering spectrometry, secondary ion mass spectrometry, and x-ray diffraction analysis. The results indicate that thermal annealing of Mo/Al-1% Si thin film couples leads to MoAl12 compound formation initially as a nonplanar front, but extensive annealing results in complete transformation of Al-1% Si to MoAl12 and a significant increase in contact resistance. The interdiffusion kinetics is diffusion controlled and shows parabolic time dependence, incubation periods, and an extremely high activation energy of 5.9 eV. The incubation periods and the high activation energy are explained by the presence of silicon precipitates at the Mo/Al-1% Si interface. Implications of these observations for VLSI device characteristics are discussed and a safe time-temperature processing regime is proposed.
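To give a feel for what a 5.9 eV activation energy implies over the quoted 733 to 763 K window, the following sketch evaluates a plain Arrhenius ratio. It assumes the parabolic rate constant follows k = k0 exp(-Ea / (kB T)) with a temperature-independent prefactor, which the abstract does not state; the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_ratio(ea_ev: float, t_low_k: float, t_high_k: float) -> float:
    """Ratio k(t_high) / k(t_low) for a rate constant k = k0 * exp(-Ea / (kB * T))."""
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_low_k - 1.0 / t_high_k))


ratio = arrhenius_ratio(5.9, 733.0, 763.0)
print(f"rate constant ratio across 733-763 K: {ratio:.1f}")
```

Under these assumptions the rate constant changes by roughly a factor of 40 across a 30 K window, which illustrates why a narrow time-temperature processing regime matters for such a barrier.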
Single board system for fuzzy inference

NASA Technical Reports Server (NTRS)

Symon, James R.; Watanabe, Hiroyuki

1991-01-01

The very large scale integration (VLSI) implementation of a fuzzy logic inference mechanism allows the use of rule-based control and decision making in demanding real-time applications. Researchers designed a full custom VLSI inference engine. The chip was fabricated using CMOS technology. The chip consists of 688,000 transistors of which 476,000 are used for RAM memory. The fuzzy logic inference engine board system incorporates the custom designed integrated circuit into a standard VMEbus environment. The Fuzzy Logic system uses Transistor-Transistor Logic (TTL) parts to provide the interface between the Fuzzy chip and a standard, double height VMEbus backplane, allowing the chip to perform application process control through the VMEbus host. High level C language functions hide details of the hardware system interface from the applications level programmer. The first version of the board was installed on a robot at Oak Ridge National Laboratory in January of 1990.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.

PubMed

Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

2014-01-01

With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

NASA Astrophysics Data System (ADS)

Dias, Tiago; Roma, Nuno; Sousa, Leonel

2014-12-01

A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is built on a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also permits its resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable for realizing high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only the reduced subset of transforms used by a specific video standard. The experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, these results demonstrate the ability of this processing structure to realize multi-standard transform cores that support all the standards mentioned above and are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
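The real-time claim for 7,680 × 4,320 at 30 fps can be turned into a sample rate with simple arithmetic. The sketch below does so for luma only; the samples_per_pixel parameter is an assumption added here so that a chroma format (for example 1.5 for 4:2:0, which the abstract does not specify) can be factored in.

```python
def samples_per_second(width: int, height: int, fps: int,
                       samples_per_pixel: float = 1.0) -> float:
    """Number of samples a transform core must consume per second for real-time video."""
    return width * height * fps * samples_per_pixel


luma_rate = samples_per_second(7680, 4320, 30)
print(f"{luma_rate:,.0f} luma samples per second")
```

That is just under 10^9 luma samples per second, so the required clock frequency depends directly on how many samples the core processes per cycle.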
Lossless compression of VLSI layout image data.

PubMed

Dai, Vito; Zakhor, Avideh

2006-09-01

We present a novel lossless compression algorithm called Context Copy Combinatorial Code (C4), which integrates the advantages of two very disparate compression techniques: context-based modeling and Lempel-Ziv (LZ) style copying. While the algorithm can be applied to many lossless compression applications, such as document image compression, our primary target application has been lossless compression of integrated circuit layout image data. These images contain a heterogeneous mix of data: dense repetitive data better suited to LZ-style coding, and less dense structured data, better suited to context-based encoding. As part of C4, we have developed a novel binary entropy coding technique called combinatorial coding which is simultaneously as efficient as arithmetic coding, and as fast as Huffman coding. Compression results show C4 outperforms JBIG, ZIP, BZIP2, and two-dimensional LZ, and achieves lossless compression ratios greater than 22 for binary layout image data, and greater than 14 for gray-pixel image data.
An efficient current-based logic cell model for crosstalk delay analysis

NASA Astrophysics Data System (ADS)

Nazarian, Shahin; Das, Debasish

2013-04-01

Logic cell modelling is an important component in the analysis and design of CMOS integrated circuits, mostly due to nonlinear behaviour of CMOS cells with respect to the voltage signal at their input and output pins. A current-based model for CMOS logic cells is presented, which can be used for effective crosstalk noise and delta delay analysis in CMOS VLSI circuits. Existing current source models are expensive and need a new set of Spice-based characterisation, which is not compatible with typical EDA tools. In this article we present Imodel, a simple nonlinear logic cell model that can be derived from the typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models. In fact, our experiments show an average error of 3% compared to Spice. This level of accuracy comes with a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
Reed Solomon codes for error control in byte organized computer memory systems

NASA Technical Reports Server (NTRS)

Lin, S.; Costello, D. J., Jr.

1984-01-01

A problem in designing semiconductor memories is to provide some measure of error control without requiring excessive coding overhead or decoding time. In LSI and VLSI technology, memories are often organized on a multiple bit (or byte) per chip basis. For example, some 256K-bit DRAM's are organized in 32Kx8 bit-bytes. Byte oriented codes such as Reed Solomon (RS) codes can provide efficient low overhead error control for such memories. However, the standard iterative algorithm for decoding RS codes is too slow for these applications. Some special decoding techniques for extended single-and-double-error-correcting RS codes which are capable of high speed operation are presented. These techniques are designed to find the error locations and the error values directly from the syndrome without having to use the iterative algorithm to find the error locator polynomial.
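The idea of reading the error location and value directly from the syndrome, without an iterative error-locator step, can be sketched for the simplest case: a single-byte-error-correcting code over GF(2^8) with two check bytes. This is not the extended code of the report. The primitive polynomial 0x11D, the code roots alpha^0 and alpha^1, and all function names are assumptions chosen for illustration.

```python
# GF(2^8) antilog/log tables; 0x11D is a commonly used primitive polynomial (assumed here).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(word: list[int]) -> tuple[int, int]:
    """S0 = sum of r_i, S1 = sum of r_i * alpha^i (word length at most 255 bytes)."""
    s0 = s1 = 0
    for i, sym in enumerate(word):
        s0 ^= sym
        s1 ^= gf_mul(sym, EXP[i])
    return s0, s1


def encode(data: list[int]) -> list[int]:
    """Prepend two check bytes so that both syndromes of the codeword are zero."""
    a, b = syndromes([0, 0] + list(data))
    c1 = gf_div(a ^ b, 3)  # 3 is 1 + alpha, with alpha = 2
    c0 = a ^ c1
    return [c0, c1] + list(data)


def correct_single_error(word: list[int]) -> list[int]:
    """Direct decoding: error value e = S0, error position j = log(S1 / S0)."""
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word)
    if s0 == 0 or s1 == 0:
        raise ValueError("uncorrectable: more than one byte in error")
    pos = (LOG[s1] - LOG[s0]) % 255
    if pos >= len(word):
        raise ValueError("uncorrectable: error position outside the word")
    fixed = list(word)
    fixed[pos] ^= s0
    return fixed


codeword = encode([0x12, 0x34, 0x56, 0x78])
corrupted = list(codeword)
corrupted[3] ^= 0xA5
print(correct_single_error(corrupted) == codeword)
```

Because the location comes from one table lookup and a subtraction, decoding time is fixed, which is the property the abstract highlights for memory applications.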
Carbon nanotube circuit integration up to sub-20 nm channel lengths.

PubMed

Shulaker, Max Marcel; Van Rethy, Jelle; Wu, Tony F; Liyanage, Luckshitha Suriyasena; Wei, Hai; Li, Zuanyi; Pop, Eric; Gielen, Georges; Wong, H-S Philip; Mitra, Subhasish

2014-04-22

Carbon nanotube (CNT) field-effect transistors (CNFETs) are a promising emerging technology projected to achieve over an order of magnitude improvement in energy-delay product, a metric of performance and energy efficiency, compared to silicon-based circuits. However, due to substantial imperfections inherent with CNTs, the promise of CNFETs has yet to be fully realized. Techniques to overcome these imperfections have yielded promising results, but thus far only at large technology nodes (1 μm device size). Here we demonstrate the first very large scale integration (VLSI)-compatible approach to realizing CNFET digital circuits at highly scaled technology nodes, with devices ranging from 90 nm to sub-20 nm channel lengths. We demonstrate inverters functioning at 1 MHz and a fully integrated CNFET infrared light sensor and interface circuit at 32 nm channel length. This demonstrates the feasibility of realizing more complex CNFET circuits at highly scaled technology nodes.
Adaptive Integration of the Compressed Algorithm of CS and NPC for the ECG Signal Compressed Algorithm in VLSI Implementation

PubMed Central

Tseng, Yun-Hua; Lu, Chih-Wen

2017-01-01

Compressed sensing (CS) is a promising approach to the compression and reconstruction of electrocardiogram (ECG) signals. It has been shown that following reconstruction, most of the changes between the original and reconstructed signals are distributed in the Q, R, and S waves (QRS) region. Furthermore, any increase in the compression ratio tends to increase the magnitude of the change. This paper presents a novel approach integrating the near-precise compressed (NPC) and CS algorithms. The simulation results presented notable improvements in signal-to-noise ratio (SNR) and compression ratio (CR). The efficacy of this approach was verified by fabricating a highly efficient low-cost chip using the Taiwan Semiconductor Manufacturing Company’s (TSMC) 0.18-μm Complementary Metal-Oxide-Semiconductor (CMOS) technology. The proposed core has an operating frequency of 60 MHz and gate counts of 2.69 K. PMID:28991216
Decoding of DBEC-TBED Reed-Solomon codes. [Double-Byte-Error-Correcting, Triple-Byte-Error-Detecting

NASA Technical Reports Server (NTRS)

Deng, Robert H.; Costello, Daniel J., Jr.

1987-01-01

A problem in designing semiconductor memories is to provide some measure of error control without requiring excessive coding overhead or decoding time. In LSI and VLSI technology, memories are often organized on a multiple bit (or byte) per chip basis. For example, some 256 K bit DRAM's are organized in 32 K x 8 bit-bytes. Byte-oriented codes such as Reed-Solomon (RS) codes can provide efficient low overhead error control for such memories. However, the standard iterative algorithm for decoding RS codes is too slow for these applications. The paper presents a special decoding technique for double-byte-error-correcting, triple-byte-error-detecting RS codes which is capable of high-speed operation. This technique is designed to find the error locations and the error values directly from the syndrome without having to use the iterative algorithm to find the error locator polynomial.

Demonstration of a real-time implementation of the ICVision holographic stereogram display

NASA Astrophysics Data System (ADS)

Kulick, Jeffrey H.; Jones, Michael W.; Nordin, Gregory P.; Lindquist, Robert G.; Kowel, Stephen T.; Thomsen, Axel

1995-07-01

There is increasing interest in real-time autostereoscopic 3D displays. Such systems allow 3D objects or scenes to be viewed by one or more observers with correct motion parallax without the need for glasses or other viewing aids. Potential applications of such systems include mechanical design, training and simulation, medical imaging, virtual reality, and architectural design. One approach to the development of real-time autostereoscopic display systems has been to develop real-time holographic display systems. The approach taken by most of the systems is to compute and display a number of holographic lines at one time, and then use a scanning system to replicate the images throughout the display region. The approach taken in the ICVision system being developed at the University of Alabama in Huntsville is very different. In the ICVision display, a set of discrete viewing regions called virtual viewing slits are created by the display. Each pixel is required to fill every viewing slit with different image data. When two virtual viewing slits separated by an interocular distance are filled with stereoscopic pair images, the observer sees a 3D image. The images are computed so that a different stereo pair is presented each time the viewer moves 1 eye pupil diameter (approximately mm), thus providing a series of stereo views. Each pixel is subdivided into smaller regions, called partial pixels. Each partial pixel is filled with a diffraction grating that is just that required to fill an individual virtual viewing slit. The sum of all the partial pixels in a pixel then fills all the virtual viewing slits. The final version of the ICVision system will form diffraction gratings in a liquid crystal layer on the surface of VLSI chips in real time. Processors embedded in the VLSI chips will compute the display in real time. In the current version of the system, a commercial AMLCD is sandwiched with a diffraction grating array. 
This paper will discuss the design details of a portable 3D display based on the integration of a diffractive optical element with a commercial off-the-shelf AMLCD. The diffractive optic contains several hundred thousand partial-pixel gratings and the AMLCD modulates the light diffracted by the gratings.
A smart-pixel holographic competitive learning network

NASA Astrophysics Data System (ADS)

Slagle, Timothy Michael

Neural networks are adaptive classifiers which modify their decision boundaries based on feedback from externally- or internally-generated error signals. Optics is an attractive technology for neural network implementation because it offers the possibility of parallel, nearly instantaneous computation of the weighted neuron inputs by the propagation of light through the optical system. Using current optical device technology, system performance levels of 3 × 10^11 connection updates per second can be achieved. This thesis presents an architecture for an optical competitive learning network which offers advantages over previous optical implementations, including smart-pixel-based optical neurons, phase-conjugate self-alignment of a single neuron plane, and high-density, parallel-access weight storage, interconnection, and learning in a volume hologram. The competitive learning algorithm with modifications for optical implementation is described, and algorithm simulations are performed for an example problem. The optical competitive learning architecture is then introduced. The optical system is simulated using the "beamprop" algorithm at the level of light propagating through the system components, and results showing competitive learning operation in agreement with the algorithm simulations are presented. The optical competitive learning requires a non-linear, non-local "winner-take-all" (WTA) neuron function. Custom-designed smart-pixel WTA neuron arrays were fabricated using CMOS VLSI/liquid crystal technology. Results of laboratory tests of the WTA arrays' switching characteristics, time response, and uniformity are then presented. The system uses a phase-conjugate mirror to write the self-aligning interconnection weight holograms, and energy gain is required from the reflection to minimize erasure of the existing weights. An experimental system for characterizing the PCM response is described. 
Useful gains of 20 were obtained with a polarization-multiplexed PCM readout, and gains of up to 60 were observed when a time-sequential read-out technique was used. Finally, the optical competitive learning laboratory system is described, including some necessary modifications to the previous architectures, and the data acquisition and control system developed for the system. Experimental results showing phase conjugation of the WTA outputs, holographic interconnect storage, associative storage between input images and WTA neuron outputs, and WTA array switching are presented, demonstrating the functions necessary for the operation of the optical learning system.
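As a rough illustration of the competitive learning rule with a winner-take-all (WTA) neuron function mentioned in this abstract, the following Python sketch updates only the weights of the winning neuron. It is a generic textbook formulation and not the optical algorithm of the thesis; the function names, the learning rate, and the sample vectors are assumptions made for illustration.

```python
def winner_take_all(weights, x):
    """Return the index of the neuron with the largest weighted input."""
    activations = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda j: activations[j])


def competitive_step(weights, x, rate=0.1):
    """Move only the winner's weight vector toward the input x.

    `rate` is a hypothetical learning rate; losing neurons are left unchanged.
    """
    j = winner_take_all(weights, x)
    weights[j] = [w_i + rate * (x_i - w_i) for w_i, x_i in zip(weights[j], x)]
    return j


# Two neurons, two inputs (illustrative values only).
demo_weights = [[1.0, 0.0], [0.0, 1.0]]
demo_winner = competitive_step(demo_weights, [0.9, 0.1], rate=0.5)
```

Only the winner learns, which is why the abstract calls the WTA function non-local: each neuron's update depends on the activations of all the others.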
Zinc oxide integrated area efficient high output low power wavy channel thin film transistor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hanna, A. N.; Ghoneim, M. T.; Bahabry, R. R.

2013-11-25

We report an atomic layer deposition based zinc oxide channel material integrated thin film transistor using wavy channel architecture allowing expansion of the transistor width in the vertical direction using the fin type features. The experimental devices show area efficiency, higher normalized output current, and relatively lower power consumption compared to the planar architecture. This performance gain is attributed to the increased device width and an enhanced applied electric field due to the architecture when compared to a back gated planar device with the same process conditions.
Human and information

NASA Astrophysics Data System (ADS)

Mizuno, Hiroyuki

This is a lecture given at the 15th anniversary of the JICST Chugoku Branch Office. Recent progress in VLSI technologies will make it possible to simulate some functions of human beings, and the results will prepare the next innovation for the coming new century.
Smart vision chips: An overview

NASA Technical Reports Server (NTRS)

Koch, Christof

1994-01-01

This viewgraph presentation presents four working analog VLSI vision chips: (1) time-derivative retina, (2) zero-crossing chip, (3) resistive fuse, and (4) figure-ground chip; work in progress on computing motion and neuromorphic systems; and conceptual and practical lessons learned.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

NASA Astrophysics Data System (ADS)

Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

2007-04-01

In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Efficient use of the PE rings, together with an intelligent memory-interleaving organization, increases the efficiency of the architecture. Moreover, efficient on-chip memories and a data management technique effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
A Survey of Architectural Techniques For Improving Cache Power Efficiency

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

Modern processors are using increasingly larger sized on-chip caches. Also, with each CMOS technology generation, there has been a significant increase in their leakage energy consumption. For this reason, cache power management has become a crucial research issue in modern processor design. To address this challenge and also meet the goals of sustainable computing, researchers have proposed several techniques for improving energy efficiency of cache architectures. This paper surveys recent architectural techniques for improving cache power efficiency and also presents a classification of these techniques based on their characteristics. For providing an application perspective, this paper also reviews several real-world processor chips that employ cache energy saving techniques. The aim of this survey is to enable engineers and researchers to get insights into the techniques for improving cache power efficiency and motivate them to invent novel solutions for enabling low-power operation of caches.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms

PubMed Central

Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

2017-01-01

With more and more workflow systems adopting cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
Comparing architectural solutions of IPT application SDKs utilizing H.323 and SIP

NASA Astrophysics Data System (ADS)

Keskinarkaus, Anja; Korhonen, Jani; Ohtonen, Timo; Kilpelanaho, Vesa; Koskinen, Esa; Sauvola, Jaakko J.

2001-07-01

This paper presents two approaches to efficient service development for Internet Telephony. In the first approach we consider services ranging from core call signaling features and media control as stated in ITU-T's H.323 to end user services that support user interaction. The second approach supports IETF's SIP protocol. We compare these from differing architectural perspectives, economy of network and terminal development, and propose efficient architecture models for both protocols. In their design, the main criteria were component independence, lightweight operation and portability in heterogeneous end-to-end environments. In the proposed architecture, the vertical division of call signaling and streaming media control logic allows for using the components either individually or combined, depending on the level of functionality required by an application.
A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

NASA Astrophysics Data System (ADS)

Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki

This paper presents a Multiprocessor System-on-Chip (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core with the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as processing elements for MPSoC architectures designed using our framework.
Nano-photonic light trapping near the Lambertian limit in organic solar cell architectures.

PubMed

Biswas, Rana; Timmons, Erik

2013-09-09

A critical step to achieving higher efficiency solar cells is the broad band harvesting of solar photons. Although considerable progress has recently been achieved in improving the power conversion efficiency of organic solar cells, these cells still do not absorb up to ~50% of the solar spectrum. We have designed and developed an organic solar cell architecture that can boost the absorption of photons by 40% and the photo-current by 50% for organic P3HT-PCBM absorber layers of typical device thicknesses. Our solar cell architecture is based on all layers of the solar cell being patterned in a conformal two-dimensionally periodic photonic crystal architecture. This results in very strong diffraction of photons that increases the photon path length in the absorber layer, and plasmonic light concentration near the patterned organic-metal cathode interface. The absorption approaches the Lambertian limit. The simulations utilize a rigorous scattering matrix approach and provide bounds of the fundamental limits of nano-photonic light absorption in periodically textured organic solar cells. This solar cell architecture has the potential to increase the power conversion efficiency to 10% for single band gap organic solar cells utilizing long-wavelength absorbers.
Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays

NASA Astrophysics Data System (ADS)

Yang, Chen; Liu, LeiBo; Yin, ShouYi; Wei, ShaoJun

2014-12-01

The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained due to data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve this problem. One method loads the context into the CGRA at run time. This method occupies very small on-chip memory but induces very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure. This method loads the context into the on-chip context memory at the boot phase. Broadcasting the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle basis. The size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application complexity. This paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a CGRA. In this architecture, context is dynamically transferred into the CGRA. Utilizing a PCC significantly reduces the on-chip context memory and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. Data preloading is the most frequently used approach to hide input data latency and speed up the data transmission process for the data bandwidth issue. Rather than fundamentally reducing the amount of input data, the transferred data and computations are processed in parallel. However, the data preloading method cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale increases. This paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to the efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. 
The HDM architecture relieves the external memory from the data transfer burden so that the performance is significantly improved. As a result of using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%-19.48% when there was a reasonable memory size. Therefore, 1080p@35.7fps for H.264 high profile video decoding can be achieved on PCC and HDM architecture when utilizing a 200 MHz working frequency. Further, the size of the on-chip context memory no longer restricted complex applications, which were efficiently executed on the PCC and HDM architecture.
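To put the reported operating point in perspective, the short Python sketch below derives the average number of clock cycles available per macroblock from the figures quoted in the abstract (200 MHz, 1080p, 35.7 fps). The 16 × 16 macroblock size is standard for H.264; the function name and the rounding of the frame height up to whole macroblocks are assumptions of this sketch, and the resulting budget is a derived estimate, not a number stated by the authors.

```python
def cycles_per_macroblock(clock_hz, fps, width, height, mb_size=16):
    """Average clock cycles available per macroblock at a given frame rate."""
    mbs_wide = -(-width // mb_size)    # ceiling division
    mbs_high = -(-height // mb_size)   # 1080 rows round up to 68 macroblock rows
    mbs_per_frame = mbs_wide * mbs_high
    return clock_hz / (fps * mbs_per_frame)


# Figures from the abstract: 200 MHz clock, 1080p at 35.7 frames per second.
budget = cycles_per_macroblock(200e6, 35.7, 1920, 1080)
```

With 120 × 68 = 8160 macroblocks per frame, this gives a budget of roughly 687 cycles per macroblock.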
On the Properties and Design of Organic Light-Emitting Devices

NASA Astrophysics Data System (ADS)

Erickson, Nicholas C.

Organic light-emitting devices (OLEDs) are attractive for use in next-generation display and lighting technologies. In display applications, OLEDs offer a wide emission color gamut, compatibility with flexible substrates, and high power efficiencies. In lighting applications, OLEDs offer attractive features such as broadband emission, high-performance, and potential compatibility with low-cost manufacturing methods. Despite recent demonstrations of near unity internal quantum efficiencies (photons out per electron in), OLED adoption lags conventional technologies, particularly in large-area displays and general lighting applications. This thesis seeks to understand the optical and electronic properties of OLED materials and device architectures which lead to not only high peak efficiency, but also reduced device complexity, high efficiency under high excitation, and optimal white-light emission. This is accomplished through the careful manipulation of organic thin film compositions fabricated via vacuum thermal evaporation, and the introduction of a novel device architecture, the graded-emissive layer (G-EML). This device architecture offers a unique platform to study the electronic properties of varying compositions of organic semiconductors and the resulting device performance. This thesis also introduces an experimental technique to measure the spatial overlap of electrons and holes within an OLED's emissive layer. This overlap is an important parameter which is affected by the choice of materials and device design, and greatly impacts the operation of the OLED at high excitation densities. Using the G-EML device architecture, OLEDs with improved efficiency characteristics are demonstrated, achieving simultaneously high brightness and high efficiency.
Fault-Sensitivity and Wear-Out Analysis of VLSI Systems.

DTIC Science & Technology

1995-06-01

DESCRIPTION MIXED-MODE HIERARCHICAL FAULT DESCRIPTION FAULT SIMULATION TYPE OF FAULT TRANSIENT/STUCK-AT LOCATION/TIME AUTOMATIC FAULT INJECTION TRACE...4219-4224, December 1985. [15] J. Sosnowski, "Evaluation of transient hazards in microprocessor controllers," Digest, FTCS-16, The Sixteenth
Circuit Recognition of VLSI Layouts

DTIC Science & Technology

1989-09-01

from the ** ** input file contain information on each transitor . ** totaltransistors=O; while(((strcmp(buffer. "n")))=O) 1Ms(trcmp(buffer.tp"))-=O)) I... statistics and information on transistors ** ** inverters and passgates prior to entering level2 recognition.** fprintf (fo. "no more transistors.\
Pioneering University/Industry Venture Explores VLSI Frontiers.

ERIC Educational Resources Information Center

Davis, Dwight B.

1983-01-01

Discusses industry-sponsored programs in semiconductor research, focusing on Stanford University's Center for Integrated Systems (CIS). CIS, while pursuing research in semiconductor very-large-scale integration, is merging the fields of computer science, information science, and physical science. Issues related to these university/industry…
Framework for Architecture Trade Study Using MBSE and Performance Simulation

NASA Technical Reports Server (NTRS)

Ryan, Jessica; Sarkani, Shahram; Mazzuchim, Thomas

2012-01-01

Increasing complexity in modern systems as well as cost and schedule constraints require a new paradigm of system engineering to fulfill stakeholder needs. Challenges facing efficient trade studies include poor tool interoperability, lack of simulation coordination (design parameters) and requirements flowdown. A recent trend toward Model Based System Engineering (MBSE) includes flexible architecture definition, program documentation, requirements traceability and system engineering reuse. As a new domain, MBSE still lacks governing standards and commonly accepted frameworks. This paper proposes a framework for efficient architecture definition using MBSE in conjunction with domain-specific simulation to evaluate trade studies. A general framework is provided, followed by a specific example including a method for designing a trade study, defining candidate architectures, planning simulations to fulfill requirements and finally a weighted decision analysis to optimize system objectives.
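The weighted decision analysis mentioned at the end of this abstract can be sketched as a simple weighted-sum scoring of candidate architectures. The Python example below is a minimal illustration under assumed data: the criteria names, weights, scores, and function names are hypothetical and are not taken from the paper.

```python
def weighted_scores(candidates, weights):
    """Return a normalized weighted-sum score for each candidate architecture."""
    total = sum(weights.values())
    return {
        name: sum(weights[c] * scores[c] for c in weights) / total
        for name, scores in candidates.items()
    }


def best_candidate(candidates, weights):
    """Pick the candidate with the highest weighted score."""
    scores = weighted_scores(candidates, weights)
    return max(scores, key=scores.get)


# Hypothetical criteria weights and simulation-derived scores in [0, 1].
example_weights = {"performance": 0.5, "cost": 0.3, "risk": 0.2}
example_candidates = {
    "A": {"performance": 0.8, "cost": 0.4, "risk": 0.6},
    "B": {"performance": 0.6, "cost": 0.9, "risk": 0.7},
}
example_choice = best_candidate(example_candidates, example_weights)
```

In a trade study of the kind described, the per-criterion scores would come from the planned simulations, while the weights encode stakeholder priorities.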
Layout pattern analysis using the Voronoi diagram of line segments

NASA Astrophysics Data System (ADS)

Dey, Sandeep Kumar; Cheilaris, Panagiotis; Gabrani, Maria; Papadopoulou, Evanthia

2016-01-01

Early identification of problematic patterns in very large scale integration (VLSI) designs is of great value as the lithographic simulation tools face significant timing challenges. To reduce the processing time, such a tool selects only a fraction of possible patterns which have a probable area of failure, with the risk of missing some problematic patterns. We introduce a fast method to automatically extract patterns based on their structure and context, using the Voronoi diagram of line-segments as derived from the edges of VLSI design shapes. Designers put line segments around the problematic locations in patterns called "gauges," along which the critical distance is measured. The gauge center is the midpoint of a gauge. We first use the Voronoi diagram of VLSI shapes to identify possible problematic locations, represented as gauge centers. Then we use the derived locations to extract windows containing the problematic patterns from the design layout. The problematic locations are prioritized by the shape and proximity information of the design polygons. We perform experiments for pattern selection in a portion of a 22-nm random logic design layout. The design layout had 38,584 design polygons (consisting of 199,946 line segments) on layer Mx, and 7079 markers generated by an optical rule checker (ORC) tool. The optical rules specify requirements for printing circuits with minimum dimension. Markers are the locations of some optical rule violations in the layout. We verify our approach by comparing the coverage of our extracted patterns to the ORC-generated markers. We further derive a similarity measure between patterns and between layouts. The similarity measure helps to identify a set of representative gauges that reduces the number of patterns for analysis.
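The abstract defines a gauge center as the midpoint of a gauge and describes extracting windows around such locations. The Python sketch below illustrates only those two steps. It does not compute a Voronoi diagram; the square window shape, the bounding-box overlap test, and all function names are assumptions made for illustration.

```python
def gauge_center(p, q):
    """Midpoint of a gauge given by endpoints p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def window_around(center, half_size):
    """Axis-aligned square window (xmin, ymin, xmax, ymax) around a center."""
    cx, cy = center
    return (cx - half_size, cy - half_size, cx + half_size, cy + half_size)


def segments_in_window(segments, window):
    """Keep segments whose bounding box overlaps the window.

    This is a conservative filter: a diagonal segment may be kept even if
    it only passes near a corner of the window.
    """
    xmin, ymin, xmax, ymax = window
    kept = []
    for (x1, y1), (x2, y2) in segments:
        if (min(x1, x2) <= xmax and max(x1, x2) >= xmin
                and min(y1, y2) <= ymax and max(y1, y2) >= ymin):
            kept.append(((x1, y1), (x2, y2)))
    return kept
```

A real flow would obtain the candidate gauge centers from the Voronoi diagram of the layout's line segments, as the abstract describes, before cutting windows out of the layout.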
PIMS: Memristor-Based Processing-in-Memory-and-Storage.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cook, Jeanine

Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy efficiency for high performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (instead of memory), the system could address both in-memory computation and applications that accessed larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direction, the most important being that a PIM that implements a standard von Neumann-type architecture results in significant energy efficiency improvement, but only about an O(10) performance improvement. In addition to this, the emergence of new memory technologies moved us to proposing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM technology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.
Hardware Algorithm Implementation for Mission Specific Processing

DTIC Science & Technology

2008-03-01

knowledge about the VLSI technology and understands VHDL, scripting, and integrating the script in Cadence® software program or ModelSim®. The main...possible to have a trade off between parallel and serial logic design for the circuit. Power can be saved by using parallelization, pipelining, or a

Stroboscopic Imaging Interferometer for MEMS Performance Measurement

DTIC Science & Technology

2007-07-15

[Figure residue: labels from an optical setup diagram, apparently including laser, fiber optics, collimator, microscope objective, beam splitter, CCD, cold finger, window, mount, mirror, MEMS, and piezo stack]...Electronics and Photonics Laboratory: Microelectronics, VLSI reliability, failure analysis, solid-state device physics, compound semiconductors
The Xpress Transfer Protocol (XTP): A tutorial (expanded version)

NASA Technical Reports Server (NTRS)

Sanders, Robert M.; Weaver, Alfred C.

1990-01-01

The Xpress Transfer Protocol (XTP) is a reliable, real-time, lightweight transfer layer protocol. Current transport layer protocols such as DoD's Transmission Control Protocol (TCP) and ISO's Transport Protocol (TP) were not designed for the next generation of high speed, interconnected reliable networks such as fiber distributed data interface (FDDI) and the gigabit/second wide area networks. Unlike all previous transport layer protocols, XTP is being designed to be implemented in hardware as a VLSI chip set. By streamlining the protocol, combining the transport and network layers and utilizing the increased speed and parallelization possible with a VLSI implementation, XTP will be able to provide the end-to-end data transmission rates demanded in high speed networks without compromising reliability and functionality. This paper describes the operation of the XTP protocol and, in particular, its error, flow and rate control; inter-networking addressing mechanisms; and multicast support features, as defined in the XTP Protocol Definition Revision 3.4.
Mixed-Dimensionality VLSI-Type Configurable Tools for Virtual Prototyping of Biomicrofluidic Devices and Integrated Systems

NASA Astrophysics Data System (ADS)

Makhijani, Vinod B.; Przekwas, Andrzej J.

2002-10-01

This report presents results of a DARPA/MTO Composite CAD Project aimed at developing a comprehensive microsystem CAD environment, CFD-ACE+ Multiphysics, for bio and microfluidic devices and complete microsystems. The project began in July 1998, and was a three-year team effort between CFD Research Corporation, California Institute of Technology (CalTech), University of California, Berkeley (UCB), and Tanner Research, with Mr. Don Verlee from Abbott Labs participating as a consultant on the project. The overall objective of this project was to develop, validate and demonstrate several applications of a user-configurable VLSI-type mixed-dimensionality software tool for design of biomicrofluidic devices and integrated systems. The developed tool would provide high fidelity 3-D multiphysics modeling capability, 1-D fluidic circuits modeling, and SPICE interface for system level simulations, and mixed-dimensionality design. It would combine tools for layouts and process fabrication, geometric modeling, and automated grid generation, and interfaces to EDA tools (e.g. Cadence) and MCAD tools (e.g. ProE).
Small fan-in is beautiful

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beiu, V.; Makaruk, H.E.

1997-09-01

The starting points of this paper are two size-optimal solutions: (1) one for implementing arbitrary Boolean functions; and (2) another one for implementing certain subclasses of Boolean functions. Because VLSI implementations do not cope well with highly interconnected nets -- the area of a chip grows with the cube of the fan-in -- this paper will analyze the influence of limited fan-in on the size optimality for the two solutions mentioned. First, the authors will extend a result from Horne and Hush valid for fan-in Δ = 2 to arbitrary fan-in. Second, they will prove that size-optimal solutions are obtained for small constant fan-ins for both constructions, while relative minimum size solutions can be obtained for fan-ins strictly lower than linear. These results are in agreement with similar ones proving that for small constant fan-ins (Δ = 6...9) there exist VLSI-optimal (i.e., minimizing AT^2) solutions, while there are similar small constants relating to the capacity of processing information.
Design and Implementation of a New Real-Time Frequency Sensor Used as Hardware Countermeasure

PubMed Central

Jiménez-Naharro, Raúl; Gómez-Galán, Juan Antonio; Sánchez-Raya, Manuel; Gómez-Bravo, Fernando; Pedro-Carrasco, Manuel

2013-01-01

A new digital countermeasure against attacks related to the clock frequency is presented. This countermeasure, known as frequency sensor, consists of a local oscillator, a transition detector, a measurement element and an output block. The countermeasure has been designed using a full-custom technique implemented in an Application-Specific Integrated Circuit (ASIC), and the implementation has been verified and characterized with an integrated design using a 0.35 μm standard Complementary Metal Oxide Semiconductor (CMOS) technology (Very Large Scale Integration (VLSI) implementation). The proposed solution is configurable in resolution time and allowed range of period, achieving a minimum resolution time of only 1.91 ns and an initialization time of 5.84 ns. The proposed VLSI implementation shows better results than other solutions, such as digital ones based on semi-custom techniques and analog ones based on band pass filters, all design parameters considered. Finally, a counter has been used to verify the good performance of the countermeasure in avoiding the success of an attack. PMID:24008285
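The following Python sketch is a purely behavioral model of the idea described in this abstract: each period of the monitored clock is measured in units of a local oscillator's resolution time, and an alarm is raised when the measurement leaves an allowed range. It is not the authors' circuit. The function names, the nominal 20 ns clock period, and the allowed tick range are assumptions; only the 1.91 ns resolution time comes from the abstract. Times are integers in picoseconds to avoid floating-point rounding.

```python
def measure_periods(edge_times_ps, resolution_ps):
    """Quantize each period between consecutive clock edges to whole ticks."""
    return [
        (t1 - t0) // resolution_ps
        for t0, t1 in zip(edge_times_ps, edge_times_ps[1:])
    ]


def alarms(edge_times_ps, resolution_ps, min_ticks, max_ticks):
    """Flag every period whose tick count is outside [min_ticks, max_ticks]."""
    return [
        not (min_ticks <= n <= max_ticks)
        for n in measure_periods(edge_times_ps, resolution_ps)
    ]


# Hypothetical scenario: a nominal 20 ns clock with one shortened (glitched) period.
edges = [0, 20000, 40000, 48000, 68000]   # rising-edge times in picoseconds
flags = alarms(edges, resolution_ps=1910, min_ticks=9, max_ticks=11)
```

In this scenario only the third period, shortened to 8 ns, falls outside the allowed range and is flagged.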
Chip level modeling of LSI devices

NASA Technical Reports Server (NTRS)

Armstrong, J. R.

1984-01-01

The advent of Very Large Scale Integration (VLSI) technology has rendered the gate level model impractical for many simulation activities critical to the design automation process. As an alternative, an approach to the modeling of VLSI devices at the chip level is described, including the specification of modeling language constructs important to the modeling process. A model structure is presented in which models of the LSI devices are constructed as single entities. The modeling structure is two layered. The functional layer in this structure is used to model the input/output response of the LSI chip. A second layer, the fault mapping layer, is added, if fault simulations are required, in order to map the effects of hardware faults onto the functional layer. Modeling examples for each layer are presented. Fault modeling at the chip level is described. Approaches to realistic functional fault selection and defining fault coverage for functional faults are given. Application of the modeling techniques to single chip and bit slice microprocessors is discussed.
Routing channels in VLSI layout

NASA Astrophysics Data System (ADS)

Cai, Hong

A number of algorithms for the automatic routing of interconnections in Very Large Scale Integration (VLSI) building-block layouts are presented. Algorithms for the topological definition of channels, the global routing and the geometrical definition of channels are presented. In contrast to traditional approaches the definition and ordering of the channels is done after the global routing. This approach has the advantage that global routing information can be taken into account to select the optimal channel structure. A polynomial algorithm for the channel definition and ordering problem is presented. The existence of a conflict-free channel structure is guaranteed by enforcing a sliceable placement. Algorithms for finding the shortest connection path are described. A separate algorithm is developed for the power net routing, because the two power nets must be planarly routed with variable wire width. An integrated placement and routing system for generating building-block layout is briefly described. Some experimental results and design experiences in using the system are also presented. Very good results are obtained.
Analysis and Optimization of Four-Coil Planar Magnetically Coupled Printed Spiral Resonators.

PubMed

Khan, Sadeque Reza; Choi, GoangSeog

2016-08-03

High-efficiency power transfer at a long distance can be efficiently established using resonance-based wireless techniques. In contrast to the conventional two-coil-based inductive links, this paper presents a magnetically coupled fully planar four-coil printed spiral resonator-based wireless power-transfer system that compensates for the adverse effect of low coupling and improves efficiency by using high quality-factor coils. A conformal architecture is adopted to reduce the transmitter and receiver sizes. Both square and circular architectures are analyzed and optimized to provide maximum efficiency at a certain operating distance. Furthermore, their performance is compared on the basis of the power-transfer efficiency and power delivered to the load. Square resonators can produce higher measured power-transfer efficiency (79.8%) than circular resonators (78.43%) when the distance between the transmitter and receiver coils is 10 mm of air medium at a resonant frequency of 13.56 MHz. On the other hand, circular coils can deliver higher power (443.5 mW) to the load than the square coils (396 mW) under the same medium properties. The performance of the proposed structures is investigated by simulation using a three-layer human-tissue medium and by experimentation.
Evaluation of the charge transfer efficiency of organic thin-film photovoltaic devices fabricated using a photoprecursor approach.

PubMed

Masuo, Sadahiro; Sato, Wataru; Yamaguchi, Yuji; Suzuki, Mitsuharu; Nakayama, Ken-ichi; Yamada, Hiroko

2015-05-01

Recently, a unique 'photoprecursor approach' was reported as a new option to fabricate a p-i-n triple-layer organic photovoltaic device (OPV) through solution processes. By fabricating the p-i-n architecture using two kinds of photoprecursors and a [6,6]-phenyl C71 butyric acid methyl ester (PC71BM) as the donor and the acceptor, the p-i-n OPVs afforded a higher photovoltaic efficiency than the corresponding p-n devices and i-devices, while the photovoltaic efficiency of p-i-n OPVs depended on the photoprecursors. In this work, the charge transfer efficiency of the i-devices composed of the photoprecursors and PC71BM was investigated using high-sensitivity fluorescence microspectroscopy combined with a time-correlated single photon counting technique to elucidate the photovoltaic efficiency depending on the photoprecursors and the effects of the p-i-n architecture. The spatially resolved fluorescence images and fluorescence lifetime measurements clearly indicated that the compatibility of the photoprecursors with PC71BM influences the charge transfer and the photovoltaic efficiencies. Although the charge transfer efficiency of the i-device was quite high, the photovoltaic efficiency of the i-device was much lower than that of the p-i-n device. These results imply that the carrier generation and carrier transportation efficiencies can be increased by fabricating the p-i-n architecture.
Examining the volume efficiency of the cortical architecture in a multi-processor network model.

PubMed

Ruppin, E; Schwartz, E L; Yeshurun, Y

1993-01-01

The convoluted form of the sheet-like mammalian cortex naturally raises the question whether there is a simple geometrical reason for the prevalence of cortical architecture in the brains of higher vertebrates. Addressing this question, we present a formal analysis of the volume occupied by a massively connected network of processors (neurons) and then consider the pertaining cortical data. Three gross macroscopic features of cortical organization are examined: the segregation of white and gray matter, the circumferential organization of the gray matter around the white matter, and the folded cortical structure. Our results testify to the efficiency of cortical architecture.
FPGA implementation of bit controller in double-tick architecture

NASA Astrophysics Data System (ADS)

Kobylecki, Michał; Kania, Dariusz

2017-11-01

This paper presents a comparison of two original architectures of programmable bit controllers built on FPGAs. Programmable Logic Controllers (which include, among other things, programmable bit controllers) built on FPGAs provide an efficient alternative to controllers based on microprocessors, which are expensive and often too slow. The presented and compared methods allow for the efficient implementation of any bit control algorithm written in the Ladder Diagram language into the programmable logic system in accordance with IEC61131-3. In both cases, we have compared the effect of the applied architecture on the performance of executing the same bit control program in relation to its own size.
Design and Verification of Remote Sensing Image Data Center Storage Architecture Based on Hadoop

NASA Astrophysics Data System (ADS)

Tang, D.; Zhou, X.; Jing, Y.; Cong, W.; Li, C.

2018-04-01

The data center is a new concept of data processing and application proposed in recent years. It is a new method of processing technologies based on data, parallel computing, and compatibility with different hardware clusters. While optimizing the data storage management structure, it fully utilizes cluster resource computing nodes and improves the efficiency of data parallel application. This paper used mature Hadoop technology to build a large-scale distributed image management architecture for remote sensing imagery. Using MapReduce parallel processing technology, it called many computing nodes to process image storage blocks and pyramids in the background to improve the efficiency of image reading and application, and solved the need for concurrent multi-user high-speed access to remotely sensed data. By building an actual Hadoop service system, it verified the rationality, reliability and superiority of the system design by testing the storage efficiency of different image data and multiple users and by analyzing how the distributed storage architecture improves the application efficiency of remote sensing images.
Pyramidal neurovision architecture for vision machines

NASA Astrophysics Data System (ADS)

Gupta, Madan M.; Knopf, George K.

1993-08-01

The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.
Computer Architecture for Energy Efficient SFQ

DTIC Science & Technology

2014-08-27

IBM Corporation (T.J. Watson Research Laboratory) 1101 Kitchawan Road Yorktown Heights, NY 10598 -0000 2 ABSTRACT Number of Papers published in peer...accomplished during this ARO-sponsored project at IBM Research to identify and model an energy efficient SFQ-based computer architecture. The... IBM Windsor Blue (WB), illustrated schematically in Figure 2. The basic building block of WB is a "tile" comprised of a 64-bit arithmetic logic unit
Comments on `Area and power efficient DCT architecture for image compression' by Dhandapani and Ramachandran

NASA Astrophysics Data System (ADS)

Cintra, Renato J.; Bayer, Fábio M.

2017-12-01

In [Dhandapani and Ramachandran, "Area and power efficient DCT architecture for image compression", EURASIP Journal on Advances in Signal Processing 2014, 2014:180] the authors claim to have introduced an approximation for the discrete cosine transform capable of outperforming several well-known approximations in literature in terms of additive complexity. We could not verify the above results and we offer corrections for their work.
Automated Discovery of Machine-Specific Code Improvements

DTIC Science & Technology

1984-12-01

operation of the source language. Additional analysis may reveal special features of the target architecture that may be exploited to generate efficient...Additional analysis may reveal special features of the target architecture that may be exploited to generate efficient code. Such analysis is optional...incorporate knowledge of the source language, but do not refer to features of the target machine. These early phases are sometimes referred to as the
A synchronized computational architecture for generalized bilateral control of robot arms

NASA Technical Reports Server (NTRS)

Bejczy, Antal K.; Szakaly, Zoltan

1987-01-01

This paper describes a computational architecture for an interconnected high speed distributed computing system for generalized bilateral control of robot arms. The key method of the architecture is the use of fully synchronized, interrupt driven software. Since an objective of the development is to utilize the processing resources efficiently, the synchronization is done at the hardware level to reduce system software overhead. The architecture also achieves a balanced load on the communication channel. The paper also describes some architectural relations to trading or sharing manual and automatic control.
Parallel Ada benchmarks for the SVMS

NASA Technical Reports Server (NTRS)

Collard, Philippe E.

1990-01-01

The use of the parallel processing paradigm to design and develop faster and more reliable computers appears to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TRIANGULATED SURFACES*

PubMed Central

Fu, Zhisong; Jeong, Won-Ki; Pan, Yongsheng; Kirby, Robert M.; Whitaker, Ross T.

2012-01-01

This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers. PMID:22641200
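The active-list idea behind the fast iterative method can be sketched on a regular grid, which is the setting of the earlier work cited in this record. The sketch below is a sequential illustration only and not the triangle-based update scheme or the SIMD data structure proposed in the paper. The local solver is a standard first-order Godunov upwind update, and the names `local_update`, `fast_iterative_method`, `slowness`, and `eps` are assumptions introduced for the example.

```python
import math


def neighbors(i, j, rows, cols):
    """Yield the 4-connected grid neighbors of node (i, j)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            yield ni, nj


def local_update(T, i, j, h, slowness):
    """First-order upwind (Godunov) update for |grad T| = slowness."""
    rows, cols = len(T), len(T[0])
    a = min(T[i - 1][j] if i > 0 else math.inf,
            T[i + 1][j] if i < rows - 1 else math.inf)
    b = min(T[i][j - 1] if j > 0 else math.inf,
            T[i][j + 1] if j < cols - 1 else math.inf)
    lo, hi = min(a, b), max(a, b)
    if math.isinf(lo):            # no finite upwind neighbor yet
        return T[i][j]
    hf = h * slowness
    if hi - lo >= hf:             # only one direction contributes
        t = lo + hf
    else:                         # both directions contribute
        t = 0.5 * (lo + hi + math.sqrt(2.0 * hf * hf - (hi - lo) ** 2))
    return min(T[i][j], t)


def fast_iterative_method(rows, cols, sources, h=1.0, slowness=1.0, eps=1e-9):
    """Solve the Eikonal equation on a uniform grid with an active list."""
    T = [[math.inf] * cols for _ in range(rows)]
    source_set = set(sources)
    for i, j in source_set:
        T[i][j] = 0.0
    active = {n for s in source_set for n in neighbors(*s, rows, cols)}
    active -= source_set
    while active:
        next_active = set()
        for i, j in active:
            old = T[i][j]
            new = local_update(T, i, j, h, slowness)
            T[i][j] = new
            if abs(old - new) < eps:
                # Converged: drop the node, but wake any neighbor it improves.
                for ni, nj in neighbors(i, j, rows, cols):
                    if (ni, nj) in active or (ni, nj) in source_set:
                        continue
                    cand = local_update(T, ni, nj, h, slowness)
                    if cand < T[ni][nj]:
                        T[ni][nj] = cand
                        next_active.add((ni, nj))
            else:
                next_active.add((i, j))   # still changing, keep it active
        active = next_active
    return T


if __name__ == "__main__":
    T = fast_iterative_method(5, 5, [(0, 0)])
    for row in T:
        print(" ".join("{:5.2f}".format(v) for v in row))
```

Because every node in the active list can be updated independently within a sweep, the loop over `active` is the part that a parallel implementation would distribute across threads; the sketch runs it sequentially for clarity.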
A new eddy current model for magnetic bearing control system design

NASA Technical Reports Server (NTRS)

Feeley, Joseph J.; Ahlstrom, Daniel J.

1992-01-01

This paper describes a new VLSI-based controller for the implementation of a Linear-Quadratic-Gaussian (LQG) theory-based control system. Use of the controller is demonstrated by design of a controller for a magnetic bearing and its performance is evaluated by computer simulation.

A high throughput architecture for a low complexity soft-output demapping algorithm

NASA Astrophysics Data System (ADS)

Ali, I.; Wasenmüller, U.; Wehn, N.

2015-11-01

Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and are therefore part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on an FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture, based on the best low complexity algorithm identified, that delivers a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
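As an illustration of what a soft demapper computes, the sketch below evaluates max-log LLRs for a Gray-mapped 16-QAM symbol by exhaustive search over the constellation. This is a common reference formulation and is not claimed to be the low complexity algorithm selected in the paper. The bit labeling in `GRAY_PAM4`, the sign convention (positive LLR favors bit 0), and the scaling by `noise_var` are assumptions made for the example.

```python
import itertools

# Assumed Gray labeling of the four amplitude levels on each axis.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

# 16-QAM: two bits select the in-phase level, two bits the quadrature level.
CONSTELLATION = {
    i_bits + q_bits: complex(GRAY_PAM4[i_bits], GRAY_PAM4[q_bits])
    for i_bits, q_bits in itertools.product(GRAY_PAM4, repeat=2)
}


def maxlog_llrs(y, noise_var):
    """Return four max-log LLRs for the received complex sample y.

    For each bit position the LLR is approximated by the difference between
    the squared distances to the nearest symbol carrying a 1 and the nearest
    symbol carrying a 0, scaled by the noise variance.
    """
    llrs = []
    for bit in range(4):
        d0 = min(abs(y - s) ** 2 for b, s in CONSTELLATION.items() if b[bit] == 0)
        d1 = min(abs(y - s) ** 2 for b, s in CONSTELLATION.items() if b[bit] == 1)
        llrs.append((d1 - d0) / noise_var)
    return llrs


if __name__ == "__main__":
    llrs = maxlog_llrs(complex(-2.6, 3.2), noise_var=1.0)
    print(llrs, [int(l < 0) for l in llrs])
```

The exhaustive search costs 16 distance computations per bit; low complexity demappers of the kind compared in the record typically replace it with simplified per-axis expressions.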
Solid Oxide Fuel Cell APU Feasibility Study for a Long Range Commercial Aircraft Using UTC ITAPS Approach. Volume 1; Aircraft Propulsion and Subsystems Integration Evaluation

NASA Technical Reports Server (NTRS)

Srinivasan, Hari; Yamanis, Jean; Welch, Rick; Tulyani, Sonia; Hardin, Larry

2006-01-01

The objective of this contract effort was to define the functionality and evaluate the propulsion and power system benefits derived from a Solid Oxide Fuel Cell (SOFC) based Auxiliary Power Unit (APU) for a future long range commercial aircraft, and to define the technology gaps to enable such a system. The study employed technologies commensurate with Entry into Service (EIS) in 2015. United Technologies Corporation (UTC) Integrated Total Aircraft Power System (ITAPS) methodologies were used to evaluate system concepts to a conceptual level of fidelity. The technology benefits were captured as reductions of the mission fuel burn and emissions. The baseline aircraft considered was the Boeing 777-200ER airframe with more electric subsystems, Ultra Efficient Engine Technology (UEET) engines, and an advanced APU with ceramics for increased efficiency. In addition to the baseline architecture, four architectures using an SOFC system to replace the conventional APU were investigated. The mission fuel burn savings for Architecture-A, which has minimal system integration, is 0.16 percent. Architecture-B and Architecture-C employ greater system integration and obtain fuel burn benefits of 0.44 and 0.70 percent, respectively. Architecture-D represents the highest level of integration and obtains a benefit of 0.77 percent.
High density circuit technology, part 2

NASA Technical Reports Server (NTRS)

Wade, T. E.

1982-01-01

A multilevel metal interconnection system for very large scale integration (VLSI) systems utilizing polyimides as the interlayer dielectric material is described. A complete characterization of polyimide materials is given as well as experimental methods accomplished using a double level metal test pattern. A low temperature, double exposure polyimide patterning procedure is also presented.
Advanced technologies for Mission Control Centers

NASA Technical Reports Server (NTRS)

Dalton, John T.; Hughes, Peter M.

1991-01-01

Advanced technologies for Mission Control Centers are presented in the form of viewgraphs. The following subject areas are covered: technology needs; current technology efforts at GSFC (human-machine interface development, object oriented software development, expert systems, knowledge-based software engineering environments, and high performance VLSI telemetry systems); and test beds.
Princeton VLSI Project.

DTIC Science & Technology

1983-01-01

34 for these controllers: the remote words required appear on the system bus without having been requested, as if the controllers has ExtraSensory ... Perception.) In any case, the processor is not aware of the ESP controller (except for time delays); it operates as if it had a long bus linking it to all
VLSI Design Techniques for Floating-Point Computation

DTIC Science & Technology

1988-11-18

J. C. Gibson, The Gibson Mix, IBM Systems Development Division Tech. Report (June 1970). [Heni83] A. Heninger, The Zilog Z8070 Floating-Point...Figure 7.2. Clock Distribution between
A Coherent VLSI Design Environment.

DTIC Science & Technology

1985-09-30

deviation were only a few percent. If the number of paths with a delay close to 9ns were large, even more statistical accuracy would be required to...Zippel, "Capsules," SIGPLAN Bulletin, vol. 18, no. 6, pp. 164-169, 1983...waveforms. In the bottom window, the currents into the depletion transistors are
A Fast Turn-Around Facility for Very Large Scale Integration (VLSI)

DTIC Science & Technology

1982-06-01

statistics determination, the first test mask set will use the MATRIX chip design which was recently developed here at Stanford. This chip provides...reached when the basewidth is reduced to zero. Such devices, variably known as depleted-base transistors or bipolar static-induction transistors, have been
Pursuit, Avoidance, and Cohesion in Flight: Multi-Purpose Control Laws and Neuromorphic VLSI

DTIC Science & Technology

2010-10-01

"Binaural Spectral Cues for Ultrasonic Localization," Proc. International Symposium on Circuits and Systems, pp. 2110-2113, 2008 (DOI:10.1109/ISCAS...T. K. Horiuchi, C. Bansal, and T. M. Massoud (2009), "Binaural Intensity Comparison in the Echolocating Bat Using Synaptic Conductance," Proc
Computer Aided Design of Integrated Circuit Fabrication Processes for VLSI Devices

DTIC Science & Technology

1980-01-01

diffusion coefficient and surface concentration of the chlorine as well as any field present; X is related to the ratio of the diffusion coefficient to...with polysilicon gat(. .ed contacts, the interaction of oxidation, segregation and diffusion in all regions of the simulation space is a critical
An Energy-Efficient and High-Quality Video Transmission Architecture in Wireless Video-Based Sensor Networks.

PubMed

Aghdasi, Hadi S; Abbaspour, Maghsoud; Moghadam, Mohsen Ebrahimi; Samei, Yasaman

2008-08-04

Technological progress in the fields of Micro Electro-Mechanical Systems (MEMS) and wireless communications, together with the availability of CMOS cameras, microphones and small-scale array sensors, which may ubiquitously capture multimedia content from the field, have fostered the development of low-cost, resource-limited Wireless Video-based Sensor Networks (WVSN). Given the constraints of video-based sensor nodes and wireless sensor networks, supporting a video stream is not easy with the present sensor network protocols. In this paper, a comprehensive architecture is presented for video transmission over WVSN called Energy-efficient and high-Quality Video transmission Architecture (EQV-Architecture). This architecture affects three layers of the communication protocol stack and takes into account the constraints of wireless video sensor nodes, such as limited processing and energy resources, while preserving video quality at the receiver side. A compression protocol, a transport protocol, and a routing protocol are proposed for the application, transport, and network layers, respectively; a dropping scheme is also presented for the network layer. Simulation results over various environments with dissimilar conditions revealed the effectiveness of the architecture in improving the lifetime of the network as well as preserving the video quality.
Complex Processes from Dynamical Architectures with Time-Scale Hierarchy

PubMed Central

Perdikis, Dionysios; Huys, Raoul; Jirsa, Viktor

2011-01-01

The idea that complex motor, perceptual, and cognitive behaviors are composed of smaller units, which are somehow brought into a meaningful relation, permeates the biological and life sciences. However, no principled framework defining the constituent elementary processes has been developed to this date. Consequently, functional configurations (or architectures) relating elementary processes and external influences are mostly piecemeal formulations suitable to particular instances only. Here, we develop a general dynamical framework for distinct functional architectures characterized by the time-scale separation of their constituents and evaluate their efficiency. Thereto, we build on the (phase) flow of a system, which prescribes the temporal evolution of its state variables. The phase flow topology allows for the unambiguous classification of qualitatively distinct processes, which we consider to represent the functional units or modes within the dynamical architecture. Using the example of a composite movement we illustrate how different architectures can be characterized by their degree of time scale separation between the internal elements of the architecture (i.e. the functional modes) and external interventions. We reveal a tradeoff of the interactions between internal and external influences, which offers a theoretical justification for the efficient composition of complex processes out of non-trivial elementary processes or functional modes. PMID:21347363
A highly efficient 3D level-set grain growth algorithm tailored for ccNUMA architecture

NASA Astrophysics Data System (ADS)

Mießen, C.; Velinov, N.; Gottstein, G.; Barrales-Mora, L. A.

2017-12-01

A highly efficient simulation model for 2D and 3D grain growth was developed based on the level-set method. The model introduces modern computational concepts to achieve excellent performance on parallel computer architectures. Strong scalability was measured on cache-coherent non-uniform memory access (ccNUMA) architectures. To achieve this, the proposed approach considers the application of local level-set functions at the grain level. Ideal and non-ideal grain growth was simulated in 3D with the objective to study the evolution of statistical representative volume elements in polycrystals. In addition, microstructure evolution in an anisotropic magnetic material affected by an external magnetic field was simulated.
On the optimality of a universal noiseless coder

NASA Technical Reports Server (NTRS)

Yeh, Pen-Shu; Rice, Robert F.; Miller, Warner H.

1993-01-01

Rice developed a universal noiseless coding structure that provides efficient performance over an extremely broad range of source entropy. This is accomplished by adaptively selecting the best of several easily implemented variable length coding algorithms. Variations of such noiseless coders have been used in many NASA applications. Custom VLSI coder and decoder modules capable of processing over 50 million samples per second have been fabricated and tested. In this study, the first of the code options used in this module development is shown to be equivalent to a class of Huffman code under the Humblet condition, for source symbol sets having a Laplacian distribution. Except for the default option, other options are shown to be equivalent to the Huffman codes of a modified Laplacian symbol set, at specified symbol entropy values. Simulation results are obtained on actual aerial imagery over a wide entropy range, and they confirm the optimality of the scheme. Comparisons with other known techniques are performed on several widely used images and the results further validate the coder's optimality.
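The idea of adaptively selecting the best of several simple variable length codes can be sketched with Golomb-Rice codes, where option k sends the k low-order bits of each sample verbatim and the remaining high-order part in unary. The record does not give the option set or bit conventions of the NASA modules, so the unary convention (ones terminated by a zero), the range of k, and the names `rice_encode`, `rice_decode`, and `best_option` below are assumptions made for illustration.

```python
def rice_encode(value, k):
    """Encode a non-negative integer with a Golomb-Rice code of parameter k."""
    q = value >> k                      # high-order part, sent in unary
    bits = "1" * q + "0"
    if k:
        # low-order k bits, sent verbatim
        bits += format(value & ((1 << k) - 1), "0{}b".format(k))
    return bits


def rice_decode(bits, k):
    """Decode one value; return (value, number_of_bits_consumed)."""
    q = 0
    pos = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1                            # skip the terminating zero
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k


def best_option(block, max_k=7):
    """Pick the k that minimizes the total coded length of a block."""
    lengths = {
        k: sum((v >> k) + 1 + k for v in block)
        for k in range(max_k + 1)
    }
    return min(lengths, key=lengths.get)


if __name__ == "__main__":
    block = [3, 0, 7, 2, 5, 1, 4, 6]
    k = best_option(block)
    coded = "".join(rice_encode(v, k) for v in block)
    print(k, coded)
```

A low-entropy block (small sample values) selects a small k and a high-entropy block selects a large k, which is how a coder of this family stays efficient over a broad entropy range.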
Power efficient, clock gated multiplexer based full adder cell using 28 nm technology

NASA Astrophysics Data System (ADS)

Gupta, Ashutosh; Murgai, Shruti; Gulati, Anmol; Kumar, Pradeep

2016-03-01

Clock gating is a leading technique used for power saving. The full adder is one of the basic circuits found in most VLSI circuits. In this paper, a clock-gated, multiplexer-based full adder cell is implemented in 28 nm technology. We have designed a full adder cell using a multiplexer with a gated clock without degrading the performance of the cell. We use a negative latch circuit to generate the gated clock. This gated clock is used to control the multiplexer-based full adder cell. The circuit has been synthesized on a Kintex FPGA through Xilinx ISE Design Suite 14.7 using 28 nm technology in Verilog HDL. The circuit has been simulated on ModelSim 10.3c. The design is verified using SystemVerilog on QuestaSim in a UVM environment. The total power of the circuit has been reduced by 7.41% without degrading the performance of the original circuit. The power has been calculated using the XPower Analyzer tool of Xilinx ISE Design Suite 14.3.
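A behavioral sketch of the two ingredients named in this record, a full adder built from 2:1 multiplexers and a latch-based clock gate, is given below. The record does not describe the actual gate-level netlist, so the particular multiplexer decomposition and the gating model (a latch that is transparent while the clock is low, followed by an AND with the clock) are assumptions; the function names are hypothetical.

```python
def mux2(sel, in0, in1):
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def xor_mux(a, b):
    """XOR realized with one multiplexer and an inverted input."""
    return mux2(a, b, 1 - b)


def full_adder(a, b, cin):
    """One-bit full adder composed only of 2:1 multiplexers."""
    p = xor_mux(a, b)          # propagate signal a XOR b
    s = xor_mux(p, cin)        # sum bit
    cout = mux2(p, a, cin)     # carry is a when a == b, otherwise cin
    return s, cout


def gated_clock(clk, enable, latched):
    """Latch-based clock gate.

    The latch follows `enable` only while clk is low, so the enable seen by
    the AND gate cannot change while clk is high (no glitches on the output).
    Returns (gated_clk, new_latch_state).
    """
    if clk == 0:
        latched = enable
    return clk & latched, latched


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, full_adder(a, b, cin))
```

When the enable is low, the gated clock stays at zero and the downstream cell does not toggle, which is the source of the dynamic power saving that clock gating aims for.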
Preliminary Results from a Model-Driven Architecture Methodology for Development of an Event-Driven Space Communications Service Concept

NASA Technical Reports Server (NTRS)

Roberts, Christopher J.; Morgenstern, Robert M.; Israel, David J.; Borky, John M.; Bradley, Thomas H.

2017-01-01

NASA's next generation space communications network will involve dynamic and autonomous services analogous to services provided by current terrestrial wireless networks. This architecture concept, known as the Space Mobile Network (SMN), is enabled by several technologies now in development. A pillar of the SMN architecture is the establishment and utilization of a continuous bidirectional control plane space link channel and a new User Initiated Service (UIS) protocol to enable more dynamic and autonomous mission operations concepts, reduced user space communications planning burden, and more efficient and effective provider network resource utilization. This paper provides preliminary results from the application of model driven architecture methodology to develop UIS. Such an approach is necessary to ensure systematic investigation of several open questions concerning the efficiency, robustness, interoperability, scalability and security of the control plane space link and UIS protocol.
Business Architecture Development at Public Administration - Insights from Government EA Method Engineering Project in Finland

NASA Astrophysics Data System (ADS)

Valtonen, Katariina; Leppänen, Mauri

Governments worldwide are concerned with the efficient production of services to customers. To improve the quality of services and to make service production more efficient, information and communication technology (ICT) is widely exploited in public administration (PA). Succeeding in this exploitation calls for large-scale planning which embraces issues from the strategic to the technological level. In this planning the notion of enterprise architecture (EA) is commonly applied. One of the sub-architectures of EA is business architecture (BA). BA planning is challenging in PA due to a large number of stakeholders, a wide set of customers, and solid and hierarchical structures of organizations. To support EA planning in Finland, a project to engineer a government EA (GEA) method was launched. In this chapter, we analyze the discussions and outputs of the project workshops and reflect on the issues that emerged in light of current e-government literature. We bring forth insights into and suggestions for government BA and its development.
Technology advances and market forces: Their impact on high performance architectures

NASA Technical Reports Server (NTRS)

Best, D. R.

1978-01-01

Reasonable projections into future supercomputer architectures and technology require an analysis of the computer industry market environment, the current capabilities and trends within the component industry, and the research activities on computer architecture in the industrial and academic communities. Management, programmer, architect, and user must cooperate to increase the efficiency of supercomputer development efforts. Care must be taken to match the funding, compiler, architecture and application with greater attention to testability, maintainability, reliability, and usability than supercomputer development programs of the past.
Light Extraction From Solution-Based Processable Electrophosphorescent Organic Light-Emitting Diodes

NASA Astrophysics Data System (ADS)

Krummacher, Benjamin C.; Mathai, Mathew; So, Franky; Choulis, Stelios; Choong, Vi-En

2007-06-01

Molecular dye dispersed solution processable blue emitting organic light-emitting devices have been fabricated and the resulting devices exhibit efficiency as high as 25 cd/A. With down-conversion phosphors, white emitting devices have been demonstrated with peak efficiency of 38 cd/A and luminous efficiency of 25 lm/W. The high efficiencies have been a product of proper tuning of carrier transport, optimization of the location of the carrier recombination zone and, hence, microcavity effect, efficient down-conversion from blue to white light, and scattering/isotropic remission due to phosphor particles. An optical model has been developed to investigate all these effects. In contrast to the common misunderstanding that light out-coupling efficiency is about 22% and independent of device architecture, our device data and optical modeling results clearly demonstrated that the light out-coupling efficiency is strongly dependent on the exact location of the recombination zone. Estimating the device internal quantum efficiencies based on external quantum efficiencies without considering the device architecture could lead to erroneous conclusions.
Complexity Optimization and High-Throughput Low-Latency Hardware Implementation of a Multi-Electrode Spike-Sorting Algorithm

PubMed Central

Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix

2017-01-01

Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989

Complexity optimization and high-throughput low-latency hardware implementation of a multi-electrode spike-sorting algorithm.

PubMed

Dragas, Jelena; Jackel, David; Hierlemann, Andreas; Franke, Felix

2015-03-01

Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction.
Engineering interfacial photo-induced charge transfer based on nanobamboo array architecture for efficient solar-to-chemical energy conversion.

PubMed

Wang, Xiaotian; Liow, Chihao; Bisht, Ankit; Liu, Xinfeng; Sum, Tze Chien; Chen, Xiaodong; Li, Shuzhou

2015-04-01

Engineering interfacial photo-induced charge transfer for highly synergistic photocatalysis is successfully realized based on nanobamboo array architecture. Programmable assemblies of various components and heterogeneous interfaces, and, in turn, engineering of the energy band structure along the charge transport pathways, play a critical role in generating excellent synergistic effects of multiple components for promoting photocatalytic efficiency. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Physical fault tolerance of nanoelectronics.

PubMed

Szkopek, Thomas; Roychowdhury, Vwani P; Antoniadis, Dimitri A; Damoulakis, John N

2011-04-29

The error rate in complementary transistor circuits is suppressed exponentially in electron number, arising from an intrinsic physical implementation of fault-tolerant error correction. Contrariwise, explicit assembly of gates into the most efficient known fault-tolerant architecture is characterized by a subexponential suppression of error rate with electron number, and incurs significant overhead in wiring and complexity. We conclude that it is more efficient to prevent logical errors with physical fault tolerance than to correct logical errors with fault-tolerant architecture.
Design of SIP transformation server for efficient media negotiation

NASA Astrophysics Data System (ADS)

Pack, Sangheon; Paik, Eun Kyoung; Choi, Yanghee

2001-07-01

Voice over IP (VoIP) is one of the advanced services supported by the next generation mobile communication. VoIP should support various media formats and terminals existing together. This heterogeneous environment may prevent diverse users from establishing VoIP sessions among them. To solve the problem, an efficient media negotiation mechanism is required. In this paper, we propose an efficient media negotiation architecture using the transformation server and the Intelligent Location Server (ILS). The transformation server is an extended Session Initiation Protocol (SIP) proxy server. It can modify an unacceptable session INVITE message into an acceptable one using the ILS. The ILS is a directory server based on the Lightweight Directory Access Protocol (LDAP) that keeps the user's location information and available media information. The proposed architecture can eliminate the unnecessary response and re-INVITE messages of the standard SIP architecture. It takes only 1.5 round trip times to negotiate two different media types while the standard media negotiation mechanism takes 2.5 round trip times. The extra processing time in message handling is negligible in comparison to the reduced round trip time. The experimental results show that the session setup time in the proposed architecture is less than the setup time in the standard SIP. These results verify that the proposed media negotiation mechanism is more efficient in solving diversity problems.
Architecture survey summary report : final report

DOT National Transportation Integrated Search

1993-07-01

The FAA Corporate Systems Architecture (CSA) Initiative is intended to enable the FAA to enhance the efficiency and effectiveness of the use of Information Technology (IT) throughout the FAA as well as meet specific federal mandates such as the imple...
Rational Strategies for Efficient Perovskite Solar Cells.

PubMed

Seo, Jangwon; Noh, Jun Hong; Seok, Sang Il

2016-03-15

A long-standing dream in the large scale application of solar energy conversion is the fabrication of solar cells with high-efficiency and long-term stability at low cost. The realization of such practical goals depends on the architecture, process and key materials because solar cells are typically constructed from multilayer heterostructures of light harvesters, with electron and hole transporting layers as a major component. Recently, inorganic-organic hybrid lead halide perovskites have attracted significant attention as light absorbers for the fabrication of low-cost and high-efficiency solar cells via a solution process. This mainly stems from long-range ambipolar charge transport properties, low exciton binding energies, and suitable band gap tuning by managing the chemical composition. In our pioneering work, a new photovoltaic platform for efficient perovskite solar cells (PSCs) was proposed, which yielded a high power conversion efficiency (PCE) of 12%. The platform consisted of a pillared architecture of a three-dimensional nanocomposite of perovskites fully infiltrating mesoporous TiO2, resulting in the formation of continuous phases and perovskite domains overlaid with a polymeric hole conductor. Since then, the PCE of our PSCs has been rapidly increased from 3% to over 20% certified efficiency. The unprecedented increase in the PCE can be attributed to the effective integration of the advantageous attributes of the refined bicontinuous architecture, deposition process, and composition of perovskite materials. Specifically, the bicontinuous architectures used in the high-efficiency cells comprise a layer of perovskite sandwiched between a mesoporous metal-oxide layer, which is much thinner than that used in conventional dye-sensitized solar cells, and hole-conducting contact materials with a metal back contact. The mesoporous scaffold can affect the hysteresis under different scan directions in measurements of PSCs.
The hysteresis also greatly depends on the cell architecture and perovskite composition. In this Account, we describe our work on four major aspects: (1) the film morphology through the development of intermediate chemistry retarding the rapid reaction between methylammonium or formamidinium iodide and lead halide (PbI2) for improved perovskite film formation; (2) the phase stability and band gap tuning of the perovskite layer through materials engineering; (3) the development of electron and hole transporting materials for carrier-selective contacting layers; and (4) the adoption of p-i-n and n-i-p architectures depending on the position of the electron or hole conducting layer in front of incident light. Finally, we summarize the recent achievements in PSCs and outline the challenges facing the future development and commercialization of PSCs.
Integration of utilities infrastructures in a future internet enabled smart city framework.

PubMed

Sánchez, Luis; Elicegui, Ignacio; Cuesta, Javier; Muñoz, Luis; Lanza, Jorge

2013-10-25

Improving efficiency of city services and facilitating a more sustainable development of cities are the main drivers of the smart city concept. Information and Communication Technologies (ICT) play a crucial role in making cities smarter, more accessible and more open. In this paper we present a novel architecture exploiting major concepts from the Future Internet (FI) paradigm addressing the challenges that need to be overcome when creating smarter cities. This architecture takes advantage of both the critical communications infrastructures already in place and owned by the utilities as well as of the infrastructure belonging to the city municipalities to accelerate efficient provision of existing and new city services. The paper highlights how FI technologies create the necessary glue and logic that allows the integration of current vertical and isolated city services into a holistic solution, which enables a huge forward leap for the efficiency and sustainability of our cities. Moreover, the paper describes a real-world prototype, that instantiates the aforementioned architecture, deployed in one of the parks of the city of Santander providing an autonomous public street lighting adaptation service. This prototype is a showcase on how added-value services can be seamlessly created on top of the proposed architecture.
Multi-aperture all-fiber active coherent beam combining for free-space optical communication receivers.

PubMed

Yang, Yan; Geng, Chao; Li, Feng; Huang, Guan; Li, Xinyang

2017-10-30

A multi-aperture receiver with an optical combining architecture is an effective approach to overcome the effect of atmospheric turbulence on the performance of free-space optical (FSO) communications, and efficiently combining the multiple laser beams received by the sub-apertures is one of its key technologies. In this paper, we focus on the combining module based on fiber couplers, and propose all-fiber coherent beam combining (CBC) with two architectures using active phase locking. To validate the feasibility of the proposed combining module, corresponding experiments and simulations on the CBC of four laser beams are carried out. The experimental results show that the phase differences among the input beams can be compensated and the combining efficiency can be stably improved by active phase locking in CBC with both architectures. The simulation results show that the combining efficiency fluctuates when atmospheric turbulence is considered, and the effectiveness of the combining module decreases as the turbulence increases. We believe that the combining module proposed in this paper has great potential, and the results can provide useful guidance for researchers building such a multi-aperture receiver with an optical combining architecture for FSO communication systems.
Integration of Utilities Infrastructures in a Future Internet Enabled Smart City Framework

PubMed Central

Sánchez, Luis; Elicegui, Ignacio; Cuesta, Javier; Muñoz, Luis; Lanza, Jorge

2013-01-01

Improving efficiency of city services and facilitating a more sustainable development of cities are the main drivers of the smart city concept. Information and Communication Technologies (ICT) play a crucial role in making cities smarter, more accessible and more open. In this paper we present a novel architecture exploiting major concepts from the Future Internet (FI) paradigm addressing the challenges that need to be overcome when creating smarter cities. This architecture takes advantage of both the critical communications infrastructures already in place and owned by the utilities as well as of the infrastructure belonging to the city municipalities to accelerate efficient provision of existing and new city services. The paper highlights how FI technologies create the necessary glue and logic that allows the integration of current vertical and isolated city services into a holistic solution, which enables a huge forward leap for the efficiency and sustainability of our cities. Moreover, the paper describes a real-world prototype, that instantiates the aforementioned architecture, deployed in one of the parks of the city of Santander providing an autonomous public street lighting adaptation service. This prototype is a showcase on how added-value services can be seamlessly created on top of the proposed architecture. PMID:24233072
Improving crop nutrient efficiency through root architecture modifications.

PubMed

Li, Xinxin; Zeng, Rensen; Liao, Hong

2016-03-01

Improving crop nutrient efficiency has become an essential consideration for environmentally friendly and sustainable agriculture. Plant growth and development depend on 17 essential nutrient elements, of which nitrogen (N) and phosphorus (P) are the two most important mineral nutrients. Hence it is not surprising that low N and/or low P availability in soils severely constrains crop growth and productivity, and that N and P have therefore become high priority targets for improving nutrient efficiency in crops. Root exploration largely determines the ability of plants to acquire mineral nutrients from soils. Therefore, root architecture, the 3-dimensional configuration of the plant's root system in the soil, is of great importance for improving crop nutrient efficiency. Furthermore, the symbiotic associations between host plants and arbuscular mycorrhiza fungi/rhizobial bacteria are additional important strategies to enhance nutrient acquisition. In this review, we summarize the recent advances in the current understanding of crop species control of root architecture alterations in response to nutrient availability and root/microbe symbioses, through gene or QTL regulation, which results in enhanced nutrient acquisition. © 2015 Institute of Botany, Chinese Academy of Sciences.
Architecture Synthesis and Reduced-Cost Architectures for Human Exploration Missions

NASA Technical Reports Server (NTRS)

Woodcock, Gordon

2004-01-01

Development of architectures for human exploration missions has been pursued in the international aerospace community for a long time. This paper attempts a different approach and way of looking at architectures. Most of the emphasis is on lunar architectures with a brief look at Mars. The first step is to set forth overarching goals in order to understand origins of requirements. Then, principles and guidelines are developed for architecture formulation. It is argued that safety and cost are the primary factors. Alternative mission profiles are examined for adherence to the principles, and specific architectures formulated according to the guidelines. The guidelines themselves indicate preferred evolution paths from lunar to Mars architectures. Results of example calculations are given to illustrate the process, and an evolution path is recommended. Safety and cost criteria tend to conflict, but it is shown that cost-efficient architectures can be enhanced for good safety ratings at modest cost.
Proposed hardware architectures of particle filter for object tracking

NASA Astrophysics Data System (ADS)

Abd El-Halym, Howida A.; Mahmoud, Imbaby Ismail; Habib, SED

2012-12-01

In this article, efficient hardware architectures for the particle filter (PF) are presented. We propose three different architectures for Sequential Importance Resampling Filter (SIRF) implementation. The first architecture is a two-step sequential PF machine, where particle sampling, weight, and output calculations are carried out in parallel during the first step, followed by sequential resampling in the second step. For the weight computation step, a piecewise linear function is used instead of the classical exponential function. This decreases the complexity of the architecture without degrading the results. The second architecture speeds up the resampling step via a parallel, rather than a serial, architecture. This second architecture targets a balance between hardware resources and the speed of operation. The third architecture implements the SIRF as a distributed PF composed of several processing elements and a central unit. All the proposed architectures are captured in VHDL, synthesized using the Xilinx environment, and verified using the ModelSim simulator. Synthesis results confirmed the resource reduction and speed-up advantages of our architectures.
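The two-step SIRF flow described in this abstract (sample, weight, and output, then resample) can be sketched in software. The following is a minimal illustration only, assuming a one-dimensional random-walk state model, a Gaussian-shaped likelihood, and arbitrarily chosen breakpoints for the piecewise linear weight function; the names `pwl_exp_neg`, `systematic_resample`, and `sir_step` are hypothetical, and none of these details come from the article.

```python
import math
import random

# Hypothetical breakpoints for the piecewise linear stand-in for exp(-x), x >= 0.
KNOTS = (0.0, 0.5, 1.0, 2.0, 4.0, 8.0)

def pwl_exp_neg(x, knots=KNOTS):
    """Piecewise linear approximation of exp(-x), clamped outside the knot range."""
    if x <= knots[0]:
        return 1.0
    if x >= knots[-1]:
        return math.exp(-knots[-1])
    for lo, hi in zip(knots, knots[1:]):
        if x <= hi:
            t = (x - lo) / (hi - lo)
            # Linear interpolation between the exact values at the two knots.
            return (1.0 - t) * math.exp(-lo) + t * math.exp(-hi)

def systematic_resample(particles, weights, rng):
    """Sequential (systematic) resampling: one random offset, n evenly spaced picks."""
    n = len(particles)
    step = sum(weights) / n
    u = rng.uniform(0.0, step)
    out, cum, i = [], weights[0], 0
    for _ in range(n):
        while u >= cum and i < n - 1:
            i += 1
            cum += weights[i]
        out.append(particles[i])
        u += step
    return out

def sir_step(particles, z, rng, q=0.5, r=1.0):
    """One SIR iteration for an assumed random-walk model with measurement z."""
    # Step 1: sampling, weighting, and output estimate (parallelizable per particle).
    pred = [p + rng.gauss(0.0, q) for p in particles]
    w = [pwl_exp_neg((z - p) ** 2 / (2.0 * r * r)) for p in pred]
    estimate = sum(wi * pi for wi, pi in zip(w, pred)) / sum(w)
    # Step 2: sequential resampling.
    return systematic_resample(pred, w, rng), estimate
```

Because the approximation never returns zero, the weight sum stays positive and the normalization in `sir_step` is always defined; a hardware design would presumably replace the `math.exp` calls at the knots with precomputed constants.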
Design and Field Experimentation of a Cooperative ITS Architecture Based on Distributed RSUs.

PubMed

Moreno, Asier; Osaba, Eneko; Onieva, Enrique; Perallos, Asier; Iovino, Giovanni; Fernández, Pablo

2016-07-22

This paper describes a new cooperative Intelligent Transportation System architecture that aims to enable collaborative sensing services. The main goal of this architecture is to improve transportation efficiency and performance. The system, which has been proven within the participation in the ICSI (Intelligent Cooperative Sensing for Improved traffic efficiency) European project, encompasses the entire process of capture and management of available road data. For this purpose, it applies a combination of cooperative services and methods for data sensing, acquisition, processing and communication amongst road users, vehicles, infrastructures and related stakeholders. Additionally, the advantages of using the proposed system are exposed. The most important of these advantages is the use of a distributed architecture, moving the system intelligence from the control centre to the peripheral devices. The global architecture of the system is presented, as well as the software design and the interaction between its main components. Finally, functional and operational results observed through the experimentation are described. This experimentation has been carried out in two real scenarios, in Lisbon (Portugal) and Pisa (Italy).
Design and Field Experimentation of a Cooperative ITS Architecture Based on Distributed RSUs †

PubMed Central

Moreno, Asier; Osaba, Eneko; Onieva, Enrique; Perallos, Asier; Iovino, Giovanni; Fernández, Pablo

2016-01-01

This paper describes a new cooperative Intelligent Transportation System architecture that aims to enable collaborative sensing services. The main goal of this architecture is to improve transportation efficiency and performance. The system, which has been proven within the participation in the ICSI (Intelligent Cooperative Sensing for Improved traffic efficiency) European project, encompasses the entire process of capture and management of available road data. For this purpose, it applies a combination of cooperative services and methods for data sensing, acquisition, processing and communication amongst road users, vehicles, infrastructures and related stakeholders. Additionally, the advantages of using the proposed system are exposed. The most important of these advantages is the use of a distributed architecture, moving the system intelligence from the control centre to the peripheral devices. The global architecture of the system is presented, as well as the software design and the interaction between its main components. Finally, functional and operational results observed through the experimentation are described. This experimentation has been carried out in two real scenarios, in Lisbon (Portugal) and Pisa (Italy). PMID:27455277
Design and Analysis of a Neuromemristive Reservoir Computing Architecture for Biosignal Processing

PubMed Central

Kudithipudi, Dhireesha; Saleh, Qutaiba; Merkel, Cory; Thesing, James; Wysocki, Bryant

2016-01-01

Reservoir computing (RC) is gaining traction in several signal processing domains, owing to its non-linear stateful computation, spatiotemporal encoding, and reduced training complexity over recurrent neural networks (RNNs). Previous studies have shown the effectiveness of software-based RCs for a wide spectrum of applications. A parallel body of work indicates that realizing RNN architectures using custom integrated circuits and reconfigurable hardware platforms yields significant improvements in power and latency. In this research, we propose a neuromemristive RC architecture, with doubly twisted toroidal structure, that is validated for biosignal processing applications. We exploit the device mismatch to implement the random weight distributions within the reservoir and propose mixed-signal subthreshold circuits for energy efficiency. A comprehensive analysis is performed to compare the efficiency of the neuromemristive RC architecture in both digital (reconfigurable) and subthreshold mixed-signal realizations. Both Electroencephalogram (EEG) and Electromyogram (EMG) biosignal benchmarks are used for validating the RC designs. The proposed RC architecture demonstrated an accuracy of 90 and 84% for epileptic seizure detection and EMG prosthetic finger control, respectively. PMID:26869876
Simulation system architecture design for generic communications link

NASA Technical Reports Server (NTRS)

Tsang, Chit-Sang; Ratliff, Jim

1986-01-01

This paper addresses a computer simulation system architecture design for generic digital communications systems. It addresses the issues of an overall system architecture in order to achieve a user-friendly, efficient, and yet easily implementable simulation system. The system block diagram and its individual functional components are described in detail. Software implementation is discussed with the VAX/VMS operating system used as a target environment.
Multiprocessor architecture: Synthesis and evaluation

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1990-01-01

Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction

NASA Astrophysics Data System (ADS)

Kulkarni, Amey; Stanislaus, Jerome L.; Mohsenin, Tinoosh

2014-05-01

Compressive Sensing (CS) is a novel scheme in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and have sluggish performance, which makes them impractical for real-time processing applications. The paper presents novel architectures for the Orthogonal Matching Pursuit (OMP) algorithm, one of the popular CS reconstruction algorithms. We show the implementation results of the proposed architectures on FPGA, ASIC and a custom many-core platform. For the FPGA and ASIC implementations, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. For the custom many-core platform, efficient parallelization techniques are applied to reconstruct signals with varying signal lengths N and sparsity m. The algorithm is divided into three kernels. Each kernel is parallelized to reduce execution time, whereas efficient reuse of the matrix operators allows us to reduce area. Matrix operations are efficiently parallelized by taking advantage of blocked algorithms. For demonstration purposes, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. Implementation on a Xilinx Virtex-5 FPGA requires 27.14 μs to reconstruct the signal using basic OMP, whereas with the thresholding method it requires 18 μs. The ASIC implementation reconstructs the signal in 13 μs. However, our custom many-core, operating at 1.18 GHz, takes 18.28 μs to complete. Our results show that, compared to previously published work on the same algorithm and matrix size, the proposed architectures for FPGA and ASIC implementations perform 1.3x and 1.8x faster, respectively. Also, the proposed many-core implementation performs 3000x faster than the CPU and 2000x faster than the GPU.
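For readers unfamiliar with the basic OMP loop referred to in this abstract, a minimal software sketch follows. It shows one plausible three-stage split (column selection, least squares on the selected columns, residual update); whether this matches the three kernels of the paper is an assumption, and the paper's thresholding method and blocked matrix operations are not reproduced. The function names `solve` and `omp`, the normal-equation solver, and the stopping tolerance are hypothetical choices made for illustration.

```python
import math

def solve(G, b):
    """Solve the small square system G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    c = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][j] * c[j] for j in range(k + 1, n))
        c[k] = (M[k][n] - s) / M[k][k]
    return c

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def omp(cols, y, max_sparsity, tol=1e-9):
    """Basic OMP. cols: dictionary columns (lists of floats); y: measurement vector."""
    residual, support, coef = list(y), [], []
    for _ in range(max_sparsity):
        if math.sqrt(dot(residual, residual)) <= tol:
            break  # measurements already explained by the current support
        # Stage 1: pick the unused column most correlated with the residual.
        j = max((k for k in range(len(cols)) if k not in support),
                key=lambda k: abs(dot(cols[k], residual)) / math.sqrt(dot(cols[k], cols[k])))
        support.append(j)
        # Stage 2: least squares over the selected columns (normal equations).
        G = [[dot(cols[a], cols[b]) for b in support] for a in support]
        rhs = [dot(cols[a], y) for a in support]
        coef = solve(G, rhs)
        # Stage 3: recompute the residual from the new coefficients.
        residual = [yi - sum(c * cols[s][i] for c, s in zip(coef, support))
                    for i, yi in enumerate(y)]
    x = [0.0] * len(cols)
    for s, c in zip(support, coef):
        x[s] = c
    return x
```

In the demonstration setting quoted above, `cols` would hold 256 columns of length 64 and `max_sparsity` would be 8; the normal-equation solve is used here only because the support is small, and a production implementation would more likely update a QR or Cholesky factorization incrementally.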
An Airborne Onboard Parallel Processing Testbed

NASA Technical Reports Server (NTRS)

Mandl, Daniel J.

2014-01-01

This presentation provides information on the progress of the Intelligent Payload Module (IPM) development effort. In addition, a vision is presented on integration of the IPM architecture with the GeoSocial Application Program Interface (API) architecture to enable efficient distribution of satellite data products.
A Prolog Emulator

NASA Technical Reports Server (NTRS)

Tick, Evan

1987-01-01

This note describes an efficient software emulator for the Warren Abstract Machine (WAM) Prolog architecture. The version of the WAM implemented is called Lcode. The Lcode emulator, written in C, executes the 'naive reverse' benchmark at 3900 LIPS. The emulator is one of a set of tools used to measure the memory-referencing characteristics and performance of Prolog programs. These tools include a compiler, assembler, and memory simulators. An overview of the Lcode architecture is given here, followed by a description and listing of the emulator code implementing each Lcode instruction. This note will be of special interest to those studying the WAM and its performance characteristics. In general, this note will be of interest to those creating efficient software emulators for abstract machine architectures.
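The LIPS (logical inferences per second) figure can be put in context by counting the inferences in one run of naive reverse. The sketch below assumes the conventional form of the benchmark, reversing a 30-element list, which the note's abstract does not specify, and counts one inference per call of the two predicates. The Python functions `nrev` and `append` are illustrative stand-ins for the Prolog clauses, not Lcode.

```python
def append(xs, ys, counter):
    """append/3: one inference per call, recursing on the first list."""
    counter[0] += 1
    if not xs:
        return ys
    return [xs[0]] + append(xs[1:], ys, counter)

def nrev(xs, counter):
    """nrev/2: reverse the tail, then append the head as a one-element list."""
    counter[0] += 1
    if not xs:
        return []
    return append(nrev(xs[1:], counter), [xs[0]], counter)

counter = [0]
reversed_list = nrev(list(range(30)), counter)
inferences = counter[0]          # (n + 1)(n + 2) / 2 for a list of length n
seconds_per_run = inferences / 3900.0
```

Under these assumptions one run costs 496 inferences, so at 3900 LIPS a single pass of the benchmark would take roughly 0.127 s.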
