Importance of balanced architectures in the design of high-performance imaging systems
NASA Astrophysics Data System (ADS)
Sgro, Joseph A.; Stanton, Paul C.
1999-03-01
Imaging systems employed in demanding military and industrial applications, such as automatic target recognition and computer vision, typically require real-time high-performance computing resources. While high-performance computing systems have traditionally relied on proprietary architectures and custom components, recent advances in high-performance general-purpose microprocessor technology have produced an abundance of low-cost components suitable for use in high-performance computing systems. A common pitfall in the design of high-performance imaging systems, particularly systems employing scalable multiprocessor architectures, is the failure to balance computational and memory bandwidth. The performance of standard cluster designs, for example, in which several processors share a common memory bus, is typically constrained by memory bandwidth. The characteristic symptom of this problem is the failure of system performance to scale as more processors are added. The problem is exacerbated if I/O and memory functions share the same bus. The recent introduction of microprocessors with large internal caches and high-performance external memory interfaces makes it practical to design high-performance imaging systems with balanced computational and memory bandwidth. Real-world examples of such designs will be presented, along with a discussion of adapting algorithm design to best utilize available memory bandwidth.
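A minimal sketch of the kind of balance analysis the abstract describes: on a shared-bus multiprocessor, attainable throughput is capped by whichever runs out first, aggregate compute peak or bus bandwidth scaled by the kernel's ops/byte ratio. All numbers below are hypothetical placeholders, not figures from the paper.

```python
# Sketch of a compute/memory balance check for a shared-bus imaging node.
# All numbers are hypothetical placeholders, not figures from the paper.

def attainable_gflops(num_procs, gflops_per_proc, bus_bw_gbs, kernel_ops_per_byte):
    """Attainable throughput is the lesser of the aggregate compute peak and
    what the shared memory bus can feed at the kernel's ops/byte ratio."""
    compute_peak = num_procs * gflops_per_proc
    bandwidth_limit = bus_bw_gbs * kernel_ops_per_byte
    return min(compute_peak, bandwidth_limit)

# A convolution-like kernel at ~2 ops/byte on a 1 GB/s shared bus:
for n in (1, 2, 4, 8):
    print(n, attainable_gflops(n, 0.5, 1.0, 2.0))
# Throughput stops scaling once n * 0.5 GFLOP/s exceeds the 2 GFLOP/s
# bandwidth ceiling -- the "failure to scale" symptom described above.
```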
2010-07-22
dependent, providing a natural bandwidth match between compute cores and the memory subsystem. • High Bandwidth Density. Waveguides crossing the chip...simulate this memory access architecture on a 256-core chip with a concentrated 64-node network using detailed traces of high-performance embedded...memory modules, we place memory access points (MAPs) around the periphery of the chip connected to the network. These MAPs, shown in Figure 4, contain
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and, if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.
Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation
Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.; ...
2017-01-03
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC), Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” and, if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies of near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.
Performance measurements of the first RAID prototype
NASA Technical Reports Server (NTRS)
Chervenak, Ann L.
1990-01-01
The performance of RAID the First, a prototype Redundant Array of Inexpensive Disks (RAID), is examined. A hierarchy of bottlenecks was discovered in the system that limits overall performance. The most serious is memory system contention on the Sun 4/280 host CPU, which limits array bandwidth to 2.3 MBytes/sec. The array performs more successfully on small random operations, achieving nearly 300 I/Os per second before the Sun 4/280 becomes CPU limited. Other bottlenecks in the system are the VME backplane, bandwidth on the disk controller, and overheads associated with the SCSI protocol. All are examined in detail. The main conclusion is that to achieve the potential bandwidth of arrays, more powerful CPUs alone will not suffice. Just as important are adequate host memory bandwidth and support for high bandwidth on disk controllers. Current disk controllers are more often designed to achieve large numbers of small random operations, rather than high bandwidth. Operating systems also need to change to support high bandwidth from disk arrays. In particular, they should transfer data in larger blocks, and should support asynchronous I/O to improve sequential write performance.
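An illustrative min-of-bottlenecks model in the spirit of the "hierarchy of bottlenecks" described above: sequential array bandwidth is capped by the slowest stage in the I/O path. Only the 2.3 MB/s host memory limit is taken from the abstract; the other figures are placeholders.

```python
# Illustrative bottleneck model for a disk-array host. The numbers are
# placeholders, except the 2.3 MB/s host memory limit quoted in the abstract.

def array_bandwidth_mb_s(limits):
    """Sequential array bandwidth is capped by the slowest stage in the path."""
    return min(limits.values())

limits = {
    "host_memory_system": 2.3,   # Sun 4/280 memory contention (from abstract)
    "vme_backplane": 8.0,        # hypothetical
    "disk_controller": 4.0,      # hypothetical
    "aggregate_disks": 16.0,     # hypothetical: many disks striped in parallel
}
print(array_bandwidth_mb_s(limits))  # -> 2.3: the host memory system dominates
```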
FPGA cluster for high-performance AO real-time control system
NASA Astrophysics Data System (ADS)
Geng, Deli; Goodsell, Stephen J.; Basden, Alastair G.; Dipper, Nigel A.; Myers, Richard M.; Saunter, Chris D.
2006-06-01
Whilst the high throughput and low latency requirements for the next generation AO real-time control systems have posed a significant challenge to von Neumann architecture processor systems, the Field Programmable Gate Array (FPGA) has emerged as a long term solution with high performance on throughput and excellent predictability on latency. Moreover, FPGA devices have highly capable programmable interfacing, which leads to a more highly integrated system. Nevertheless, a single FPGA is still not enough: multiple FPGA devices need to be clustered to perform the required subaperture processing and the reconstruction computation. In an AO real-time control system, the memory bandwidth is often the bottleneck of the system, simply because a vast amount of supporting data, e.g. pixel calibration maps and the reconstruction matrix, needs to be accessed within a short period. The cluster, as a general computing architecture, has excellent scalability in processing throughput, memory bandwidth, memory capacity, and communication bandwidth. Problems such as task distribution, node communication, and system verification are discussed.
NASA Astrophysics Data System (ADS)
He, Huimin; Liu, Fengman; Li, Baoxia; Xue, Haiyun; Wang, Haidong; Qiu, Delong; Zhou, Yunyan; Cao, Liqiang
2016-11-01
With the development of the multicore processor, the bandwidth and capacity of the memory, rather than the memory area, are the key factors in server performance. At present, however, the new architectures, such as fully buffered DIMM (FBDIMM), hybrid memory cube (HMC), and high bandwidth memory (HBM), cannot be commercially applied in the server. Therefore, a new architecture for the server is proposed. CPU and memory are separated onto different boards, and optical interconnection is used for the communication between them. Each optical module corresponds to each dual inline memory module (DIMM) with 64 channels. Compared to the previous technology, not only can the architecture realize high-capacity and wide-bandwidth memory, it also can reduce power consumption and cost, and be compatible with the existing dynamic random access memory (DRAM). In this article, the proposed module with system-in-package (SiP) integration is demonstrated. The optical module includes a silicon photonic chip, a promising technology for next-generation data exchange centers. Due to the bandwidth-distance performance of the optical interconnection, SerDes chips are introduced to convert the 64-bit data at 800 Mbps from/to 4-channel data at 12.8 Gbps after/before they are transmitted through optical fiber. All the devices are packaged on cheap organic substrates. To ensure the performance of the whole system, several optimization efforts have been performed on the two modules. High-speed interconnection traces have been designed and simulated with electromagnetic simulation software. Steady-state thermal characteristics of the transceiver module have been evaluated by ANSYS APDL based on finite-element methodology (FEM). Heat sinks are placed at the hotspot area to ensure the reliability of all working chips. Finally, this transceiver system based on silicon photonics is measured, and the eye diagrams of data and clock signals are verified.
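A quick consistency check of the SerDes conversion described above, using only figures quoted in the abstract: 64 parallel channels at 800 Mbps carry the same aggregate throughput as 4 serial optical lanes at 12.8 Gbps.

```python
# Consistency check of the parallel-to-serial conversion (figures from the abstract).
parallel_gbps = 64 * 0.8   # 51.2 Gbps aggregate on the DIMM side
serial_gbps = 4 * 12.8     # 51.2 Gbps aggregate on the fiber side
assert abs(parallel_gbps - serial_gbps) < 1e-9
print(parallel_gbps, "Gbps on both sides")  # the conversion preserves throughput
```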
Wide-Range Motion Estimation Architecture with Dual Search Windows for High Resolution Video Coding
NASA Astrophysics Data System (ADS)
Dung, Lan-Rong; Lin, Meng-Chun
This paper presents a memory-efficient motion estimation (ME) technique for high-resolution video compression. The main objective is to reduce external memory access, especially for limited local memory resources. Reducing memory access can substantially cut the associated power consumption. The key to reducing memory accesses is a center-biased algorithm that performs the motion vector (MV) search with the minimum search data. To preserve data reusability, the proposed dual-search-windowing (DSW) approach uses the secondary window as an option per searching necessity. By doing so, the loading of search windows can be alleviated, reducing the required external memory bandwidth. The proposed techniques can save up to 81% of external memory bandwidth and require only 135 MBytes/sec, while the quality degradation is less than 0.2 dB for 720p HDTV clips coded at 8 Mbits/sec.
On the Floating Point Performance of the i860 Microprocessor
NASA Technical Reports Server (NTRS)
Lee, King; Kutler, Paul (Technical Monitor)
1997-01-01
The i860 microprocessor is a pipelined processor that can deliver two double precision floating point results every clock cycle. It is being used in the Touchstone project to develop a teraflop computer by the year 2000. With such high computational capability it was expected that memory bandwidth would limit performance on many kernels. Measured performance of three kernels showed that performance is lower than what memory bandwidth limitations alone would predict. This paper develops a model that explains the discrepancy in terms of memory latencies and points to some problems involved in moving data from memory to the arithmetic pipelines.
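A toy latency/bandwidth model of the kind the discrepancy suggests: if each memory transfer pays a fixed latency, the effective bandwidth seen by a streaming kernel falls below the bus peak. The numbers below are hypothetical and chosen only to illustrate the effect, not taken from the paper.

```python
# Toy model: per-access latency lowers delivered memory bandwidth below peak.
# All numbers are hypothetical, for illustration only.

def effective_bandwidth(peak_mb_s, line_bytes, latency_us):
    transfer_us = line_bytes / peak_mb_s      # MB/s is numerically bytes/us
    return line_bytes / (latency_us + transfer_us)

peak = 160.0                                  # hypothetical peak bus bandwidth, MB/s
for latency in (0.0, 0.2, 0.5):               # per-transfer latency in microseconds
    print(latency, round(effective_bandwidth(peak, 32, latency), 1))
# Even modest per-access latency drops delivered bandwidth well below peak,
# so kernels underperform a pure bandwidth-limit prediction.
```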
DOE Office of Scientific and Technical Information (OSTI.GOV)
Langer, Steven H.; Karlin, Ian; Marinak, Marty M.
HYDRA is used to simulate a variety of experiments carried out at the National Ignition Facility (NIF) [4] and other high energy density physics facilities. HYDRA has packages to simulate radiation transfer, atomic physics, hydrodynamics, laser propagation, and a number of other physics effects. HYDRA has over one million lines of code and includes both MPI and thread-level (OpenMP and pthreads) parallelism. This paper measures the performance characteristics of HYDRA using hardware counters on an IBM BlueGene/Q system. We report key ratios such as bytes/instruction and memory bandwidth for several different physics packages. The total number of bytes read and written per time step is also reported. We show that none of the packages which use significant time are memory bandwidth limited on a Blue Gene/Q. HYDRA currently issues very few SIMD instructions. The pressure on memory bandwidth will increase if high levels of SIMD instructions can be achieved.
Fusion PIC code performance analysis on the Cori KNL system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koskela, Tuomas S.; Deslippe, Jack; Friesen, Brian
We study the attainable performance of Particle-In-Cell codes on the Cori KNL system by analyzing a miniature particle push application based on the fusion PIC code XGC1. We start from the most basic building blocks of a PIC code and build up the complexity to identify the kernels that cost the most in performance and focus optimization efforts there. Particle push kernels operate at high arithmetic intensity (AI) and are not likely to be memory bandwidth or even cache bandwidth bound on KNL. Therefore, we see only minor benefits from the high bandwidth memory available on KNL, and achieving good vectorization is shown to be the most beneficial optimization path, with a theoretical yield of up to 8x speedup on KNL. In practice we are able to obtain up to a 4x gain from vectorization due to limitations set by the data layout and memory latency.
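A roofline-style check of why a high-AI particle push sees little benefit from high bandwidth memory. The KNL peak compute and bandwidth figures below are rough assumed values for illustration, not measurements from the paper; the 8x theoretical vectorization factor corresponds to eight doubles per 512-bit vector lane.

```python
# Roofline sketch: attainable performance = min(compute peak, AI * bandwidth).
# The KNL peak numbers are assumed, illustrative values.

def roofline_gflops(ai_flops_per_byte, peak_gflops, bw_gb_s):
    return min(peak_gflops, ai_flops_per_byte * bw_gb_s)

peak = 2600.0                 # assumed KNL double-precision peak, GFLOP/s
ddr, mcdram = 90.0, 420.0     # assumed sustained bandwidths, GB/s
for ai in (1, 10, 30):
    print(ai, roofline_gflops(ai, peak, ddr), roofline_gflops(ai, peak, mcdram))
# At sufficiently high AI both memories hit the same compute-limited ceiling,
# so the remaining headroom comes from vectorization, not MCDRAM.
```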
NASA Technical Reports Server (NTRS)
Bradley, D. B.; Irwin, J. D.
1974-01-01
A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth, and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input workload of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant) and SIMPLEX (non-redundant) jobs.
From photons to phonons and back: a THz optical memory in diamond.
England, D G; Bustard, P J; Nunn, J; Lausten, R; Sussman, B J
2013-12-13
Optical quantum memories are vital for the scalability of future quantum technologies, enabling long-distance secure communication and local synchronization of quantum components. We demonstrate a THz-bandwidth memory for light using the optical phonon modes of a room temperature diamond. This large bandwidth makes the memory compatible with down-conversion-type photon sources. We demonstrate that four-wave mixing noise in this system is suppressed by material dispersion. The resulting noise floor is just 7×10^-3 photons per pulse, which establishes that the memory is capable of storing single quanta. We investigate the principal sources of noise in this system and demonstrate that high material dispersion can be used to suppress four-wave mixing noise in Λ-type systems.
Livermore Big Artificial Neural Network Toolkit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Essen, Brian Van; Jacobs, Sam; Kim, Hyojin
2016-07-01
LBANN is a toolkit that is designed to train artificial neural networks efficiently on high performance computing architectures. It is optimized to take advantage of key High Performance Computing features to accelerate neural network training. Specifically, it is optimized for low-latency, high-bandwidth interconnects, node-local NVRAM, node-local GPU accelerators, and high-bandwidth parallel file systems. It is built on top of the open source Elemental distributed-memory dense and sparse-direct linear algebra and optimization library that is released under the BSD license. The algorithms contained within LBANN are drawn from the academic literature and implemented to work within a distributed-memory framework.
Methods for compressible fluid simulation on GPUs using high-order finite differences
NASA Astrophysics Data System (ADS)
Pekkilä, Johannes; Väisälä, Miikka S.; Käpylä, Maarit J.; Käpylä, Petri J.; Anjum, Omer
2017-08-01
We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Since graphics processing units perform well in data-parallel tasks, this makes them an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6× speedup over a comparable hydrodynamics solver benchmarked on two Intel Xeon E5-2690v3 processors. Our alternative GPU implementation is latency-bound and achieves the rate of 168 million updates per second.
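A back-of-the-envelope estimate of the sustained bandwidth implied by 343 million grid-point updates per second. The bytes moved per point is an assumption made here for illustration (8 double-precision fields, one read and one write each), not a figure from the paper.

```python
# Estimate of sustained bandwidth implied by the quoted update rate.
# bytes_per_point is an assumption, not stated in the abstract.

points_per_s = 343e6
bytes_per_point = 8 * 8 * 2          # assumed: 8 fields x 8 bytes x (read + write)
sustained_gb_s = points_per_s * bytes_per_point / 1e9
print(round(sustained_gb_s, 1), "GB/s")   # ~43.9 GB/s under these assumptions
# Actual traffic is higher, since high-order stencils re-read neighbouring
# points that fall out of cache; comparing such an estimate with the GPU's
# peak bandwidth is the usual way to judge whether a kernel is bandwidth-bound.
```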
England, Duncan G; Fisher, Kent A G; MacLean, Jean-Philippe W; Bustard, Philip J; Lausten, Rune; Resch, Kevin J; Sussman, Benjamin J
2015-02-06
We report the storage and retrieval of single photons, via a quantum memory, in the optical phonons of a room-temperature bulk diamond. The THz-bandwidth heralded photons are generated by spontaneous parametric down-conversion and mapped to phonons via a Raman transition, stored for a variable delay, and released on demand. The second-order correlation of the memory output is g^(2)(0) = 0.65±0.07, demonstrating a preservation of nonclassical photon statistics throughout storage and retrieval. The memory is low noise, high speed and broadly tunable; it therefore promises to be a versatile light-matter interface for local quantum processing applications.
Toshiba TDF-500 High Resolution Viewing And Analysis System
NASA Astrophysics Data System (ADS)
Roberts, Barry; Kakegawa, M.; Nishikawa, M.; Oikawa, D.
1988-06-01
A high resolution, operator interactive, medical viewing and analysis system has been developed by Toshiba and Bio-Imaging Research. This system provides many advanced features including high resolution displays, a very large image memory and advanced image processing capability. In particular, the system provides CRT frame buffers capable of update in one frame period, an array processor capable of image processing at operator interactive speeds, and a memory system capable of updating multiple frame buffers at frame rates whilst supporting multiple array processors. The display system provides 1024 x 1536 display resolution at 40Hz frame and 80Hz field rates. In particular, the ability to provide whole or partial update of the screen at the scanning rate is a key feature. This allows multiple viewports or windows in the display buffer with both fixed and cine capability. To support image processing features such as windowing, pan, zoom, minification, filtering, ROI analysis, multiplanar and 3D reconstruction, a high performance CPU is integrated into the system. This CPU is an array processor capable of up to 400 million instructions per second. To support the instantaneous high memory bandwidth requirements of the multiple viewers and array processors, an ultra fast memory system is used. This memory system has a bandwidth capability of 400 MB/sec and a total capacity of 256 MB. This bandwidth is more than adequate to support several high resolution CRTs and also the fast processing unit. This fully integrated approach allows effective real time image processing. The integrated design of the viewing system, memory system and array processor is key to the imaging system. This paper describes the architecture of the imaging system.
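A rough check of the display-refresh traffic against the 400 MB/sec memory system quoted above. Bytes per pixel is an assumption made here (2 bytes for a grayscale medical image), not stated in the abstract.

```python
# Rough display-refresh bandwidth arithmetic; bytes_per_pixel is an assumption.
pixels = 1024 * 1536
frames_per_s = 40
bytes_per_pixel = 2                              # assumption
per_display_mb_s = pixels * frames_per_s * bytes_per_pixel / 1e6
print(round(per_display_mb_s, 1))                # ~125.8 MB/s per display
# Several such displays plus the array processor's traffic motivate a memory
# system designed around a 400 MB/s aggregate budget.
```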
Hardware architecture design of a fast global motion estimation method
NASA Astrophysics Data System (ADS)
Liang, Chaobing; Sang, Hongshi; Shen, Xubang
2015-12-01
VLSI implementation of gradient-based global motion estimation (GME) faces two main challenges: irregular data access and a high off-chip memory bandwidth requirement. We previously proposed a fast GME method that reduces computational complexity by choosing a certain number of small patches containing corners and using them in a gradient-based framework. A hardware architecture is designed to implement this method and further reduce the off-chip memory bandwidth requirement. On-chip memories are used to store coordinates of the corners and template patches, while the Gaussian pyramids of both the template and reference frame are stored in off-chip SDRAMs. By performing the geometric transform only on the coordinates of the center pixel of a 3-by-3 patch in the template image, a 5-by-5 area containing the warped 3-by-3 patch in the reference image is extracted from the SDRAMs by burst read. Patch-based and burst-mode data access helps to keep the off-chip memory bandwidth requirement at the minimum. Although patch size varies at different pyramid levels, all patches are processed in terms of 3x3 patches, so the utilization of the patch-processing circuit reaches 100%. FPGA implementation results show that the design utilizes 24,080 bits of on-chip memory and, for a sequence with resolution of 352x288 and frequency of 60Hz, the off-chip bandwidth requirement is only 3.96 Mbyte/s, compared with 243.84 Mbyte/s for the original gradient-based GME method. This design can be used in applications like video codecs, video stabilization, and super-resolution, where real-time GME is a necessity and minimum memory bandwidth requirement is appreciated.
High-speed quantum networking by ship
NASA Astrophysics Data System (ADS)
Devitt, Simon J.; Greentree, Andrew D.; Stephens, Ashley M.; van Meter, Rodney
2016-11-01
Networked entanglement is an essential component for a plethora of quantum computation and communication protocols. Direct transmission of quantum signals over long distances is prevented by fibre attenuation and the no-cloning theorem, motivating the development of quantum repeaters, designed to purify entanglement, extending its range. Quantum repeaters have been demonstrated over short distances, but error-corrected, global repeater networks with high bandwidth require new technology. Here we show that error corrected quantum memories installed in cargo containers and carried by ship can provide a flexible connection between local networks, enabling low-latency, high-fidelity quantum communication across global distances at higher bandwidths than previously proposed. With demonstrations of technology with sufficient fidelity to enable topological error-correction, implementation of the quantum memories is within reach, and bandwidth increases with improvements in fabrication. Our approach to quantum networking avoids technological restrictions of repeater deployment, providing an alternate path to a worldwide Quantum Internet.
High-speed quantum networking by ship
Devitt, Simon J.; Greentree, Andrew D.; Stephens, Ashley M.; Van Meter, Rodney
2016-01-01
Networked entanglement is an essential component for a plethora of quantum computation and communication protocols. Direct transmission of quantum signals over long distances is prevented by fibre attenuation and the no-cloning theorem, motivating the development of quantum repeaters, designed to purify entanglement, extending its range. Quantum repeaters have been demonstrated over short distances, but error-corrected, global repeater networks with high bandwidth require new technology. Here we show that error corrected quantum memories installed in cargo containers and carried by ship can provide a flexible connection between local networks, enabling low-latency, high-fidelity quantum communication across global distances at higher bandwidths than previously proposed. With demonstrations of technology with sufficient fidelity to enable topological error-correction, implementation of the quantum memories is within reach, and bandwidth increases with improvements in fabrication. Our approach to quantum networking avoids technological restrictions of repeater deployment, providing an alternate path to a worldwide Quantum Internet. PMID:27805001
High-speed quantum networking by ship.
Devitt, Simon J; Greentree, Andrew D; Stephens, Ashley M; Van Meter, Rodney
2016-11-02
Networked entanglement is an essential component for a plethora of quantum computation and communication protocols. Direct transmission of quantum signals over long distances is prevented by fibre attenuation and the no-cloning theorem, motivating the development of quantum repeaters, designed to purify entanglement, extending its range. Quantum repeaters have been demonstrated over short distances, but error-corrected, global repeater networks with high bandwidth require new technology. Here we show that error corrected quantum memories installed in cargo containers and carried by ship can provide a flexible connection between local networks, enabling low-latency, high-fidelity quantum communication across global distances at higher bandwidths than previously proposed. With demonstrations of technology with sufficient fidelity to enable topological error-correction, implementation of the quantum memories is within reach, and bandwidth increases with improvements in fabrication. Our approach to quantum networking avoids technological restrictions of repeater deployment, providing an alternate path to a worldwide Quantum Internet.
Collective input/output under memory constraints
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Yin; Chen, Yong; Zhuang, Yu
2014-12-18
Compared with current high-performance computing (HPC) systems, exascale systems are expected to have much less memory per node, which can significantly limit collective input/output (I/O) performance. In this study, we introduce a memory-conscious collective I/O strategy that takes into account memory capacity and bandwidth constraints. The new strategy restricts aggregation data traffic within disjoint subgroups, coordinates I/O accesses in intranode and internode layers, and determines I/O aggregators at run time considering memory consumption among processes. We have prototyped the design and evaluated it with commonly used benchmarks to verify its potential. The evaluation results demonstrate that this strategy holds promise in mitigating the memory pressure, alleviating the contention for memory bandwidth, and improving the I/O performance for projected extreme-scale systems. Given the importance of supporting increasingly data-intensive workloads and projected memory constraints on increasingly larger scale HPC systems, this new memory-conscious collective I/O can have a significant positive impact on scientific discovery productivity.
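An illustrative sketch, not the paper's algorithm, of the general idea of memory-conscious aggregator selection: each disjoint subgroup elects an aggregator with the most free memory, and its aggregation buffer is capped by a memory budget.

```python
# Illustrative sketch of memory-aware aggregator selection (not the paper's method).

def pick_aggregators(free_mem_mb, group_size, buffer_cap_mb):
    """free_mem_mb: per-process free memory. Returns (aggregator_rank, buffer_mb)
    for each disjoint subgroup of group_size consecutive ranks."""
    choices = []
    for start in range(0, len(free_mem_mb), group_size):
        group = list(range(start, min(start + group_size, len(free_mem_mb))))
        agg = max(group, key=lambda r: free_mem_mb[r])        # most free memory
        choices.append((agg, min(buffer_cap_mb, free_mem_mb[agg])))
    return choices

free = [512, 128, 896, 256, 300, 700, 64, 950]
print(pick_aggregators(free, group_size=4, buffer_cap_mb=256))
# -> [(2, 256), (7, 256)]: aggregation traffic stays within each subgroup and
#    no aggregator's buffer exceeds its memory budget.
```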
Frequency and bandwidth conversion of single photons in a room-temperature diamond quantum memory
Fisher, Kent A. G.; England, Duncan G.; MacLean, Jean-Philippe W.; Bustard, Philip J.; Resch, Kevin J.; Sussman, Benjamin J.
2016-01-01
The spectral manipulation of photons is essential for linking components in a quantum network. Large frequency shifts are needed for conversion between optical and telecommunication frequencies, while smaller shifts are useful for frequency-multiplexing quantum systems, in the same way that wavelength division multiplexing is used in classical communications. Here we demonstrate frequency and bandwidth conversion of single photons in a room-temperature diamond quantum memory. Heralded 723.5 nm photons, with 4.1 nm bandwidth, are stored as optical phonons in the diamond via a Raman transition. Upon retrieval from the diamond memory, the spectral shape of the photons is determined by a tunable read pulse through the reverse Raman transition. We report central frequency tunability over 4.2 times the input bandwidth, and bandwidth modulation between 0.5 and 1.9 times the input bandwidth. Our results demonstrate the potential for diamond, and Raman memories in general, as an integrated platform for photon storage and spectral conversion. PMID:27045988
Frequency and bandwidth conversion of single photons in a room-temperature diamond quantum memory.
Fisher, Kent A G; England, Duncan G; MacLean, Jean-Philippe W; Bustard, Philip J; Resch, Kevin J; Sussman, Benjamin J
2016-04-05
The spectral manipulation of photons is essential for linking components in a quantum network. Large frequency shifts are needed for conversion between optical and telecommunication frequencies, while smaller shifts are useful for frequency-multiplexing quantum systems, in the same way that wavelength division multiplexing is used in classical communications. Here we demonstrate frequency and bandwidth conversion of single photons in a room-temperature diamond quantum memory. Heralded 723.5 nm photons, with 4.1 nm bandwidth, are stored as optical phonons in the diamond via a Raman transition. Upon retrieval from the diamond memory, the spectral shape of the photons is determined by a tunable read pulse through the reverse Raman transition. We report central frequency tunability over 4.2 times the input bandwidth, and bandwidth modulation between 0.5 and 1.9 times the input bandwidth. Our results demonstrate the potential for diamond, and Raman memories in general, as an integrated platform for photon storage and spectral conversion.
Wang, Kang; Gu, Huaxi; Yang, Yintang; Wang, Kun
2015-08-10
With the number of cores increasing, there is an emerging need for a high-bandwidth low-latency interconnection network, serving core-to-memory communication. In this paper, aiming at the goal of simultaneous access to multi-rank memory, we propose an optical interconnection network for core-to-memory communication. In the proposed network, the wavelength usage is delicately arranged so that cores can communicate with different ranks at the same time and broadcast for flow control can be achieved. A distributed memory controller architecture that works in a pipeline mode is also designed for efficient optical communication and transaction address processes. The scaling method and wavelength assignment for the proposed network are investigated. Compared with traditional electronic bus-based core-to-memory communication, the simulation results based on the PARSEC benchmark show that the bandwidth enhancement and latency reduction are apparent.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, and social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids, which generate unpredictable memory accesses. These data structures usually are large, but difficult to partition. These applications are mostly memory bandwidth bound and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the elements they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processors with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom hand-tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flows, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, graph Breadth First Search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.
NASA Astrophysics Data System (ADS)
Yang, Chen; Liu, LeiBo; Yin, ShouYi; Wei, ShaoJun
2014-12-01
The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained due to data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve this problem. One method loads the context into the CGRA at run time. This method occupies very small on-chip memory but induces very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure. This method loads the context into the on-chip context memory at the boot phase. Broadcasting the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle basis. The size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application complexity. This paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside a CGRA. In this architecture, context is dynamically transferred into the CGRA. Utilizing a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by the size of the on-chip context memory. For the data bandwidth issue, data preloading is the most frequently used approach to hide input data latency and speed up data transmission. Rather than fundamentally reducing the amount of input data, it processes data transfer and computation in parallel. However, the data preloading method cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scale increases. This paper also presents a Hierarchical Data Memory (HDM) architecture as a solution to the efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory from the data transfer burden so that the performance is significantly improved. As a result of using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%-19.48% with a reasonable memory size. Therefore, 1080p@35.7fps for H.264 high profile video decoding can be achieved on the PCC and HDM architecture when utilizing a 200 MHz working frequency. Further, the size of the on-chip context memory no longer restricts complex applications, which are efficiently executed on the PCC and HDM architecture.
A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
This paper presents an architecture template for next-generation high performance computing systems specifically targeted to irregular applications. We start our work by considering that future generation interconnection and memory bandwidth full-system numbers are expected to grow by a factor of 10. In order to keep up with such a communication capacity, while still resorting to fine-grained multithreading as the main way to tolerate unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. At the same time, we also show how such an architecture template must be coupled with specific techniques in order to optimize bandwidth utilization and achieve the maximum scalability. We propose a technique based on memory reference aggregation, together with the related hardware implementation, as one such optimization technique. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results prove the benefits provided by both the multi-core approach and the bandwidth-optimizing reference aggregation technique.
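An illustrative sketch, not the paper's hardware design, of what memory reference aggregation buys: outstanding fine-grained requests that fall in the same memory block are coalesced into a single network transaction, so fewer transactions compete for the fixed interconnect bandwidth.

```python
# Illustrative sketch of reference aggregation (not the paper's implementation).
from collections import defaultdict

def aggregate(addresses, block_bytes=64):
    """Group word addresses by memory block; one request is issued per block."""
    blocks = defaultdict(list)
    for a in addresses:
        blocks[a // block_bytes].append(a)
    return blocks

refs = [0x1000, 0x1008, 0x1038, 0x2000, 0x2010, 0x9F00]
grouped = aggregate(refs)
print(len(refs), "references ->", len(grouped), "block requests")
# 6 references -> 3 requests: the saved transactions translate directly into
# better utilization of the available memory and network bandwidth.
```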
Building a Terabyte Memory Bandwidth Compute Node with Four Consumer Electronics GPUs
NASA Astrophysics Data System (ADS)
Omlin, Samuel; Räss, Ludovic; Podladchikov, Yuri
2014-05-01
GPUs released for consumer electronics are generally built with the same chip architectures as the GPUs released for professional usage. With regard to scientific computing, there are no obvious important differences in functionality or performance between the two types of releases, yet the price can differ by up to one order of magnitude. For example, the consumer electronics release of the most recent NVIDIA Kepler architecture (GK110), named GeForce GTX TITAN, performed equally well in the memory bandwidth tests conducted as the professional release, named Tesla K20; the consumer electronics release costs about one third of the professional release. We explain how to design and assemble a well-adjusted computer with four high-end consumer electronics GPUs (GeForce GTX TITAN) combining more than 1 terabyte/s of memory bandwidth. We compare the system's performance and precision with that of hardware released for professional usage. The system can be used as a powerful workstation for scientific computing or as a compute node in a home-built GPU cluster.
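The aggregate-bandwidth arithmetic behind the "terabyte per second" figure. The 288 GB/s per-GPU value is the published memory bandwidth of a GeForce GTX TITAN (384-bit GDDR5 at 6 Gbps effective); it is treated here as a nominal figure.

```python
# Aggregate memory bandwidth of four GTX TITAN cards (nominal spec value).
per_gpu_gb_s = 288.0
num_gpus = 4
print(per_gpu_gb_s * num_gpus / 1000, "TB/s")   # ~1.15 TB/s aggregate
# Note: this is the sum of per-GPU bandwidths; each GPU streams only its own
# memory at 288 GB/s, so the figure applies to work split across the devices.
```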
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clark, M. A.; Strelchenko, Alexei; Vaquero, Alejandro
Lattice quantum chromodynamics simulations in nuclear physics have benefited from a tremendous number of algorithmic advances such as multigrid and eigenvector deflation. These improve the time to solution but do not alleviate the intrinsic memory-bandwidth constraints of the matrix-vector operation dominating iterative solvers. Batching this operation for multiple vectors and exploiting cache and register blocking can yield a super-linear speed up. Block-Krylov solvers can naturally take advantage of such batched matrix-vector operations, further reducing the iterations to solution by sharing the Krylov space between solves. However, practical implementations typically suffer from the quadratic scaling in the number of vector-vector operations. Using the QUDA library, we present an implementation of a block-CG solver on NVIDIA GPUs which reduces the memory-bandwidth complexity of vector-vector operations from quadratic to linear. We present results for the HISQ discretization, showing a 5x speedup compared to highly-optimized independent Krylov solves on NVIDIA's SaturnV cluster.
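An illustrative NumPy sketch (not QUDA code) of why fusing the block-Krylov vector-vector products helps: the naive pairwise loop streams each vector from memory roughly once per pairing, quadratic traffic in the block size, while a single fused pass over the block reads every vector once.

```python
# Illustrative sketch of fused vs. pairwise inner products (not QUDA code).
import numpy as np

def gram_naive(V):                      # V: (k, n) block of k vectors
    k = V.shape[0]
    return np.array([[np.dot(V[i], V[j]) for j in range(k)] for i in range(k)])

def gram_fused(V):
    return V @ V.T                      # one blocked pass over V

V = np.random.rand(8, 1 << 16)
assert np.allclose(gram_naive(V), gram_fused(V))
# naive: each vector is streamed ~k times; fused: each vector is read once,
# so the memory-traffic cost of the vector-vector work drops from O(k^2) to O(k).
```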
ASIC-based architecture for the real-time computation of 2D convolution with large kernel size
NASA Astrophysics Data System (ADS)
Shao, Rui; Zhong, Sheng; Yan, Luxin
2015-12-01
Bidimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the ASIC-based implementation of 2-D convolution with medium-large kernels. Aiming to improve the efficiency of on-chip storage resources and to reduce the off-chip bandwidth, a data-reuse cache structure is proposed. Multi-block SPRAM is used to cache image blocks, an on-chip ping-pong operation takes full advantage of data reuse in the convolution calculation, and a new ASIC data scheduling scheme and overall architecture are designed around it. Experimental results show that the structure can perform real-time convolution with kernel sizes up to 40×32, improves the utilization of on-chip memory bandwidth and on-chip memory resources, satisfies the conditions to maximize data throughput, and reduces the need for off-chip memory bandwidth.
2004-07-01
steadily for the past fifteen years, while memory latency and bandwidth have improved much more slowly. For example, Intel processor clock rates have...processor and memory performance) all greatly restrict the ability to achieve high levels of performance for science, engineering, and national...sub-nuclear distances. Guide experiments to identify the transition from quantum chromodynamics to quark-gluon plasma. Accelerator Physics Accurate
A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vineyard, Craig Michael; Verzi, Stephen Joseph
As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired resource allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.
Highly Efficient Coherent Optical Memory Based on Electromagnetically Induced Transparency
NASA Astrophysics Data System (ADS)
Hsiao, Ya-Fen; Tsai, Pin-Ju; Chen, Hung-Shiue; Lin, Sheng-Xiang; Hung, Chih-Chiao; Lee, Chih-Hsi; Chen, Yi-Hsin; Chen, Yong-Fan; Yu, Ite A.; Chen, Ying-Cheng
2018-05-01
Quantum memory is an important component in long-distance quantum communication based on the quantum repeater protocol. To outperform the direct transmission of photons with quantum repeaters, it is crucial to develop quantum memories with high fidelity, high efficiency and a long storage time. Here, we achieve a storage efficiency of 92.0 (1.5)% for a coherent optical memory based on the electromagnetically induced transparency scheme in optically dense cold atomic media. We also obtain a useful time-bandwidth product of 1200, considering only storage where the retrieval efficiency remains above 50%. Both are the best records to date among all kinds of schemes for the realization of optical memory. Our work significantly advances the pursuit of a high-performance optical memory and should have important applications in quantum information science.
Highly Efficient Coherent Optical Memory Based on Electromagnetically Induced Transparency.
Hsiao, Ya-Fen; Tsai, Pin-Ju; Chen, Hung-Shiue; Lin, Sheng-Xiang; Hung, Chih-Chiao; Lee, Chih-Hsi; Chen, Yi-Hsin; Chen, Yong-Fan; Yu, Ite A; Chen, Ying-Cheng
2018-05-04
Quantum memory is an important component in long-distance quantum communication based on the quantum repeater protocol. To outperform the direct transmission of photons with quantum repeaters, it is crucial to develop quantum memories with high fidelity, high efficiency and a long storage time. Here, we achieve a storage efficiency of 92.0 (1.5)% for a coherent optical memory based on the electromagnetically induced transparency scheme in optically dense cold atomic media. We also obtain a useful time-bandwidth product of 1200, considering only storage where the retrieval efficiency remains above 50%. Both are the best records to date among all kinds of schemes for the realization of optical memory. Our work significantly advances the pursuit of a high-performance optical memory and should have important applications in quantum information science.
Extending the BEAGLE library to a multi-FPGA platform.
Jin, Zheming; Bakos, Jason D
2013-01-19
Maximum Likelihood (ML)-based phylogenetic inference using Felsenstein's pruning algorithm is a standard method for estimating the evolutionary relationships amongst a set of species based on DNA sequence data, and is used in popular applications such as RAxML, PHYLIP, GARLI, BEAST, and MrBayes. The Phylogenetic Likelihood Function (PLF) and its associated scaling and normalization steps comprise the computational kernel for these tools. These computations are data intensive but contain fine grain parallelism that can be exploited by coprocessor architectures such as FPGAs and GPUs. A general purpose API called BEAGLE has recently been developed that includes optimized implementations of Felsenstein's pruning algorithm for various data parallel architectures. In this paper, we extend the BEAGLE API to a multiple Field Programmable Gate Array (FPGA)-based platform called the Convey HC-1. The core calculation of our implementation, which includes both the phylogenetic likelihood function (PLF) and the tree likelihood calculation, has an arithmetic intensity of 130 floating-point operations per 64 bytes of I/O, or 2.03 ops/byte. Its performance can thus be calculated as a function of the host platform's peak memory bandwidth and the implementation's memory efficiency, as 2.03 × peak bandwidth × memory efficiency. Our FPGA-based platform has a peak bandwidth of 76.8 GB/s and our implementation achieves a memory efficiency of approximately 50%, which gives an average throughput of 78 Gflops. This represents a ~40X speedup when compared with BEAGLE's CPU implementation on a dual Xeon 5520 and 3X speedup versus BEAGLE's GPU implementation on a Tesla T10 GPU for very large data sizes. The power consumption is 92 W, yielding a power efficiency of 1.7 Gflops per Watt. The use of data parallel architectures to achieve high performance for likelihood-based phylogenetic inference requires high memory bandwidth and a design methodology that emphasizes high memory efficiency. To achieve this objective, we integrated 32 pipelined processing elements (PEs) across four FPGAs. For the design of each PE, we developed a specialized synthesis tool to generate a floating-point pipeline with resource and throughput constraints to match the target platform. We have found that using low-latency floating-point operators can significantly reduce FPGA area and still meet timing requirement on the target platform. We found that this design methodology can achieve performance that exceeds that of a GPU-based coprocessor.
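The quoted throughput follows directly from the stated performance model, throughput = arithmetic intensity × peak bandwidth × memory efficiency, using only numbers given in the abstract.

```python
# Worked check of the throughput model, using figures quoted in the abstract.
ops_per_byte = 130 / 64          # 2.03 flops per byte of I/O
peak_bw_gb_s = 76.8              # Convey HC-1 peak memory bandwidth
mem_efficiency = 0.5             # achieved memory efficiency
print(round(ops_per_byte * peak_bw_gb_s * mem_efficiency, 1), "Gflops")  # ~78.0
```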
High-performance Raman memory with spatio-temporal reversal
NASA Astrophysics Data System (ADS)
Vernaz-Gris, Pierre; Tranter, Aaron D.; Everett, Jesse L.; Leung, Anthony C.; Paul, Karun V.; Campbell, Geoff T.; Lam, Ping Koy; Buchler, Ben C.
2018-05-01
A number of techniques exist to use an ensemble of atoms as a quantum memory for light. Many of these propose to use backward retrieval as a way to improve the storage and recall efficiency. We report on a demonstration of an off-resonant Raman memory that uses backward retrieval to achieve an efficiency of 65±6% at a storage time of one pulse duration. The memory has a characteristic decay time of 60 μs, corresponding to a delay-bandwidth product of 160.
Enabling Secure High-Performance Wireless Ad Hoc Networking
2003-05-29
destinations, consuming energy and available bandwidth. An attacker may similarly create a routing black hole, in which all packets are dropped: by sending...of the vertex cut, for example by forwarding only routing packets and not data packets, such that the nodes waste energy forwarding packets to the...with limited resources, including network bandwidth and the CPU processing capacity, memory, and battery power (energy) of each individual node in the
Detecting Gravitational Wave Memory without Parent Signals
NASA Astrophysics Data System (ADS)
McNeill, Lucy O.; Thrane, Eric; Lasky, Paul D.
2017-05-01
Gravitational-wave memory manifests as a permanent distortion of an idealized gravitational-wave detector and arises generically from energetic astrophysical events. For example, binary black hole mergers are expected to emit memory bursts a little more than an order of magnitude smaller in strain than the oscillatory parent waves. We introduce the concept of "orphan memory": gravitational-wave memory for which there is no detectable parent signal. In particular, high-frequency gravitational-wave bursts (≳ kHz) produce orphan memory in the LIGO/Virgo band. We show that Advanced LIGO measurements can place stringent limits on the existence of high-frequency gravitational waves, effectively increasing the LIGO bandwidth by orders of magnitude. We investigate the prospects for and implications of future searches for orphan memory.
NASA's 3D Flight Computer for Space Applications
NASA Technical Reports Server (NTRS)
Alkalai, Leon
2000-01-01
The New Millennium Program (NMP) Integrated Product Development Team (IPDT) for Microelectronics Systems was planning to validate a newly developed 3D Flight Computer system on its first deep-space flight, DS1, launched in October 1998. This computer, developed in the 1995-97 time frame, contains many new computer technologies previously never used in deep-space systems. They include: advanced 3D packaging architecture for future low-mass and low-volume avionics systems; high-density 3D packaged chip-stacks for both volatile and non-volatile mass memory: 400 Mbytes of local DRAM memory, and 128 Mbytes of Flash memory; high-bandwidth Peripheral Component Interface (PCI) local-bus with a bridge to VME; high-bandwidth (20 Mbps) fiber-optic serial bus; and other attributes, such as standard support for Design for Testability (DFT). Even though this computer system was not completed in time for delivery to the DS1 project, it was an important development along a technology roadmap towards highly integrated and highly miniaturized avionics systems for deep-space applications. This continued technology development is now being performed by NASA's Deep Space System Development Program (also known as X2000) and within JPL's Center for Integrated Space Microsystems (CISM).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Suzuki, Kazumasa; Ishi-Hayase, Junko; Akahane, Kouichi
2013-12-04
We performed the proof-of-principle demonstration of photon-echo quantum memory using a strain-compensated InAs quantum dot ensemble in the telecommunication wavelength range. We succeeded in transfer and retrieval of the relative phase of a time-bin pulse with high fidelity. Our demonstration suggests the possibility of realizing an ultrabroadband, high-time-bandwidth-product, multi-mode quantum memory operable at telecommunication wavelengths.
Extending the BEAGLE library to a multi-FPGA platform
2013-01-01
Background Maximum Likelihood (ML)-based phylogenetic inference using Felsenstein’s pruning algorithm is a standard method for estimating the evolutionary relationships amongst a set of species based on DNA sequence data, and is used in popular applications such as RAxML, PHYLIP, GARLI, BEAST, and MrBayes. The Phylogenetic Likelihood Function (PLF) and its associated scaling and normalization steps comprise the computational kernel for these tools. These computations are data intensive but contain fine grain parallelism that can be exploited by coprocessor architectures such as FPGAs and GPUs. A general purpose API called BEAGLE has recently been developed that includes optimized implementations of Felsenstein’s pruning algorithm for various data parallel architectures. In this paper, we extend the BEAGLE API to a multiple Field Programmable Gate Array (FPGA)-based platform called the Convey HC-1. Results The core calculation of our implementation, which includes both the phylogenetic likelihood function (PLF) and the tree likelihood calculation, has an arithmetic intensity of 130 floating-point operations per 64 bytes of I/O, or 2.03 ops/byte. Its performance can thus be calculated as a function of the host platform’s peak memory bandwidth and the implementation’s memory efficiency, as 2.03 × peak bandwidth × memory efficiency. Our FPGA-based platform has a peak bandwidth of 76.8 GB/s and our implementation achieves a memory efficiency of approximately 50%, which gives an average throughput of 78 Gflops. This represents a ~40X speedup when compared with BEAGLE’s CPU implementation on a dual Xeon 5520 and 3X speedup versus BEAGLE’s GPU implementation on a Tesla T10 GPU for very large data sizes. The power consumption is 92 W, yielding a power efficiency of 1.7 Gflops per Watt. Conclusions The use of data parallel architectures to achieve high performance for likelihood-based phylogenetic inference requires high memory bandwidth and a design methodology that emphasizes high memory efficiency. To achieve this objective, we integrated 32 pipelined processing elements (PEs) across four FPGAs. For the design of each PE, we developed a specialized synthesis tool to generate a floating-point pipeline with resource and throughput constraints to match the target platform. We have found that using low-latency floating-point operators can significantly reduce FPGA area and still meet timing requirement on the target platform. We found that this design methodology can achieve performance that exceeds that of a GPU-based coprocessor. PMID:23331707
NASA Technical Reports Server (NTRS)
Schwab, Andrew J. (Inventor); Aylor, James (Inventor); Hitchcock, Charles Young (Inventor); Wulf, William A. (Inventor); McKee, Sally A. (Inventor); Moyer, Stephen A. (Inventor); Klenke, Robert (Inventor)
2000-01-01
A data processing system is disclosed which comprises a data processor and memory control device for controlling the access of information from the memory. The memory control device includes temporary storage and decision ability for determining what order to execute the memory accesses. The compiler detects the requirements of the data processor and selects the data to stream to the memory control device which determines a memory access order. The order in which to access said information is selected based on the location of information stored in the memory. The information is repeatedly accessed from memory and stored in the temporary storage until all streamed information is accessed. The information is stored until required by the data processor. The selection of the order in which to access information maximizes bandwidth and decreases the retrieval time.
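An illustrative sketch, not the patented mechanism, of how reordering streamed accesses by memory location can raise effective bandwidth: pending requests are grouped by DRAM row so that each row is opened only once, and the results are buffered until the processor consumes them in program order.

```python
# Illustrative access-ordering sketch (not the patent's actual design).

def reorder_by_row(addresses, row_bytes=2048):
    """Stable-sort pending stream accesses by row number; accesses within a
    row keep their original order, so each row is opened only once."""
    return sorted(addresses, key=lambda a: a // row_bytes)

pending = [0x0000, 0x4000, 0x0008, 0x4010, 0x0010]
print([hex(a) for a in reorder_by_row(pending)])
# -> ['0x0', '0x8', '0x10', '0x4000', '0x4010']: two row activations instead
#    of five, at the cost of holding results in temporary storage until the
#    processor requests them in its original order.
```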
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fischler, M.
1992-04-01
The issues to be addressed here are those of "balance" in machine architecture. By this, we mean how much emphasis must be placed on various aspects of the system to maximize its usefulness for physics. There are three components that contribute to the utility of a system: how the machine can be used, how big a problem can be attacked, and what the effective capabilities (power) of the hardware are like. The effective power issue is a matter of evaluating the impact of design decisions trading off architectural features such as memory bandwidth and interprocessor communication capabilities. What is studied is the effect these machine parameters have on how quickly the system can solve desired problems. There is a reasonable method for studying this: one selects a few representative algorithms and computes the impact of changing memory bandwidths, and so forth. The only room for controversy here is in the selection of representative problems. The issue of how big a problem can be attacked boils down to a balance of memory size versus power. Although this is a balance issue it is very different from the effective power situation, because no firm answer can be given at this time. The power to memory ratio is highly problem dependent, and optimizing it requires several pieces of physics input, including: how big a lattice is needed for interesting results; what sort of algorithms are best to use; and how many sweeps are needed to get valid results. We seem to be at the threshold of learning things about these issues, but for now, the memory size issue will necessarily be addressed in terms of best guesses, rules of thumb, and researchers' opinions.
Si-based optical I/O for optical memory interface
NASA Astrophysics Data System (ADS)
Ha, Kyoungho; Shin, Dongjae; Byun, Hyunil; Cho, Kwansik; Na, Kyoungwon; Ji, Hochul; Pyo, Junghyung; Hong, Seokyong; Lee, Kwanghyun; Lee, Beomseok; Shin, Yong-hwack; Kim, Junghye; Kim, Seong-gu; Joe, Insung; Suh, Sungdong; Choi, Sanghoon; Han, Sangdeok; Park, Yoondong; Choi, Hanmei; Kuh, Bongjin; Kim, Kichul; Choi, Jinwoo; Park, Sujin; Kim, Hyeunsu; Kim, Kiho; Choi, Jinyong; Lee, Hyunjoo; Yang, Sujin; Park, Sungho; Lee, Minwoo; Cho, Minchang; Kim, Saebyeol; Jeong, Taejin; Hyun, Seokhun; Cho, Cheongryong; Kim, Jeong-kyoum; Yoon, Hong-gu; Nam, Jeongsik; Kwon, Hyukjoon; Lee, Hocheol; Choi, Junghwan; Jang, Sungjin; Choi, Joosun; Chung, Chilhee
2012-01-01
Optical interconnects may provide solutions to the capacity-bandwidth trade-off of recent memory interface systems. For cost-effective optical memory interfaces, Samsung Electronics has been developing silicon photonics platforms on memory-compatible bulk-Si 300-mm wafers. A waveguide with 0.6 dB/mm propagation loss, a vertical grating coupler with 2.7 dB coupling loss, a modulator operating at 10 Gbps, and a Ge/Si photodiode with 12.5 Gbps bandwidth have been achieved on the bulk-Si platform. 2x6.4 Gbps electrical driver circuits have also been fabricated using a CMOS process.
High efficiency Raman memory by suppressing radiation trapping
NASA Astrophysics Data System (ADS)
Thomas, S. E.; Munns, J. H. D.; Kaczmarek, K. T.; Qiu, C.; Brecht, B.; Feizpour, A.; Ledingham, P. M.; Walmsley, I. A.; Nunn, J.; Saunders, D. J.
2017-06-01
Raman interactions in alkali vapours are used in applications such as atomic clocks, optical signal processing, generation of squeezed light and Raman quantum memories for temporal multiplexing. To achieve a strong interaction the alkali ensemble needs both a large optical depth and a high level of spin-polarisation. We implement a technique known as quenching using a molecular buffer gas which allows near-perfect spin-polarisation of over 99.5% in caesium vapour at high optical depths of up to ~2 × 10^5, a factor of 4 higher than can be achieved without quenching. We use this system to explore efficient light storage with high gain in a GHz bandwidth Raman memory.
A versatile design for resonant guided-wave parametric down-conversion sources for quantum repeaters
NASA Astrophysics Data System (ADS)
Brecht, Benjamin; Luo, Kai-Hong; Herrmann, Harald; Silberhorn, Christine
2016-05-01
Quantum repeaters—fundamental building blocks for long-distance quantum communication—are based on the interaction between photons and quantum memories. The photons must fulfil stringent requirements on central frequency, spectral bandwidth and purity in order for this interaction to be efficient. We present a design scheme for monolithically integrated resonant photon-pair sources based on parametric down-conversion in nonlinear waveguides, which facilitate the generation of such photons. We investigate the impact of different design parameters on the performance of our source. The generated photon spectral bandwidths can be varied between several tens of MHz up to around 1 GHz, facilitating an efficient coupling to different memories. The central frequency of the generated photons can be coarsely tuned by adjusting the pump frequency, poling period and sample temperature, and we identify stability requirements on the pump laser and sample temperature that can be readily fulfilled with off-the-shelf components. We find that our source is capable of generating high-purity photons over a wide range of photon bandwidths. Finally, the PDC emission can be frequency fine-tuned over several GHz by simultaneously adjusting the sample temperature and pump frequency. We conclude our study with demonstrating the adaptability of our source to different quantum memories.
Multimodal properties and dynamics of gradient echo quantum memory.
Hétet, G; Longdell, J J; Sellars, M J; Lam, P K; Buchler, B C
2008-11-14
We investigate the properties of a recently proposed gradient echo memory (GEM) scheme for information mapping between optical and atomic systems. We show that GEM can be described by the dynamic formation of polaritons in k space. This picture highlights the flexibility and robustness with regards to the external control of the storage process. Our results also show that, as GEM is a frequency-encoding memory, it can accurately preserve the shape of signals that have large time-bandwidth products, even at moderate optical depths. At higher optical depths, we show that GEM is a high fidelity multimode quantum memory.
Processing-in-Memory Enabled Graphics Processors for 3D Rendering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Chenhao; Song, Shuaiwen; Wang, Jing
2017-02-06
The performance of 3D rendering on a Graphics Processing Unit, which converts a 3D vector stream into a 2D frame with 3D image effects, significantly impacts users’ gaming experience on modern computer systems. Due to the high texture throughput in 3D rendering, main memory bandwidth becomes a critical obstacle for improving the overall rendering performance. 3D stacked memory systems such as the Hybrid Memory Cube (HMC) provide opportunities to significantly overcome the memory wall by directly connecting logic controllers to DRAM dies. Based on the observation that texel fetches significantly impact off-chip memory traffic, we propose two architectural designs to enable Processing-In-Memory based GPUs for efficient 3D rendering.
Reducing noise in a Raman quantum memory.
Bustard, Philip J; England, Duncan G; Heshami, Khabat; Kupchak, Connor; Sussman, Benjamin J
2016-11-01
Optical quantum memories are an important component of future optical and hybrid quantum technologies. Raman schemes are strong candidates for use with ultrashort optical pulses due to their broad bandwidth; however, the elimination of deleterious four-wave mixing noise from Raman memories is critical for practical applications. Here, we demonstrate a quantum memory using the rotational states of hydrogen molecules at room temperature. Polarization selection rules prohibit four-wave mixing, allowing the storage and retrieval of attenuated coherent states with a mean photon number of 0.9 and a pulse duration of 175 fs. The 1/e memory lifetime is 85.5 ps, demonstrating a time-bandwidth product of ≈480 in a memory that is well suited for use with broadband heralded down-conversion and fiber-based photon sources.
Precision spectral manipulation of optical pulses using a coherent photon echo memory.
Buchler, B C; Hosseini, M; Hétet, G; Sparkes, B M; Lam, P K
2010-04-01
Photon echo schemes are excellent candidates for high efficiency coherent optical memory. They are capable of high-bandwidth multipulse storage, pulse resequencing and have been shown theoretically to be compatible with quantum information applications. One particular photon echo scheme is the gradient echo memory (GEM). In this system, an atomic frequency gradient is induced in the direction of light propagation leading to a Fourier decomposition of the optical spectrum along the length of the storage medium. This Fourier encoding allows precision spectral manipulation of the stored light. In this Letter, we show frequency shifting, spectral compression, spectral splitting, and fine dispersion control of optical pulses using GEM.
Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.
2002-01-01
As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to ≈100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications
NASA Astrophysics Data System (ADS)
Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei
2007-04-01
In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. By using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core for multimedia system-on-chip applications.
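The processing elements in such an architecture evaluate block-matching cost functions; the sketch below shows a plain full-search sum-of-absolute-differences (SAD) motion search of the kind a PE array would accelerate. The block size, search range, and test images are illustrative assumptions, not parameters from the paper.

```python
# Minimal full-search block-matching sketch: the kind of sum-of-absolute-
# differences (SAD) computation a motion-estimation PE evaluates. Block size
# and search range are illustrative assumptions.
import numpy as np

def best_motion_vector(cur, ref, bx, by, block=16, rng=8):
    """Exhaustively search a (2*rng+1)^2 window for the lowest-SAD match."""
    target = cur[by:by+block, bx:bx+block].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y+block, x:x+block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if sad < best_sad:
                best_sad, best = sad, (dx, dy)
    return best, best_sad

rng_img = np.random.default_rng(0)
ref = rng_img.integers(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))   # current frame = shifted reference
print(best_motion_vector(cur, ref, bx=24, by=24))  # recovers (-3, -2) with SAD 0
```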
Sensor Agent Processing Software (SAPS)
2004-05-01
The report discusses tactical sensing time scales in environments such as buildings, sewers, and tunnels, and includes a figure (Figure 9-2, "BAE Systems Sitex00 High Bandwidth") showing a processing chain in which a preprocessor feeds 256-sample data buffers through a switch to a high-pass IIR filter and its subscribers.
Opportunities for nonvolatile memory systems in extreme-scale high-performance computing
Vetter, Jeffrey S.; Mittal, Sparsh
2015-01-12
For extreme-scale high-performance computing systems, system-wide power consumption has been identified as one of the key constraints moving forward, where DRAM main memory systems account for about 30 to 50 percent of a node's overall power consumption. As the benefits of device scaling for DRAM memory slow, it will become increasingly difficult to keep memory capacities balanced with increasing computational rates offered by next-generation processors. However, several emerging memory technologies related to nonvolatile memory (NVM) devices are being investigated as an alternative for DRAM. Moving forward, NVM devices could offer solutions for HPC architectures. Researchers are investigating how to integrate these emerging technologies into future extreme-scale HPC systems and how to expose these capabilities in the software stack and applications. In addition, current results show several of these strategies could offer high-bandwidth I/O, larger main memory capacities, persistent data structures, and new approaches for application resilience and output postprocessing, such as transaction-based incremental checkpointing and in situ visualization, respectively.
Designing a VMEbus FDDI adapter card
NASA Astrophysics Data System (ADS)
Venkataraman, Raman
1992-03-01
This paper presents a system architecture for a VMEbus FDDI adapter card containing a node core, FDDI block, frame buffer memory and system interface unit. Most of the functions of the PHY and MAC layers of FDDI are implemented with National's FDDI chip set and the SMT implementation is simplified with a low cost microcontroller. The factors that influence the system bus bandwidth utilization and FDDI bandwidth utilization are the data path and frame buffer memory architecture. The VRAM based frame buffer memory has two sections: LLC frame memory and SMT frame memory. Each section with an independent serial access memory (SAM) port provides an independent access after the initial data transfer cycle on the main port and hence the throughput is maximized on each port of the memory. The SAM port simplifies the system bus master DMA design and the VMEbus interface can be designed with low-cost off-the-shelf interface chips.
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2010-11-02
Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using a memory FIFO operation; determining, by the origin DMA engine, whether an acknowledgement of the RTS message has been received from the target DMA engine; if an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
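The claimed control flow amounts to: send the RTS, eagerly push fixed-size portions through memory-FIFO operations until the RTS is acknowledged, then deliver the remainder with a single direct put. The sketch below models that flow; the chunk size, callback names, and the toy acknowledgement schedule are illustrative assumptions, not taken from the patent.

```python
# Schematic of the claimed control flow (names and chunk size are
# illustrative): eagerly send fixed-size portions via memory-FIFO operations
# until the RTS is acknowledged, then send the remainder with a direct put.

CHUNK = 64 * 1024  # bytes per eagerly-sent portion (assumption)

def origin_dma_send(data, send_rts, fifo_send, direct_put, rts_acked):
    """Drive one transfer from the origin DMA engine's point of view."""
    send_rts()                          # request-to-send to the target DMA engine
    offset = 0
    while not rts_acked() and offset < len(data):
        fifo_send(data[offset:offset + CHUNK])   # eager memory-FIFO portion
        offset += CHUNK
    if offset < len(data):
        direct_put(data[offset:])       # ack received: finish with a direct put

# Toy harness: the "target" acknowledges after two FIFO portions arrive.
received = []
acks = iter([False, False, True, True])
origin_dma_send(
    data=bytes(300 * 1024),
    send_rts=lambda: received.append("RTS"),
    fifo_send=lambda chunk: received.append(("FIFO", len(chunk))),
    direct_put=lambda rest: received.append(("PUT", len(rest))),
    rts_acked=lambda: next(acks),
)
print(received)  # ['RTS', ('FIFO', 65536), ('FIFO', 65536), ('PUT', 176128)]
```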
High-speed noise-free optical quantum memory
NASA Astrophysics Data System (ADS)
Kaczmarek, K. T.; Ledingham, P. M.; Brecht, B.; Thomas, S. E.; Thekkadath, G. S.; Lazo-Arjona, O.; Munns, J. H. D.; Poem, E.; Feizpour, A.; Saunders, D. J.; Nunn, J.; Walmsley, I. A.
2018-04-01
Optical quantum memories are devices that store and recall quantum light and are vital to the realization of future photonic quantum networks. To date, much effort has been put into improving storage times and efficiencies of such devices to enable long-distance communications. However, less attention has been devoted to building quantum memories which add zero noise to the output. Even small additional noise can render the memory classical by destroying the fragile quantum signatures of the stored light. Therefore, noise performance is a critical parameter for all quantum memories. Here we introduce an intrinsically noise-free quantum memory protocol based on two-photon off-resonant cascaded absorption (ORCA). We demonstrate successful storage of GHz-bandwidth heralded single photons in a warm atomic vapor with no added noise, confirmed by the unaltered photon-number statistics upon recall. Our ORCA memory meets the stringent noise requirements for quantum memories while combining high-speed and room-temperature operation with technical simplicity, and therefore is immediately applicable to low-latency quantum networks.
Using VirtualGL/TurboVNC Software on the Peregrine System
High-Performance Computing | NREL
VirtualGL/TurboVNC allows users to access and share the large-memory visualization nodes with high-end graphics processing units on the Peregrine system, and may be better than just using X11 forwarding when connecting from a remote site with low bandwidth.
JPEG XS-based frame buffer compression inside HEVC for power-aware video compression
NASA Astrophysics Data System (ADS)
Willème, Alexandre; Descampe, Antonin; Rouvroy, Gaël.; Pellegrin, Pascal; Macq, Benoit
2017-09-01
With the emergence of Ultra-High Definition video, reference frame buffers (FBs) inside HEVC-like encoders and decoders have to sustain huge bandwidth. The power consumed by these external memory accesses accounts for a significant share of the codec's total consumption. This paper describes a solution to significantly decrease the FB's bandwidth, making the HEVC encoder more suitable for use in power-aware applications. The proposed prototype consists of integrating an embedded lightweight, low-latency and visually lossless codec at the FB interface inside HEVC in order to store each reference frame as several compressed bitstreams. As opposed to previous works, our solution compresses large picture areas (ranging from a CTU to a frame stripe) independently in order to better exploit the spatial redundancy found in the reference frame. This work investigates two data reuse schemes, namely Level-C and Level-D. Our approach is made possible thanks to simplified motion estimation mechanisms further reducing the FB's bandwidth and inducing very low quality degradation. In this work, we integrated JPEG XS, the upcoming standard for lightweight low-latency video compression, inside HEVC. In practice, the proposed implementation is based on HM 16.8 and on XSM 1.1.2 (the JPEG XS Test Model). Through this paper, the architecture of our HEVC encoder with JPEG XS-based frame buffer compression is described. Then its performance is compared to the HM encoder. Compared to previous works, our prototype provides significant external memory bandwidth reduction. Depending on the reuse scheme, one can expect bandwidth and FB size reductions ranging from 50% to 83.3% without significant quality degradation.
Optoelectronic-cache memory system architecture.
Chiarulli, D M; Levitan, S P
1996-05-10
We present an investigation of the architecture of an optoelectronic cache that can integrate terabit optical memories with the electronic caches associated with high-performance uniprocessors and multiprocessors. The use of optoelectronic-cache memories enables these terabit technologies to provide transparently low-latency secondary memory with frame sizes comparable with disk pages but with latencies that approach those of electronic secondary-cache memories. This enables the implementation of terabit memories with effective access times comparable with the cycle times of current microprocessors. The cache design is based on the use of a smart-pixel array and combines parallel free-space optical input-output to-and-from optical memory with conventional electronic communication to the processor caches. This cache and the optical memory system to which it will interface provide a large random-access memory space that has a lower overall latency than that of magnetic disks and disk arrays. In addition, as a consequence of the high-bandwidth parallel input-output capabilities of optical memories, fault service times for the optoelectronic cache are substantially less than those currently achievable with any rotational media.
NASA Technical Reports Server (NTRS)
Fatoohi, Rod; Saini, Subbash; Ciotti, Robert
2006-01-01
We study the performance of inter-process communication on four high-speed multiprocessor systems using a set of communication benchmarks. The goal is to identify certain limiting factors and bottlenecks with the interconnect of these systems as well as to compare these interconnects. We measured network bandwidth using different numbers of communicating processors and communication patterns, such as point-to-point communication, collective communication, and dense communication patterns. The four platforms are: a 512-processor SGI Altix 3700 BX2 shared-memory machine with 3.2 GB/s links; a 64-processor (single-streaming) Cray X1 shared-memory machine with 32 1.6 GB/s links; a 128-processor Cray Opteron cluster using a Myrinet network; and a 1280-node Dell PowerEdge cluster with an InfiniBand network. Our results show the impact of the network bandwidth and topology on the overall performance of each interconnect.
Initial Performance Results on IBM POWER6
NASA Technical Reports Server (NTRS)
Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh
2008-01-01
The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB, double that of the POWER5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in the POWER5+. In this paper, we evaluate the performance of a dual-core POWER6 based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
Fast, noise-free memory for photon synchronization at room temperature.
Finkelstein, Ran; Poem, Eilon; Michel, Ohad; Lahad, Ohr; Firstenberg, Ofer
2018-01-01
Future quantum photonic networks require coherent optical memories for synchronizing quantum sources and gates of probabilistic nature. We demonstrate a fast ladder memory (FLAME) mapping the optical field onto the superposition between electronic orbitals of rubidium vapor. Using a ladder-level system of orbital transitions with nearly degenerate frequencies simultaneously enables high bandwidth, low noise, and long memory lifetime. We store and retrieve 1.7-ns-long pulses, containing 0.5 photons on average, and observe short-time external efficiency of 25%, memory lifetime (1/e) of 86 ns, and below 10^-4 added noise photons. Consequently, coupling this memory to a probabilistic source would enhance the on-demand photon generation probability by a factor of 12, the highest number yet reported for a noise-free, room temperature memory. This paves the way toward the controlled production of large quantum states of light from probabilistic photon sources.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation
NASA Technical Reports Server (NTRS)
Sterling, Thomas; Bergman, Larry
2000-01-01
Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by conventional semiconductor logic. Wave Division Multiplexing optical communications can approach a peak per fiber bandwidth of 1 Tbps and the new Data Vortex network topology employing this technology can connect tens of thousands of ports providing a bi-section bandwidth on the order of a Petabyte per second with latencies well below 100 nanoseconds, even under heavy loads. Processor-in-Memory (PIM) technology combines logic and memory on the same chip exposing the internal bandwidth of the memory row buffers at low latency. And holographic photorefractive storage technologies provide high-density memory with access a thousand times faster than conventional disk technologies. Together these technologies enable a new class of shared memory system architecture with a peak performance in the range of a Petaflops but size and power requirements comparable to today's largest Teraflops scale systems. To achieve high-sustained performance, HTMT combines an advanced multithreading processor architecture with a memory-driven coarse-grained latency management strategy called "percolation", yielding high efficiency while reducing much of the parallel programming burden. This paper will present the basic system architecture characteristics made possible through this series of advanced technologies and then give a detailed description of the new percolation approach to runtime latency management.
Optical actuators for fly-by-light applications
NASA Astrophysics Data System (ADS)
Chee, Sonny H. S.; Liu, Kexing; Measures, Raymond M.
1993-04-01
A review of optomechanical interfaces is presented. A detailed quantitative and qualitative analysis of the University of Toronto Institute for Aerospace Studies (UTIAS) box, optopneumatics, optical activation of a bimetal, optical activation of the shape memory effect, and optical activation of the pyroelectric effects is given. The UTIAS box is found to display a good conversion efficiency and a high bandwidth. A preliminary UTIAS box design has achieved a conversion efficiency of about 1/6 of the theoretical limit and a bandwidth of 2 Hz. In comparison to previous optomechanical interfaces, the UTIAS box has the highest pressure development to optical power ratio (at least an order of magnitude greater).
Data Movement Dominates: Advanced Memory Technology to Address the Real Exascale Power Problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bergman, Keren
Energy is the fundamental barrier to Exascale supercomputing and is dominated by the cost of moving data from one point to another, not computation. Similarly, performance is dominated by data movement, not computation. The solution to this problem requires three critical technologies: 3D integration, optical chip-to-chip communication, and a new communication model. The Sandia-led "Data Movement Dominates" project aimed to develop memory systems and new architectures based on these technologies that have the potential to lower the cost of local memory accesses by orders of magnitude and provide substantially more bandwidth. Only through these transformational advances can future systems reach the goals of Exascale computing within a manageable power budget. The Sandia-led team included co-PIs from Columbia University, Lawrence Berkeley Lab, and the University of Maryland. The Columbia effort of Data Movement Dominates focused on developing a physically accurate simulation environment and experimental verification for optically-connected memory (OCM) systems that can enable continued performance scaling through high-bandwidth capacity, energy-efficient bit-rate transparency, and time-of-flight latency. With OCM, memory device parallelism and total capacity can scale to match future high-performance computing requirements without sacrificing data-movement efficiency. When we consider systems with integrated photonics, links to memory can be seamlessly integrated with the interconnection network; in a sense, memory becomes a primary aspect of the interconnection network. At the core of the Columbia effort, toward expanding our understanding of OCM-enabled computing, we have created an integrated modeling and simulation environment that uniquely integrates the physical behavior of the optical layer. The PhoenixSim suite of design and software tools developed under this effort has enabled the co-design and performance evaluation of photonics-enabled OCM architectures on Exascale computing systems.
A Reconfigurable Real-Time Compressive-Sampling Camera for Biological Applications
Fu, Bo; Pitter, Mark C.; Russell, Noah A.
2011-01-01
Many applications in biology, such as long-term functional imaging of neural and cardiac systems, require continuous high-speed imaging. This is typically not possible, however, using commercially available systems. The frame rate and the recording time of high-speed cameras are limited by the digitization rate and the capacity of on-camera memory. Further restrictions are often imposed by the limited bandwidth of the data link to the host computer. Even if the system bandwidth is not a limiting factor, continuous high-speed acquisition results in very large volumes of data that are difficult to handle, particularly when real-time analysis is required. In response to this issue, many cameras allow a predetermined, rectangular region of interest (ROI) to be sampled; however, this approach lacks flexibility and is blind to the image region outside of the ROI. We have addressed this problem by building a camera system using a randomly-addressable CMOS sensor. The camera has a low bandwidth, but is able to capture continuous high-speed images of an arbitrarily defined ROI, using most of the available bandwidth, while simultaneously acquiring low-speed, full frame images using the remaining bandwidth. In addition, the camera is able to use the full-frame information to recalculate the positions of targets and update the high-speed ROIs without interrupting acquisition. In this way the camera is capable of imaging moving targets at high-speed while simultaneously imaging the whole frame at a lower speed. We have used this camera system to monitor the heartbeat and blood cell flow of a water flea (Daphnia) at frame rates in excess of 1500 fps. PMID:22028852
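One way to picture the bandwidth split described above is a simple budget calculation: the full-frame stream consumes a fixed share of the link, and the remainder sets the achievable ROI frame rate. All numbers in the sketch below are illustrative assumptions, not parameters of the camera in the paper.

```python
# Back-of-envelope sketch (all numbers are assumptions): split a fixed readout
# budget between a small high-speed ROI and low-speed full frames, as the
# randomly-addressable sensor allows.
link_budget_pix_per_s = 50e6              # pixels/s the data link can sustain
full_w, full_h, full_fps = 640, 480, 10   # slow full-frame stream
roi_w, roi_h = 64, 64                     # high-speed region of interest

full_frame_load = full_w * full_h * full_fps
remaining = link_budget_pix_per_s - full_frame_load
roi_fps = remaining / (roi_w * roi_h)

print(f"full-frame stream uses {full_frame_load / 1e6:.1f} Mpix/s")
print(f"ROI can run at about {roi_fps:.0f} fps")   # ~11,457 fps with these numbers
```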
Exploring the use of I/O nodes for computation in a MIMD multiprocessor
NASA Technical Reports Server (NTRS)
Kotz, David; Cai, Ting
1995-01-01
As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
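A toy illustration of why gossip-based dissemination scales logarithmically is sketched below: each alive process merges its locally known failed-process set with one random peer per cycle, and the number of cycles to global agreement grows roughly with log2 of the system size. This is a simplified simulation for intuition only, not any of the paper's three algorithms.

```python
# Toy push-pull gossip simulation (not the paper's algorithms): each alive
# process merges its failed-process set with one random peer per cycle; the
# cycle count to global agreement grows roughly with log2(N).
import random

def cycles_to_consensus(n_procs, failed, seed=1):
    rng = random.Random(seed)
    alive = [p for p in range(n_procs) if p not in failed]
    # Only one process has detected the failures initially.
    known = {p: set(failed) if p == alive[0] else set() for p in alive}
    cycles = 0
    while any(known[p] != set(failed) for p in alive):
        cycles += 1
        for p in alive:
            peer = rng.choice([q for q in alive if q != p])
            merged = known[p] | known[peer]       # exchange and merge views
            known[p] = known[peer] = merged
    return cycles

for n in (16, 64, 256, 1024):
    print(n, cycles_to_consensus(n, failed={0, 1}))
```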
Expanded interleaved solid-state memory for a wide bandwidth transient waveform recorder
NASA Technical Reports Server (NTRS)
Thomas, R. M., Jr.
1980-01-01
An interleaved, solid state expanded memory for a 100 MHz bandwidth waveform recorder is described. The memory development resulted in a significant increase in the storage capacity of a commercially available recorder. The motivation for the memory expansion of the waveform recorder, which is used to support in-flight measurement of the electromagnetic characteristics of lightning discharges, was the need for a significantly longer data window than that provided by the commercially available unit. The expanded recorder provides a data window that is 128 times longer than the commercial unit, while maintaining the same time resolution, by increasing the storage capacity from 1024 to 131 072 data samples. The expanded unit operates at sample periods as small as 10 ns. Sampling once every 10 ns, the commercial unit records for about 10 microseconds before the memory is filled, whereas the expanded unit records for about 1300 microseconds. A photo of the expanded waveform recorder is shown.
Cache write generate for parallel image processing on shared memory architectures.
Wittenbrink, C M; Somani, A K; Chen, C H
1996-01-01
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
A 64Cycles/MB, Luma-Chroma Parallelized H.264/AVC Deblocking Filter for 4K × 2K Applications
NASA Astrophysics Data System (ADS)
Shen, Weiwei; Fan, Yibo; Zeng, Xiaoyang
In this paper, a high-throughput deblocking filter is presented for the H.264/AVC standard, catering to video applications with 4K × 2K (4096 × 2304) ultra-definition resolution. In order to strengthen the parallelism without simply increasing the area, we propose a luma-chroma parallel method. Meanwhile, this work reduces the number of processing cycles, the amount of external memory traffic and the working frequency, by using triple four-stage pipeline filters and a luma-chroma interlaced sequence. Furthermore, it eliminates most unnecessary off-chip memory bandwidth with a highly reusable memory scheme, and adopts a “sliding window” buffer scheme. As a result, our design can support 4K × 2K at 30 fps applications at a working frequency of only 70.8 MHz.
Rizvi, Sanam Shahla; Chung, Tae-Sun
2010-01-01
Flash memory has become a more widespread storage medium for modern wireless devices because of its effective characteristics like non-volatility, small size, light weight, fast access speed, shock resistance, high reliability and low power consumption. Sensor nodes are highly resource constrained in terms of limited processing speed, runtime memory, persistent storage, communication bandwidth and finite energy. Therefore, for wireless sensor networks supporting sense, store, merge and send schemes, an efficient and reliable file system is highly required with consideration of sensor node constraints. In this paper, we propose a novel log structured external NAND flash memory based file system, called Proceeding to Intelligent service oriented memorY Allocation for flash based data centric Sensor devices in wireless sensor networks (PIYAS). This is the extended version of our previously proposed PIYA [1]. The main goals of the PIYAS scheme are to achieve instant mounting and reduced SRAM space by keeping the memory mapping information very small, and to provide high query response throughput by allocating memory to the sensor data according to network business rules. The scheme intelligently samples and stores the raw data and provides high in-network data availability by keeping the aggregate data for a longer period of time than any other scheme has done before. We propose effective garbage collection and wear-leveling schemes as well. The experimental results show that PIYAS is an optimized memory management scheme allowing high performance for wireless sensor networks.
Experimental high-speed network
NASA Astrophysics Data System (ADS)
McNeill, Kevin M.; Klein, William P.; Vercillo, Richard; Alsafadi, Yasser H.; Parra, Miguel V.; Dallas, William J.
1993-09-01
Many existing local area networking protocols currently applied in medical imaging were originally designed for relatively low-speed, low-volume networking. These protocols utilize small packet sizes appropriate for text based communication. Local area networks of this type typically provide raw bandwidth under 125 MHz. These older network technologies are not optimized for the low delay, high data traffic environment of a totally digital radiology department. Some current implementations use point-to-point links when greater bandwidth is required. However, the use of point-to-point communications for a total digital radiology department network presents many disadvantages. This paper describes work on an experimental multi-access local area network called XFT. The work includes the protocol specification, and the design and implementation of network interface hardware and software. The protocol specifies the Physical and Data Link layers (OSI layers 1 & 2) for a fiber-optic based token ring providing a raw bandwidth of 500 MHz. The protocol design and implementation of the XFT interface hardware includes many features to optimize image transfer and provide flexibility for additional future enhancements which include: a modular hardware design supporting easy portability to a variety of host system buses, a versatile message buffer design providing 16 MB of memory, and the capability to extend the raw bandwidth of the network to 3.0 GHz.
MPEG-1 low-cost encoder solution
NASA Astrophysics Data System (ADS)
Grueger, Klaus; Schirrmeister, Frank; Filor, Lutz; von Reventlow, Christian; Schneider, Ulrich; Mueller, Gerriet; Sefzik, Nicolai; Fiedrich, Sven
1995-02-01
A solution for real-time compression of digital YCrCb video data to an MPEG-1 video data stream has been developed. As an additional option, motion JPEG and video telephone streams (H.261) can be generated. For MPEG-1, up to two bidirectionally predicted images are supported. The required computational power for motion estimation and DCT/IDCT, the memory size and the memory bandwidth have been the main challenges. The design uses fast-page-mode memory accesses and requires only a single 80 ns EDO-DRAM with 256 x 16 organization for video encoding. This can be achieved only by using adequate access and coding strategies. The architecture consists of an input processing and filter unit, a memory interface, a motion estimation unit, a motion compensation unit, a DCT unit, a quantization control, a VLC unit and a bus interface. To share the available memory bandwidth among the processing tasks, a fixed schedule for memory accesses is applied that can be interrupted for asynchronous events. The motion estimation unit implements a highly sophisticated hierarchical search strategy based on block matching. The DCT unit uses a separated fast-DCT flowgraph realized by a switchable hardware unit for both DCT and IDCT operation. By appropriate multiplexing, only one multiplier is required for DCT, quantization, inverse quantization, and IDCT. The VLC unit generates the video stream up to the video sequence layer and is directly coupled with an intelligent bus interface. Thus, the assembly of video, audio and system data can easily be performed by the host computer. Having a relatively low complexity and only small requirements for DRAM circuits, the developed solution can be applied to low-cost encoding products for consumer electronics.
The bandwidth of consolidation into visual short-term memory (VSTM) depends on the visual feature
Miller, James R.; Becker, Mark W.; Liu, Taosheng
2014-01-01
We investigated the nature of the bandwidth limit in the consolidation of visual information into visual short-term memory. In the first two experiments, we examined whether previous results showing differential consolidation bandwidth for color and orientation resulted from methodological differences by testing the consolidation of color information with methods used in prior orientation experiments. We briefly presented two color patches with masks, either sequentially or simultaneously, followed by a location cue indicating the target. Participants identified the target color via button-press (Experiment 1) or by clicking a location on a color wheel (Experiment 2). Although these methods have previously demonstrated that two orientations are consolidated in a strictly serial fashion, here we found equivalent performance in the sequential and simultaneous conditions, suggesting that two colors can be consolidated in parallel. To investigate whether this difference resulted from different consolidation mechanisms or a common mechanism with different features consuming different amounts of bandwidth, Experiment 3 presented a color patch and an oriented grating either sequentially or simultaneously. We found a lower performance in the simultaneous than the sequential condition, with orientation showing a larger impairment than color. These results suggest that consolidation of both features share common mechanisms. However, it seems that color requires less information to be encoded than orientation. As a result two colors can be consolidated in parallel without exceeding the bandwidth limit, whereas two orientations or an orientation and a color exceed the bandwidth and appear to be consolidated serially. PMID:25317065
Multipulse addressing of a Raman quantum memory: configurable beam splitting and efficient readout.
Reim, K F; Nunn, J; Jin, X-M; Michelberger, P S; Champion, T F M; England, D G; Lee, K C; Kolthammer, W S; Langford, N K; Walmsley, I A
2012-06-29
Quantum memories are vital to the scalability of photonic quantum information processing (PQIP), since the storage of photons enables repeat-until-success strategies. On the other hand, the key element of all PQIP architectures is the beam splitter, which allows us to coherently couple optical modes. Here, we show how to combine these crucial functionalities by addressing a Raman quantum memory with multiple control pulses. The result is a coherent optical storage device with an extremely large time bandwidth product, that functions as an array of dynamically configurable beam splitters, and that can be read out with arbitrarily high efficiency. Networks of such devices would allow fully scalable PQIP, with applications in quantum computation, long distance quantum communications and quantum metrology.
Runtime support for parallelizing data mining algorithms
NASA Astrophysics Data System (ADS)
Jin, Ruoming; Agrawal, Gagan
2002-03-01
With recent technological advances, shared memory parallel machines have become more scalable, and offer large main memories and high bus bandwidths. They are emerging as good platforms for data warehousing and data mining. In this paper, we focus on shared memory parallelization of data mining algorithms. We have developed a series of techniques for parallelization of data mining algorithms, including full replication, full locking, fixed locking, optimized full locking, and cache-sensitive locking. Unlike previous work on shared memory parallelization of specific data mining algorithms, all of our techniques apply to a large number of common data mining algorithms. In addition, we propose a reduction-object based interface for specifying a data mining algorithm. We show how our runtime system can apply any of the techniques we have developed starting from a common specification of the algorithm.
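A minimal sketch of what a reduction-object style interface could look like is given below: the mining kernel only accumulates into and merges reduction objects, so the runtime is free to choose full replication (thread-private copies merged at the end), as shown, or one of the locking strategies instead. The class and function names are illustrative, not the paper's API.

```python
# Minimal sketch of a reduction-object style interface (illustrative names):
# the mining kernel only accumulates into and merges reduction objects, so the
# runtime can pick full replication or a locking strategy.
from concurrent.futures import ThreadPoolExecutor

class CountReduction:
    """Toy reduction object: per-item counts (e.g., itemset support)."""
    def __init__(self):
        self.counts = {}
    def accumulate(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
    def merge(self, other):
        for k, v in other.counts.items():
            self.counts[k] = self.counts.get(k, 0) + v

def parallel_reduce(chunks, make_obj, n_threads=4):
    """Full-replication strategy: one private object per chunk, merged at the end."""
    def work(chunk):
        local = make_obj()
        for item in chunk:
            local.accumulate(item)
        return local
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(work, chunks))
    result = make_obj()
    for p in partials:
        result.merge(p)
    return result

data = ["a", "b", "a", "c", "b", "a"] * 1000
chunks = [data[i::4] for i in range(4)]
print(parallel_reduce(chunks, CountReduction).counts)  # {'a': 3000, 'b': 2000, 'c': 1000}
```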
High-Speed On-Board Data Processing Platform for LIDAR Projects at NASA Langley Research Center
NASA Astrophysics Data System (ADS)
Beyon, J.; Ng, T. K.; Davis, M. J.; Adams, J. K.; Lin, B.
2015-12-01
The project called High-Speed On-Board Data Processing for Science Instruments (HOPS) has been funded by NASA Earth Science Technology Office (ESTO) Advanced Information Systems Technology (AIST) program during April, 2012 - April, 2015. HOPS is an enabler for science missions with extremely high data processing rates. In this three-year effort of HOPS, Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) and 3-D Winds were of interest in particular. As for ASCENDS, HOPS replaces time domain data processing with frequency domain processing while making the real-time on-board data processing possible. As for 3-D Winds, HOPS offers real-time high-resolution wind profiling with 4,096-point fast Fourier transform (FFT). HOPS is adaptable with quick turn-around time. Since HOPS offers reusable user-friendly computational elements, its FPGA IP Core can be modified for a shorter development period if the algorithm changes. The FPGA and memory bandwidth of HOPS is 20 GB/sec while the typical maximum processor-to-SDRAM bandwidth of the commercial radiation tolerant high-end processors is about 130-150 MB/sec. The inter-board communication bandwidth of HOPS is 4 GB/sec while the effective processor-to-cPCI bandwidth of commercial radiation tolerant high-end boards is about 50-75 MB/sec. Also, HOPS offers VHDL cores for the easy and efficient implementation of ASCENDS and 3-D Winds, and other similar algorithms. A general overview of the 3-year development of HOPS is the goal of this presentation.
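For intuition about the frequency-domain processing mentioned for 3-D Winds, the sketch below runs a 4,096-point FFT on a simulated return and converts the spectral peak to a line-of-sight velocity. The sample rate, lidar wavelength, and simulated Doppler shift are illustrative assumptions, not HOPS parameters.

```python
# 4,096-point FFT Doppler sketch; all physical parameters below are assumed
# for illustration and are not taken from HOPS.
import numpy as np

FS = 500e6            # sample rate in Hz (assumed)
WAVELENGTH = 2.05e-6  # lidar wavelength in m (assumed)
N = 4096              # FFT length, as in the abstract

rng = np.random.default_rng(0)
t = np.arange(N) / FS
doppler_hz = 4.88e6                                   # simulated return frequency
signal = np.cos(2 * np.pi * doppler_hz * t) + 0.5 * rng.standard_normal(N)

spectrum = np.abs(np.fft.rfft(signal * np.hanning(N))) ** 2
freqs = np.fft.rfftfreq(N, d=1 / FS)
peak_hz = freqs[np.argmax(spectrum)]
velocity = peak_hz * WAVELENGTH / 2                   # v = f_D * lambda / 2

print(f"spectral peak at {peak_hz / 1e6:.2f} MHz -> line-of-sight wind {velocity:.1f} m/s")
```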
High-Speed On-Board Data Processing for Science Instruments: HOPS
NASA Technical Reports Server (NTRS)
Beyon, Jeffrey
2015-01-01
The project called High-Speed On-Board Data Processing for Science Instruments (HOPS) has been funded by NASA Earth Science Technology Office (ESTO) Advanced Information Systems Technology (AIST) program during April, 2012 - April, 2015. HOPS is an enabler for science missions with extremely high data processing rates. In this three-year effort of HOPS, Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) and 3-D Winds were of interest in particular. As for ASCENDS, HOPS replaces time domain data processing with frequency domain processing while making the real-time on-board data processing possible. As for 3-D Winds, HOPS offers real-time high-resolution wind profiling with 4,096-point fast Fourier transform (FFT). HOPS is adaptable with quick turn-around time. Since HOPS offers reusable user-friendly computational elements, its FPGA IP Core can be modified for a shorter development period if the algorithm changes. The FPGA and memory bandwidth of HOPS is 20 GB/sec while the typical maximum processor-to-SDRAM bandwidth of the commercial radiation tolerant high-end processors is about 130-150 MB/sec. The inter-board communication bandwidth of HOPS is 4 GB/sec while the effective processor-to-cPCI bandwidth of commercial radiation tolerant high-end boards is about 50-75 MB/sec. Also, HOPS offers VHDL cores for the easy and efficient implementation of ASCENDS and 3-D Winds, and other similar algorithms. A general overview of the 3-year development of HOPS is the goal of this presentation.
High-bandwidth prefetcher for high-bandwidth memory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mehta, Sanyam; Kohn, James Robert; Ernst, Daniel Jonathan
A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer ("ORB"). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
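A simplified software model of the claimed behavior is sketched below: an outstanding request buffer tracks the next block to prefetch and the blocks remaining, issues a degree-sized batch at a time, pauses once it runs a maximum distance ahead of the demand reads, and resumes when a prefetched block is read. The parameter values and class layout are illustrative, and the prefetch-response bookkeeping is collapsed into the issue step.

```python
# Software model of the claimed prefetch behavior (parameter values are
# illustrative): issue DEGREE block prefetches at a time, and pause once the
# prefetcher runs more than DISTANCE blocks ahead of the demand reads.

DEGREE, DISTANCE = 4, 8

class OutstandingRequestBuffer:
    def __init__(self, start_block, num_blocks):
        self.next_block = start_block          # next block to prefetch
        self.remaining = num_blocks            # blocks left to prefetch
        self.last_read = start_block - 1       # last block consumed by a read

    def issue(self, prefetch):
        """Issue up to DEGREE prefetches unless too far ahead of the reads."""
        while self.remaining > 0 and self.next_block - self.last_read <= DISTANCE:
            batch = min(DEGREE, self.remaining)
            for b in range(self.next_block, self.next_block + batch):
                prefetch(b)
            self.next_block += batch
            self.remaining -= batch

    def on_read(self, block, prefetch):
        """A demand read of a prefetched block resumes prefetching."""
        self.last_read = max(self.last_read, block)
        self.issue(prefetch)

issued = []
orb = OutstandingRequestBuffer(start_block=100, num_blocks=32)
orb.issue(issued.append)                 # runs ahead, then pauses at the distance cap
print(len(issued))                       # 8 blocks prefetched before pausing
orb.on_read(103, issued.append)          # a read catches up -> prefetching resumes
print(len(issued))                       # 12
```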
Quantum frequency conversion with ultra-broadband tuning in a Raman memory
NASA Astrophysics Data System (ADS)
Bustard, Philip J.; England, Duncan G.; Heshami, Khabat; Kupchak, Connor; Sussman, Benjamin J.
2017-05-01
Quantum frequency conversion is a powerful tool for the construction of hybrid quantum photonic technologies. Raman quantum memories are a promising method of conversion due to their broad bandwidths. Here we demonstrate frequency conversion of THz-bandwidth, fs-duration photons at the single-photon level using a Raman quantum memory based on the rotational levels of hydrogen molecules. We shift photons from 765 nm to wavelengths spanning from 673 to 590 nm—an absolute shift of up to 116 THz. We measure total conversion efficiencies of up to 10% and a maximum signal-to-noise ratio of 4.0(1):1, giving an expected conditional fidelity of 0.75, which exceeds the classical threshold of 2/3. Thermal noise could be eliminated by cooling with liquid nitrogen, giving noiseless conversion with wide tunability in the visible and infrared.
Epidemic failure detection and consensus for extreme parallelism
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas; ...
2017-02-01
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.
NASA Astrophysics Data System (ADS)
Schrage, J.; Soenmez, Y.; Happel, T.; Gubler, U.; Lukowicz, P.; Mrozynski, G.
2006-02-01
From long-haul, metro-access and intersystem links, the trend is toward applying optical interconnection technology at increasingly shorter distances. Intrasystem interconnects such as data busses between microprocessors and memory blocks are still based on copper interconnects today. This causes a bottleneck in computer systems since the achievable bandwidth of electrical interconnects is limited by the underlying physical properties. Approaches to solve this problem by embedding optical multimode polymer waveguides into the board (electro-optical circuit board technology, EOCB) have been reported earlier. The feasibility in principle of optical interconnection technology in chip-to-chip applications has been validated in a number of projects. For cost reasons, waveguides with large cross sections are used in order to relax alignment requirements and to allow automatic placement and assembly without any active alignment of components. On the other hand, the bandwidth of these highly multimodal waveguides is restricted by mode dispersion. The advance of WDM technology towards intrasystem applications will provide the sufficiently high bandwidth required for future high-performance computer systems: assuming, for example, 8 wavelength channels with 12 Gbps (SDR) each, optical on-board interconnects with data rates an order of magnitude higher than the data rates of electrical interconnects can be realized for distances typically found on today's computer boards and backplanes. The data rate will be twice as much if DDR signaling is applied to the optical signals as well. In this paper we discuss an approach for a hybrid integrated optoelectronic WDM package which might enable the application of WDM technology to EOCB.
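The aggregate-rate arithmetic behind the example in this abstract is simply the channel count times the per-channel rate, doubled again for DDR signaling:

```python
# Aggregate optical data rate for the example in the abstract.
channels = 8
rate_per_channel_gbps = 12

sdr_aggregate = channels * rate_per_channel_gbps   # 96 Gbps with SDR signaling
ddr_aggregate = 2 * sdr_aggregate                  # 192 Gbps with DDR signaling
print(sdr_aggregate, ddr_aggregate)
```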
High-Density, High-Bandwidth, Multilevel Holographic Memory
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin
2008-01-01
A proposed holographic memory system would be capable of storing data at unprecedentedly high density, and its data transfer performance in both reading and writing would be characterized by exceptionally high bandwidth. The capabilities of the proposed system would greatly exceed even those of a state-of-the-art memory system, based on binary holograms (in which each pixel value represents 0 or 1), that can hold 1 terabyte of data and can support a reading or writing rate as high as 1 Gb/s. The storage capacity of the state-of-the-art system cannot be increased without also increasing the volume and mass of the system. However, in principle, the storage capacity could be increased greatly, without significantly increasing the volume and mass, if multilevel holograms were used instead of binary holograms. For example, a 3-bit (8-level) hologram could store 8 terabytes, or an 8-bit (256-level) hologram could store 256 terabytes, in a system having little or no more size and mass than does the state-of-the-art 1-terabyte binary holographic memory. The proposed system would utilize multilevel holograms. The system would include lasers, imaging lenses and other beam-forming optics, a block photorefractive crystal wherein the holograms would be formed, and two multilevel spatial light modulators in the form of commercially available deformable-mirror-device spatial light modulators (DMDSLMs) made for use in high speed input conversion of data up to 12 bits. For readout, the system would also include two arrays of complementary metal oxide/semiconductor (CMOS) photodetectors matching the spatial light modulators. The system would further include a reference-beam steering device (the equivalent of a scanning mirror), containing no sliding parts, that could be either a liquid-crystal phased-array device or a microscopic mirror actuated by a high-speed microelectromechanical system. Time-multiplexing and the multilevel nature of the DMDSLM would be exploited to enable writing and reading of multilevel holograms. The DMDSLM would also enable transfer of data at a rate of 7.6 Gb/s or perhaps somewhat higher.
Compression of CCD raw images for digital still cameras
NASA Astrophysics Data System (ADS)
Sriram, Parthasarathy; Sudharsanan, Subramania
2005-03-01
Lossless compression of raw CCD images captured using color filter arrays has several benefits. The benefits include improved storage capacity, reduced memory bandwidth, and lower power consumption for digital still camera processors. The paper discusses the benefits in detail and proposes the use of a computationally efficient block adaptive scheme for lossless compression. Experimental results are provided that indicate that the scheme performs well for CCD raw images attaining compression factors of more than two. The block adaptive method also compares favorably with JPEG-LS. A discussion is provided indicating how the proposed lossless coding scheme can be incorporated into digital still camera processors enabling lower memory bandwidth and storage requirements.
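The abstract does not spell out the block-adaptive scheme, so the following C sketch only illustrates the general idea under stated assumptions: for each block of same-color CFA samples, pick whichever of two simple predictors (left or above neighbor at stride 2) yields the smaller residual energy; the residuals would then be entropy coded, e.g., with a Golomb-Rice coder.

```c
/* Sketch of a block-adaptive predictor for CFA (Bayer) raw data: within each
 * 8x8 block, choose between a left-neighbor and an above-neighbor predictor
 * over same-color samples (stride 2), keeping whichever yields the smaller
 * residual energy. This is a generic illustration, not the paper's scheme. */
#include <stdio.h>
#include <stdlib.h>

/* Sum of absolute residuals for one 8x8 block using a given predictor offset. */
static long block_cost(const unsigned short *img, int w, int bx, int by,
                       int dx, int dy) {
    long cost = 0;
    for (int y = by; y < by + 8; y++)
        for (int x = bx; x < bx + 8; x++) {
            int px = x - dx, py = y - dy;             /* same-color neighbor */
            int pred = (px >= 0 && py >= 0) ? img[py * w + px] : 0;
            cost += labs((long)img[y * w + x] - pred);
        }
    return cost;
}

int main(void) {
    enum { W = 16, H = 16 };
    unsigned short img[W * H];
    for (int i = 0; i < W * H; i++) img[i] = (unsigned short)(1000 + (i % 7));

    for (int by = 0; by < H; by += 8)
        for (int bx = 0; bx < W; bx += 8) {
            long left  = block_cost(img, W, bx, by, 2, 0); /* same-color left  */
            long above = block_cost(img, W, bx, by, 0, 2); /* same-color above */
            printf("block (%d,%d): use %s predictor\n", bx, by,
                   left <= above ? "left" : "above");
        }
    return 0;
}
```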
Towards High Resolution Numerical Algorithms for Wave Dominated Physical Phenomena
2009-01-30
[Abstract fragments; text garbled in extraction. Recoverable points: results are reported as floating-point operations per second, obtained by counting floating-point additions and multiplications; the memory bandwidth measured for flux lifting exceeds the indicated theoretical peak; the final fragment, on what a single workstation equipped with multiple GPUs can do for a CPU-limited workload, is truncated.]
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Zheming; Yoshii, Kazutomo; Finkel, Hal
Open Computing Language (OpenCL) is a high-level language that enables software programmers to explore Field Programmable Gate Arrays (FPGAs) for application acceleration. The Intel FPGA software development kit (SDK) for OpenCL allows a user to specify applications at a high level and explore the performance of low-level hardware acceleration. In this report, we present the FPGA performance and power consumption results of the single-precision floating-point vector add OpenCL kernel using the Intel FPGA SDK for OpenCL on the Nallatech 385A FPGA board. The board features an Arria 10 FPGA. We evaluate the FPGA implementations using the compute unit duplication and kernel vectorization optimization techniques. On the Nallatech 385A FPGA board, the maximum compute kernel bandwidth we achieve is 25.8 GB/s, approximately 76% of the peak memory bandwidth. The power consumption of the FPGA device when running the kernels ranges from 29W to 42W.
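A minimal sketch of the kind of kernel evaluated, written in OpenCL C (a C dialect). The attribute values below are illustrative assumptions; num_compute_units() and num_simd_work_items() are the Intel FPGA SDK for OpenCL attributes used for compute-unit duplication and kernel vectorization, and reqd_work_group_size() accompanies SIMD vectorization.

```c
/* vector_add.cl -- minimal single-precision vector-add kernel of the kind
 * evaluated in the report. Attribute values are illustrative only. */
__attribute__((num_compute_units(2)))
__attribute__((num_simd_work_items(8)))
__attribute__((reqd_work_group_size(256, 1, 1)))
__kernel void vector_add(__global const float *restrict a,
                         __global const float *restrict b,
                         __global float *restrict c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];   /* two loads + one store per work-item: bandwidth bound */
}
```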
NASA Astrophysics Data System (ADS)
Zhou, Gan; An, Xin; Pu, Allen; Psaltis, Demetri; Mok, Fai H.
1999-11-01
The holographic disc is a high capacity, disk-based data storage device that can provide the performance for next generation mass data storage needs. With a projected capacity approaching 1 terabit on a single 12 cm platter, the holographic disc has the potential to become a highly efficient storage hardware for data warehousing applications. The high readout rate of holographic disc makes it especially suitable for generating multiple, high bandwidth data streams such as required for network server computers. Multimedia applications such as interactive video and HDTV can also potentially benefit from the high capacity and fast data access of holographic memory.
Adaptive packet switch with an optical core (demonstrator)
NASA Astrophysics Data System (ADS)
Abdo, Ahmad; Bishtein, Vadim; Clark, Stewart A.; Dicorato, Pino; Lu, David T.; Paredes, Sofia A.; Taebi, Sareh; Hall, Trevor J.
2004-11-01
A three-stage opto-electronic packet switch architecture is described consisting of a reconfigurable optical centre stage surrounded by two electronic buffering stages partitioned into sectors to ease memory contention. A Flexible Bandwidth Provision (FBP) algorithm, implemented on a soft-core processor, is used to change the configuration of the input sectors and optical centre stage to set up internal paths that will provide variable bandwidth to serve the traffic. The switch is modeled by a bipartite graph built from a service matrix, which is a function of the arriving traffic. The bipartite graph is decomposed by solving an edge-colouring problem and the resulting permutations are used to configure the switch. Simulation results show that this architecture exhibits a dramatic reduction of complexity and increased potential for scalability, at the price of only a modest spatial speed-up k, 1
FPGA-based prototype storage system with phase change memory
NASA Astrophysics Data System (ADS)
Li, Gezi; Chen, Xiaogang; Chen, Bomy; Li, Shunfen; Zhou, Mi; Han, Wenbing; Song, Zhitang
2016-10-01
With the ever-increasing amount of data being stored via social media, mobile telephony base stations, network devices, etc., database systems face severe bandwidth bottlenecks when moving vast amounts of data from storage to the processing nodes. At the same time, Storage Class Memory (SCM) technologies such as Phase Change Memory (PCM) with unique features like fast read access, high density, non-volatility, byte-addressability, positive response to increasing temperature, superior scalability, and zero standby leakage have changed the landscape of modern computing and storage systems. In such a scenario, we present a storage system called FLEET which can off-load partial or whole SQL queries to the storage engine from the CPU. FLEET uses an FPGA rather than conventional CPUs to implement the off-load engine due to its highly parallel nature. We have implemented an initial prototype of FLEET with PCM-based storage. The results demonstrate that significant performance and CPU utilization gains can be achieved by pushing selected query processing components inside the PCM-based storage.
High-speed zero-copy data transfer for DAQ applications
NASA Astrophysics Data System (ADS)
Pisani, Flavio; Cámpora Pérez, Daniel Hugo; Neufeld, Niko
2015-05-01
The LHCb Data Acquisition (DAQ) will be upgraded in 2020 to a trigger-free readout. In order to achieve this goal we will need to connect around 500 nodes with a total network capacity of 32 Tb/s. To get such a high network capacity we are testing zero-copy technology in order to maximize the theoretical link throughput without adding excessive CPU and memory bandwidth overhead, leaving resources free for data processing and resulting in less power, space and money used for the same result. We developed a modular test application which can be used with different transport layers. For the zero-copy implementation we chose the OFED IBVerbs API because it provides low-level access and high throughput. We present throughput and CPU usage measurements of 40 GbE solutions using Remote Direct Memory Access (RDMA), for several network configurations, to test the scalability of the system.
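The abstract does not include code, but the zero-copy precondition with OFED IBVerbs is memory registration: a buffer is pinned and registered so the HCA can DMA into or out of it directly, and work requests then carry only keys and pointers. The sketch below shows just that step, assuming an RDMA-capable device is present; queue-pair setup and the RDMA operations themselves are omitted.

```c
/* Minimal sketch of the zero-copy building block used with OFED IBVerbs:
 * registering a user buffer with the HCA so data can be moved by DMA without
 * intermediate copies. Compile with -libverbs; requires an RDMA device. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA device\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    size_t len = 1 << 20;                       /* 1 MiB send/receive buffer */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) { fprintf(stderr, "registration failed\n"); return 1; }

    /* A work request later carries these keys instead of the data itself. */
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```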
Announcing Supercomputer Summit
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wells, Jack; Bland, Buddy; Nichols, Jeff
Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe. Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world’s most pressing challenges.
Enhanced compressed sensing for visual target tracking in wireless visual sensor networks
NASA Astrophysics Data System (ADS)
Qiang, Guo
2017-11-01
Moving object tracking in wireless sensor networks (WSNs) has been widely applied in various fields. Designing low-power WSNs under the limited resources of the sensor nodes, such as energy and bandwidth constraints, is a high priority. However, most existing works focus on only a single optimization criterion among these conflicting goals. An efficient compressive sensing technique based on a customized memory gradient pursuit algorithm with early termination in WSNs is presented, which strikes a compelling trade-off among energy dissipation for wireless transmission, bandwidth, and storage. Then, the proposed approach adopts an unscented particle filter to predict the location of the target. The experimental results, together with a theoretical analysis, demonstrate the effectiveness of the proposed model and framework with respect to energy and speed under the resource limitations of a visual sensor node.
Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware.
Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad; Shi, Lin; Li, Keqin
2015-01-01
Multiple sequence alignment (MSA) constitutes an extremely powerful tool for many biological applications including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures. Due to the ever-increasing sizes of sequence databases, there is increasing demand to accelerate this task. In this paper, we demonstrate how graphics processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm. To fully exploit the GPU's capabilities for accelerating MAFFT, we have optimized the sequence data organization to eliminate the bandwidth bottleneck of memory access, designed a memory allocation and reuse strategy to make full use of the limited memory of GPUs, proposed a new modified-run-length encoding (MRLE) scheme to reduce memory consumption, and used high-performance shared memory to speed up I/O operations. Our implementation, tested on three NVIDIA GPUs, achieves a speedup of up to 11.28 on a Tesla K20m GPU compared to the sequential MAFFT 7.015.
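The MRLE scheme itself is not described in the abstract; the plain byte-wise run-length encoder below, written in C, only illustrates the generic idea of trading a little computation for a smaller memory footprint on repetitive sequence data.

```c
/* Plain run-length encoder over a byte sequence. The MRLE scheme in the paper
 * is a modified variant tailored to aligned sequence data; this generic
 * version only illustrates the memory-saving idea. */
#include <stdio.h>

/* Encode src[0..n) as (value, count) pairs; returns number of pairs written. */
static size_t rle_encode(const unsigned char *src, size_t n,
                         unsigned char *vals, unsigned char *counts) {
    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        unsigned char v = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == v && run < 255) run++;
        vals[out] = v;
        counts[out] = (unsigned char)run;
        out++;
        i += run;
    }
    return out;
}

int main(void) {
    const unsigned char seq[] = "AAAACCCGT---TTT";   /* gapped alignment row */
    unsigned char vals[32], counts[32];
    size_t pairs = rle_encode(seq, sizeof seq - 1, vals, counts);
    for (size_t i = 0; i < pairs; i++)
        printf("%c x %u\n", vals[i], (unsigned)counts[i]);
    return 0;
}
```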
A review on shape memory alloys with applications to morphing aircraft
NASA Astrophysics Data System (ADS)
Barbarino, S.; Saavedra Flores, E. I.; Ajaj, R. M.; Dayyani, I.; Friswell, M. I.
2014-06-01
Shape memory alloys (SMAs) are a unique class of metallic materials with the ability to recover their original shape at certain characteristic temperatures (shape memory effect), even under high applied loads and large inelastic deformations, or to undergo large strains without plastic deformation or failure (super-elasticity). In this review, we describe the main features of SMAs, their constitutive models and their properties. We also review the fatigue behavior of SMAs and some methods adopted to remove or reduce its undesirable effects. SMAs have been used in a wide variety of applications in different fields. In this review, we focus on the use of shape memory alloys in the context of morphing aircraft, with particular emphasis on variable twist and camber, and also on actuation bandwidth and reduction of power consumption. These applications prove particularly challenging because novel configurations are adopted to maximize integration and effectiveness of SMAs, which play the role of an actuator (using the shape memory effect), often combined with structural, load-carrying capabilities. Iterative and multi-disciplinary modeling is therefore necessary due to the fluid-structure interaction combined with the nonlinear behavior of SMAs.
RXIO: Design and implementation of high performance RDMA-capable GridFTP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tian, Yuan; Yu, Weikuan; Vetter, Jeffrey S.
2011-12-21
For its low-latency, high bandwidth, and low CPU utilization, Remote Direct Memory Access (RDMA) has established itself as an effective data movement technology in many networking environments. However, the transport protocols of grid run-time systems, such as GridFTP in Globus, are not yet capable of utilizing RDMA. In this study, we examine the architecture of GridFTP for the feasibility of enabling RDMA. An RDMA-capable XIO (RXIO) framework is designed and implemented to extend its XIO system and match the characteristics of RDMA. Our experimental results demonstrate that RDMA can significantly improve the performance of GridFTP, reducing the latency by 32% and increasing the bandwidth by more than three times. In achieving such performance improvements, RDMA dramatically cuts down CPU utilization of GridFTP clients and servers. In conclusion, these results demonstrate that RXIO can effectively exploit the benefits of RDMA for GridFTP. It offers a good prototype to further leverage GridFTP on wide-area RDMA networks.
Storage of RF photons in minimal conditions
NASA Astrophysics Data System (ADS)
Cromières, J.-P.; Chanelière, T.
2018-02-01
We investigate the minimal conditions needed to coherently store an RF pulse in a material medium. We choose a commercial quartz crystal as the storage medium because it is a widely available component with a high Q-factor. Pulse storage is obtained by dynamically varying the light-matter coupling with an analog switch. This parametric driving of the quartz dynamics can alternatively be interpreted as a stopped-light experiment. We obtain an efficiency of 26%, a storage time of 209 μs and a time-bandwidth product of 98 by optimizing the pulse temporal shape. The coherent character of the storage is demonstrated. Our goal is to connect different types of memories in the RF and optical domains for quantum information processing. Our motivation is essentially fundamental.
Holographic memory for high-density data storage and high-speed pattern recognition
NASA Astrophysics Data System (ADS)
Gu, Claire
2002-09-01
As computers and the internet become faster and faster, more and more information is transmitted, received, and stored everyday. The demand for high density and fast access time data storage is pushing scientists and engineers to explore all possible approaches including magnetic, mechanical, optical, etc. Optical data storage has already demonstrated its potential in the competition against other storage technologies. CD and DVD are showing their advantages in the computer and entertainment market. What motivated the use of optical waves to store and access information is the same as the motivation for optical communication. Light or an optical wave has an enormous capacity (or bandwidth) to carry information because of its short wavelength and parallel nature. In optical storage, there are two types of mechanism, namely localized and holographic memories. What gives the holographic data storage an advantage over localized bit storage is the natural ability to read the stored information in parallel, therefore, meeting the demand for fast access. Another unique feature that makes the holographic data storage attractive is that it is capable of performing associative recall at an incomparable speed. Therefore, volume holographic memory is particularly suitable for high-density data storage and high-speed pattern recognition. In this paper, we review previous works on volume holographic memories and discuss the challenges for this technology to become a reality.
Challenges of Future High-End Computing
NASA Technical Reports Server (NTRS)
Bailey, David; Kutler, Paul (Technical Monitor)
1998-01-01
The next major milestone in high performance computing is a sustained rate of one Pflop/s (also written one petaflops, or 10^15 floating-point operations per second). In addition to prodigiously high computational performance, such systems must of necessity feature very large main memories, as well as comparably high I/O bandwidth and huge mass storage facilities. The current consensus of scientists who have studied these issues is that "affordable" petaflops systems may be feasible by the year 2010, assuming that certain key technologies continue to progress at current rates. One important question is whether applications can be structured to perform efficiently on such systems, which are expected to incorporate many thousands of processors and deeply hierarchical memory systems. To answer these questions, advanced performance modeling techniques, including simulation of future architectures and applications, may be required. It may also be necessary to formulate "latency tolerant algorithms" and other completely new algorithmic approaches for certain applications. This talk will give an overview of these challenges.
Dynamic storage in resource-scarce browsing multimedia applications
NASA Astrophysics Data System (ADS)
Elenbaas, Herman; Dimitrova, Nevenka
1998-10-01
In the convergence of information and entertainment there is a conflict between the consumer's expectation of fast access to high-quality multimedia content through narrow-bandwidth channels and the size of this content. During the retrieval and presentation of a multimedia application, two problems have to be solved: the limited bandwidth during transmission of the retrieved multimedia content and the limited memory for temporary caching. In this paper we propose an approach for latency optimization in information browsing applications: a method for flattening hierarchically linked documents in a manner convenient for network transport over slow channels, minimizing browsing latency. Flattening of the hierarchy involves linearization, compression and bundling of the document nodes. After the transfer, the compressed hierarchy is stored on a local device where it can be partly unbundled to fit the caching limits at the local site while still giving the user access to the content.
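As a hedged illustration of the linearization and bundling steps (the node layout and the compression hook are assumptions, not the authors' format), a depth-first walk can serialize a small document tree into one contiguous bundle:

```c
/* Minimal sketch of "flattening" a hierarchy of linked document nodes:
 * a depth-first (pre-order) walk linearizes the tree into one contiguous
 * bundle that can then be compressed and shipped over a slow channel. */
#include <stdio.h>
#include <string.h>

#define MAX_CHILDREN 8

struct doc_node {
    const char *name;                      /* e.g., an HTML page or image */
    const char *payload;                   /* node content */
    struct doc_node *children[MAX_CHILDREN];
    int n_children;
};

/* Append one node's payload to the bundle, then recurse into its children. */
static size_t linearize(const struct doc_node *n, char *bundle, size_t used, size_t cap) {
    size_t len = strlen(n->payload);
    if (used + len < cap) {
        memcpy(bundle + used, n->payload, len);
        used += len;
    }
    for (int i = 0; i < n->n_children; i++)
        used = linearize(n->children[i], bundle, used, cap);
    return used;
}

int main(void) {
    struct doc_node leaf1 = {"leaf1", "B", {0}, 0};
    struct doc_node leaf2 = {"leaf2", "C", {0}, 0};
    struct doc_node root  = {"root",  "A", {&leaf1, &leaf2}, 2};
    char bundle[64];
    size_t n = linearize(&root, bundle, 0, sizeof bundle);
    bundle[n] = '\0';
    /* A real system would now compress `bundle` (e.g., with zlib) before transfer. */
    printf("bundle = %s (%zu bytes)\n", bundle, n);
    return 0;
}
```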
Memory-assisted quantum key distribution resilient against multiple-excitation effects
NASA Astrophysics Data System (ADS)
Lo Piparo, Nicolò; Sinclair, Neil; Razavi, Mohsen
2018-01-01
Memory-assisted measurement-device-independent quantum key distribution (MA-MDI-QKD) has recently been proposed as a technique to improve the rate-versus-distance behavior of QKD systems by using existing, or nearly-achievable, quantum technologies. The promise is that MA-MDI-QKD would require less demanding quantum memories than the ones needed for probabilistic quantum repeaters. Nevertheless, early investigations suggest that, in order to beat the conventional memory-less QKD schemes, the quantum memories used in the MA-MDI-QKD protocols must have high bandwidth-storage products and short interaction times. Among different types of quantum memories, ensemble-based memories offer some of the required specifications, but they typically suffer from multiple excitation effects. To avoid the latter issue, in this paper, we propose two new variants of MA-MDI-QKD both relying on single-photon sources for entangling purposes. One is based on known techniques for entanglement distribution in quantum repeaters. This scheme turns out to offer no advantage even if one uses ideal single-photon sources. By finding the root cause of the problem, we then propose another setup, which can outperform single memory-less setups even if we allow for some imperfections in our single-photon sources. For such a scheme, we compare the key rate for different types of ensemble-based memories and show that certain classes of atomic ensembles can improve the rate-versus-distance behavior.
NASA Astrophysics Data System (ADS)
Atkins, M. Stella; Hwang, Robert; Tang, Simon
2001-05-01
We have implemented a prototype system consisting of a Java-based image viewer and a web server extension component for transmitting Magnetic Resonance Images (MRI) to an image viewer, to test the performance of different image retrieval techniques. We used full-resolution images, and images compressed/decompressed using the Set Partitioning in Hierarchical Trees (SPIHT) image compression algorithm. We examined the SPIHT decompression algorithm using both non-progressive and progressive transmission, focusing on the running times of the algorithm, client memory usage and garbage collection. We also compared the Java implementation with a native C++ implementation of the non-progressive SPIHT decompression variant. Our performance measurements showed that for uncompressed image retrieval using a 10 Mbps Ethernet, a film of 16 MR images can be retrieved and displayed almost within interactive times. The native C++ code implementation of the client-side decoder is twice as fast as the Java decoder. If the network bandwidth is low, the high communication time for retrieving uncompressed images may be reduced by use of SPIHT-compressed images, although the image quality is then degraded. To provide diagnostic-quality images, we also investigated the retrieval of up to 3 images on an MR film at full resolution, using progressive SPIHT decompression. The Java-based implementation of progressive decompression performed badly, mainly due to the memory requirements for maintaining the image states, and the high cost of execution of the Java garbage collector. Hence, in systems where the bandwidth is high, such as found in a hospital intranet, SPIHT image compression does not provide advantages for image retrieval performance.
Ultrabright, narrow-band photon-pair source for atomic quantum memories
NASA Astrophysics Data System (ADS)
Tsai, Pin-Ju; Chen, Ying-Cheng
2018-06-01
We demonstrate an ultrabright, narrow-band and frequency-tunable photon-pair source based on cavity-enhanced spontaneous parametric down conversion (SPDC) which is compatible with the atomic transitions of the rubidium D2 line (780 nm) or the cesium D2 line (852 nm). With the pump beam alternating between a high- and a low-power phase, the output switches between optical parametric oscillator (OPO) operation and photon-pair generation. We utilize the OPO output light to lock the cavity length to maintain the double resonance of signal and idler, as well as to lock the signal frequency to the cesium atomic transition. With a type-II phase matching and a double-passed pump scheme such that the cluster frequency spacing is larger than the SPDC bandwidth, the photon-pair output is in nearly single-mode operation, as confirmed by a scanning Fabry–Perot interferometer with its output detected by a photomultiplier. The achieved generation and detection rates are 7.24 × 10^5 and 6142 s^-1 mW^-1, respectively. The correlation time of the photon pair is 21.6(2.2) ns, corresponding to a bandwidth of 2π × 6.6(6) MHz. The spectral brightness is 1.06 × 10^5 s^-1 mW^-1 MHz^-1. This is a relatively high value for single-mode operation with the cavity-SPDC scheme. The generated single photons can be readily used in experiments related to atomic quantum memories.
NASA Astrophysics Data System (ADS)
Jenkins, David R.; Basden, Alastair; Myers, Richard M.
2018-05-01
We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter, and so the next-generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low-power x86 CPU cores and high-bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT scale single conjugate AO real-time control computation at over 1.0 kHz with less than 20 μs RMS jitter. We have also shown that with a wavefront sensor camera attached, the KNL can process the real-time control loop at up to 966 Hz, the maximum frame rate of the camera, with jitter remaining below 20 μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real-time control.
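AO real-time reconstruction is commonly formulated as a large matrix-vector multiply of a control matrix with the wavefront-sensor slope vector; assuming that formulation (the paper's exact pipeline may differ), a minimal OpenMP C sketch of the bandwidth-bound core loop looks like this (sizes are illustrative, not ELT-scale):

```c
/* Wavefront reconstruction in many AO RTCs reduces to a large matrix-vector
 * multiply: actuator commands = control matrix x wavefront-sensor slopes.
 * Compile with -fopenmp. Dimensions and values are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    const int n_act = 5000, n_slopes = 10000;   /* illustrative dimensions */
    float *cm = malloc((size_t)n_act * n_slopes * sizeof *cm); /* control matrix */
    float *s  = malloc((size_t)n_slopes * sizeof *s);          /* slope vector  */
    float *a  = malloc((size_t)n_act * sizeof *a);             /* actuator cmds */
    for (size_t i = 0; i < (size_t)n_act * n_slopes; i++) cm[i] = 1e-4f;
    for (int j = 0; j < n_slopes; j++) s[j] = 0.5f;

    double t0 = omp_get_wtime();
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_act; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n_slopes; j++)
            acc += cm[(size_t)i * n_slopes + j] * s[j];   /* streaming row access */
        a[i] = acc;
    }
    double t1 = omp_get_wtime();

    printf("a[0]=%f, %.3f ms per reconstruction\n", a[0], (t1 - t0) * 1e3);
    free(cm); free(s); free(a);
    return 0;
}
```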
Assessment of EEG Signal Quality in Motion Environments
2009-06-01
[Abstract fragments; text garbled in extraction. Recoverable content: acknowledgments and a dedication to the memory of Patrick Nunez; a point that signals related to cognitive processes such as attention and working memory fall within the frequency bands discussed (including the delta band); and a citation fragment referencing Wertheim's hypothesis on 'highway hypnosis' from a study of motorway and conventional road driving.]
Coherent optical pulse sequencer for quantum applications.
Hosseini, Mahdi; Sparkes, Ben M; Hétet, Gabriel; Longdell, Jevon J; Lam, Ping Koy; Buchler, Ben C
2009-09-10
The bandwidth and versatility of optical devices have revolutionized information technology systems and communication networks. Precise and arbitrary control of an optical field that preserves optical coherence is an important requisite for many proposed photonic technologies. For quantum information applications, a device that allows storage and on-demand retrieval of arbitrary quantum states of light would form an ideal quantum optical memory. Recently, significant progress has been made in implementing atomic quantum memories using electromagnetically induced transparency, photon echo spectroscopy, off-resonance Raman spectroscopy and other atom-light interaction processes. Single-photon and bright-optical-field storage with quantum states have both been successfully demonstrated. Here we present a coherent optical memory based on photon echoes induced through controlled reversible inhomogeneous broadening. Our scheme allows storage of multiple pulses of light within a chosen frequency bandwidth, and stored pulses can be recalled in arbitrary order with any chosen delay between each recalled pulse. Furthermore, pulses can be time-compressed, time-stretched or split into multiple smaller pulses and recalled in several pieces at chosen times. Although our experimental results are so far limited to classical light pulses, our technique should enable the construction of an optical random-access memory for time-bin quantum information, and have potential applications in quantum information processing.
Announcing Supercomputer Summit
Wells, Jack; Bland, Buddy; Nichols, Jeff; Hack, Jim; Foertter, Fernanda; Hagen, Gaute; Maier, Thomas; Ashfaq, Moetasim; Messer, Bronson; Parete-Koon, Suzanne
2018-01-16
Summit is the next leap in leadership-class computing systems for open science. With Summit we will be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe. Summit will deliver more than five times the computational performance of Titan's 18,688 nodes, using only approximately 3,400 nodes when it arrives in 2017. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA's high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect. Upon completion, Summit will allow researchers in all fields of science unprecedented access to solving some of the world's most pressing challenges.
Interfacing a high performance disk array file server to a Gigabit LAN
NASA Technical Reports Server (NTRS)
Seshan, Srinivasan; Katz, Randy H.
1993-01-01
Our previous prototype, RAID-1, identified several bottlenecks in typical file server architectures. The most important bottleneck was the lack of a high-bandwidth path between disk, memory, and the network. Workstation servers, such as the Sun-4/280, have very slow access to peripherals on busses far from the CPU. For the RAID-2 system, we addressed this problem by designing a crossbar interconnect, the Xbus board, that provides a 40 MB/s path between disk, memory, and the network interfaces. However, this interconnect does not provide the system CPU with low-latency access to control the various interfaces. To provide a high data rate to clients on the network, we were forced to carefully and efficiently design the network software. A block diagram of the system hardware architecture is given. In the following subsections, we describe pieces of the RAID-2 file server hardware that had a significant impact on the design of the network interface.
PIMS: Memristor-Based Processing-in-Memory-and-Storage.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cook, Jeanine
Continued progress in computing has augmented the quest for higher performance with a new quest for higher energy efficiency. This has led to the re-emergence of Processing-In-Memory (PIM) architectures that offer higher density and performance with some boost in energy efficiency. Past PIM work either integrated a standard CPU with a conventional DRAM to improve the CPU-memory link, or used a bit-level processor with Single Instruction Multiple Data (SIMD) control, but neither matched the energy consumption of the memory to the computation. We originally proposed to develop a new architecture derived from PIM that more effectively addressed energy efficiency for high performance scientific, data analytics, and neuromorphic applications. We also originally planned to implement a von Neumann architecture with arithmetic/logic units (ALUs) that matched the power consumption of an advanced storage array to maximize energy efficiency. Implementing this architecture in storage was our original idea, since by augmenting storage (instead of memory), the system could address both in-memory computation and applications that access larger data sets directly from storage, hence Processing-in-Memory-and-Storage (PIMS). However, as our research matured, we discovered several things that changed our original direction, the most important being that a PIM that implements a standard von Neumann-type architecture results in significant energy efficiency improvement, but only about a O(10) performance improvement. In addition to this, the emergence of new memory technologies moved us to proposing a non-von Neumann architecture, called Superstrider, implemented not in storage, but in a new DRAM technology called High Bandwidth Memory (HBM). HBM is a stacked DRAM technology that includes a logic layer where an architecture such as Superstrider could potentially be implemented.
NASA Astrophysics Data System (ADS)
Okamoto, Taro; Takenaka, Hiroshi; Nakamura, Takeshi; Aoki, Takayuki
2010-12-01
We adopted the GPU (graphics processing unit) to accelerate large-scale finite-difference simulation of seismic wave propagation. The simulation can benefit from the high memory bandwidth of the GPU because it is a "memory intensive" problem. In the single-GPU case we achieved a performance of about 56 GFlops, which was about 45-fold faster than that achieved by a single core of the host central processing unit (CPU). We confirmed that the optimized use of fast shared memory and registers was essential for performance. In the multi-GPU case with three-dimensional domain decomposition, the non-contiguous memory alignment in the ghost zones was found to add considerable time to data transfer between the GPU and the host node. This problem was solved by using contiguous memory buffers for the ghost zones. We achieved a performance of about 2.2 TFlops by using 120 GPUs and 330 GB of total memory: nearly (or more than) 2200 cores of host CPUs would be required to achieve the same performance. The weak scaling was nearly proportional to the number of GPUs. We therefore conclude that GPU computing for large-scale simulation of seismic wave propagation is a promising approach, as a faster simulation is possible with reduced computational resources compared to CPUs.
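A host-side C sketch of the ghost-zone fix described above, under an assumed array layout: the strided face of a 3D block is packed into a contiguous staging buffer so that one large transfer replaces many small strided ones; the actual GPU copy is indicated only as a comment.

```c
/* Sketch of the ghost-zone fix: the YZ face of a 3D block is non-contiguous
 * in memory (stride nx), so it is packed into a contiguous staging buffer and
 * only then transferred between host and GPU as one large copy. */
#include <stdio.h>
#include <stdlib.h>

/* Pack the x = plane_x face of an nx*ny*nz array (x fastest) into buf. */
static void pack_x_face(const float *field, float *buf,
                        int nx, int ny, int nz, int plane_x) {
    size_t k = 0;
    for (int z = 0; z < nz; z++)
        for (int y = 0; y < ny; y++)
            buf[k++] = field[(size_t)z * ny * nx + (size_t)y * nx + plane_x];
}

int main(void) {
    const int nx = 64, ny = 64, nz = 64;
    float *field = calloc((size_t)nx * ny * nz, sizeof *field);
    float *ghost = malloc((size_t)ny * nz * sizeof *ghost);

    field[nx - 1] = 42.0f;                     /* marker in the x = nx-1 face */
    pack_x_face(field, ghost, nx, ny, nz, nx - 1);
    /* A real code would now issue one cudaMemcpy (or MPI send) of `ghost`
     * instead of ny*nz tiny strided transfers. */
    printf("ghost[0] = %f\n", ghost[0]);

    free(field); free(ghost);
    return 0;
}
```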
Video Bandwidth Compression System.
1980-08-01
[Report front matter and table-of-contents fragments; text garbled in extraction. Recoverable content: a scaling function located between the inverse DPCM and inverse transform on the decoder matrix-multiplier chips; boards for the bit unpacker and inverse DPCM slave sync, inverse DPCM loops, inverse transform, and composite video output; and a display refresh memory section covering memory, timing and control, the bit unpacker and inverse DPCM, and the inverse transform processor.]
2014-08-01
[Abstract fragments; text garbled in extraction. Recoverable content: the neighbor search required for SPH is described in Sect. 3, and Sect. 4 contains a performance analysis of the algorithm using Kepler-type GPU cards; the first generation of the Kepler architecture (code name GK104, also used in the Tesla K10) relies on Graphics Processing Clusters (GPCs); the L2 cache is 512 KB with a bandwidth of 512 B per clock cycle; constant memory (read only per grid) is 48 KB per Kepler SM and is used to hold constants.]
Fractional Steps methods for transient problems on commodity computer architectures
NASA Astrophysics Data System (ADS)
Krotkiewski, M.; Dabrowski, M.; Podladchikov, Y. Y.
2008-12-01
Fractional step methods are suitable for modeling transient processes that are central to many geological applications. Low memory requirements and modest computational complexity facilitate calculations on high-resolution three-dimensional models. An efficient implementation of Alternating Direction Implicit/Locally One-Dimensional schemes for an Opteron-based shared-memory system is presented. The memory bandwidth usage, the main bottleneck on modern computer architectures, is specially addressed. High efficiency of above 2 GFlops per CPU is sustained for problems of 1 billion degrees of freedom. The optimized sequential implementation of all 1D sweeps is comparable in execution time to copying the used data in memory. Scalability of the parallel implementation on up to 8 CPUs is close to perfect. Performing one timestep of the Locally One-Dimensional scheme on a system of 1000^3 unknowns on 8 CPUs takes only 11 s. We validate the LOD scheme using a computational model of an isolated inclusion subject to a constant far-field flux. Next, we study numerically the evolution of a diffusion front and the effective thermal conductivity of composites consisting of multiple inclusions, and compare the results with predictions based on the differential effective medium approach. Finally, application of the developed parabolic solver is suggested for a real-world problem of fluid transport and reactions inside a reservoir.
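Each 1D sweep of an ADI/LOD fractional step reduces to a set of independent tridiagonal solves along grid lines; a minimal C sketch of the Thomas algorithm for one such line is given below (constant coefficients and boundary treatment are illustrative assumptions).

```c
/* Each 1D sweep of an ADI/LOD fractional step solves a tridiagonal system
 * along one grid line. Minimal Thomas-algorithm sketch with constant
 * coefficients; sizes and values are illustrative only. */
#include <stdio.h>

#define N 8

int main(void) {
    double r = 0.5;                               /* dt*kappa/dx^2, illustrative */
    double a = -r, b = 1.0 + 2.0 * r, c = -r;     /* sub-, main-, super-diagonal */
    double d[N], cp[N], u[N];

    for (int i = 0; i < N; i++) d[i] = 1.0;       /* right-hand side (old field) */

    /* Forward elimination. */
    cp[0] = c / b;
    d[0]  = d[0] / b;
    for (int i = 1; i < N; i++) {
        double m = b - a * cp[i - 1];
        cp[i] = c / m;
        d[i]  = (d[i] - a * d[i - 1]) / m;
    }
    /* Back substitution. */
    u[N - 1] = d[N - 1];
    for (int i = N - 2; i >= 0; i--)
        u[i] = d[i] - cp[i] * u[i + 1];

    for (int i = 0; i < N; i++) printf("u[%d] = %f\n", i, u[i]);
    return 0;
}
```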
A Next-Generation Parallel File System Environment for the OLCF
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dillow, David A; Fuller, Douglas; Gunasekaran, Raghul
2012-01-01
When deployed in 2008/2009, the Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) was the world's largest-scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, Spider has since become a blueprint for shared Lustre environments deployed worldwide. Designed to support the parallel I/O requirements of the Jaguar XT5 system and other smaller-scale platforms at the OLCF, the upgrade to the Titan XK6 heterogeneous system will begin to push the limits of Spider's original design by mid 2013. With a doubling in total system memory and a 10x increase in FLOPS, Titan will require both higher bandwidth and larger total capacity. Our goal is to provide a 4x increase in total I/O bandwidth from over 240 GB/sec today to 1 TB/sec and a doubling in total capacity. While aggregate bandwidth and total capacity remain important capabilities, an equally important goal in our efforts is dramatically increasing metadata performance, currently the Achilles heel of parallel file systems at leadership scale. We present in this paper an analysis of our current I/O workloads, our operational experiences with the Spider parallel file systems, the high-level design of our Spider upgrade, and our efforts in developing benchmarks that synthesize our performance requirements based on our workload characterization studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brandt, James M.; Devine, Karen Dragon; Gentile, Ann C.
2014-09-01
As computer systems grow in both size and complexity, the need for applications and run-time systems to adjust to their dynamic environment also grows. The goal of the RAAMP LDRD was to combine static architecture information and real-time system state with algorithms to conserve power, reduce communication costs, and avoid network contention. We developed new data collection and aggregation tools to extract static hardware information (e.g., node/core hierarchy, network routing) as well as real-time performance data (e.g., CPU utilization, power consumption, memory bandwidth saturation, percentage of used bandwidth, number of network stalls). We created application interfaces that allowed this data to be used easily by algorithms. Finally, we demonstrated the benefit of integrating system and application information for two use cases. The first used real-time power consumption and memory bandwidth saturation data to throttle concurrency to save power without increasing application execution time. The second used static or real-time network traffic information to reduce or avoid network congestion by remapping MPI tasks to allocated processors. Results from our work are summarized in this report; more details are available in our publications [2, 6, 14, 16, 22, 29, 38, 44, 51, 54].
A wide bandwidth CCD buffer memory system
NASA Technical Reports Server (NTRS)
Siemens, K.; Wallace, R. W.; Robinson, C. R.
1978-01-01
A prototype system was implemented to demonstrate that CCDs can be applied advantageously to the problem of low-power digital storage and particularly to the problem of interfacing widely varying data rates. CCD shift register memories (8K bit) were used to construct a feasibility-model 128K-bit buffer memory system. Serial data with rates between 150 kHz and 4.0 MHz can be stored in 4K-bit, randomly accessible memory blocks. Peak power dissipation during a data transfer is less than 7 W, while idle power is approximately 5.4 W. The system features automatic data input synchronization with the recirculating CCD memory block start address. System expansion to accommodate parallel inputs or a greater number of memory blocks can be performed in a modular fashion. Since the control logic does not increase proportionally with memory capacity, the power requirements per bit of storage can be reduced significantly in a larger system.
Visual dot interaction with short-term memory.
Etindele Sosso, Faustin Armel
2017-06-01
Many neurodegenerative diseases have a memory component. Brain structures related to memory are affected by environmental stimuli, and it is difficult to dissociate the effects of all neuronal behaviors. Here, the visual cortex of mice was stimulated with gratings and dots, and neuronal activity was observed before and after stimulation. Bandwidth, firing rate and orientation selectivity index were evaluated. A primary communication between the primary visual cortex and short-term memory appeared, suggesting an interesting path for training cognitive circuitry and investigating the basic mechanisms of neuronal learning. The findings also suggested an interplay between the primary visual cortex and short-term plasticity. The properties of a visual target shape the perception and affect the basic encoding. Using the visual cortex, it may be possible to train memory and improve the recovery of people with cognitive disabilities or memory deficits.
Interfacing broadband photonic qubits to on-chip cavity-protected rare-earth ensembles
Zhong, Tian; Kindem, Jonathan M.; Rochman, Jake; Faraon, Andrei
2017-01-01
Ensembles of solid-state optical emitters enable broadband quantum storage and transduction of photonic qubits, with applications in high-rate quantum networks for secure communications and interconnecting future quantum computers. To transfer quantum states using ensembles, rephasing techniques are used to mitigate fast decoherence resulting from inhomogeneous broadening, but these techniques generally limit the bandwidth, efficiency and active times of the quantum interface. Here, we use a dense ensemble of neodymium rare-earth ions strongly coupled to a nanophotonic resonator to demonstrate a significant cavity protection effect at the single-photon level—a technique to suppress ensemble decoherence due to inhomogeneous broadening. The protected Rabi oscillations between the cavity field and the atomic super-radiant state enable ultra-fast transfer of photonic frequency qubits to the ions (∼50 GHz bandwidth) followed by retrieval with 98.7% fidelity. With the prospect of coupling to other long-lived rare-earth spin states, this technique opens the possibilities for broadband, always-ready quantum memories and fast optical-to-microwave transducers. PMID:28090078
Interfacing broadband photonic qubits to on-chip cavity-protected rare-earth ensembles
NASA Astrophysics Data System (ADS)
Zhong, Tian; Kindem, Jonathan M.; Rochman, Jake; Faraon, Andrei
2017-01-01
Ensembles of solid-state optical emitters enable broadband quantum storage and transduction of photonic qubits, with applications in high-rate quantum networks for secure communications and interconnecting future quantum computers. To transfer quantum states using ensembles, rephasing techniques are used to mitigate fast decoherence resulting from inhomogeneous broadening, but these techniques generally limit the bandwidth, efficiency and active times of the quantum interface. Here, we use a dense ensemble of neodymium rare-earth ions strongly coupled to a nanophotonic resonator to demonstrate a significant cavity protection effect at the single-photon level--a technique to suppress ensemble decoherence due to inhomogeneous broadening. The protected Rabi oscillations between the cavity field and the atomic super-radiant state enable ultra-fast transfer of photonic frequency qubits to the ions (~50 GHz bandwidth) followed by retrieval with 98.7% fidelity. With the prospect of coupling to other long-lived rare-earth spin states, this technique opens the possibilities for broadband, always-ready quantum memories and fast optical-to-microwave transducers.
AUDITORY ASSOCIATIVE MEMORY AND REPRESENTATIONAL PLASTICITY IN THE PRIMARY AUDITORY CORTEX
Weinberger, Norman M.
2009-01-01
Historically, the primary auditory cortex has been largely ignored as a substrate of auditory memory, perhaps because studies of associative learning could not reveal the plasticity of receptive fields (RFs). The use of a unified experimental design, in which RFs are obtained before and after standard training (e.g., classical and instrumental conditioning) revealed associative representational plasticity, characterized by facilitation of responses to tonal conditioned stimuli (CSs) at the expense of other frequencies, producing CS-specific tuning shifts. Associative representational plasticity (ARP) possesses the major attributes of associative memory: it is highly specific, discriminative, rapidly acquired, consolidates over hours and days and can be retained indefinitely. The nucleus basalis cholinergic system is sufficient both for the induction of ARP and for the induction of specific auditory memory, including control of the amount of remembered acoustic details. Extant controversies regarding the form, function and neural substrates of ARP appear largely to reflect different assumptions, which are explicitly discussed. The view that the forms of plasticity are task-dependent is supported by ongoing studies in which auditory learning involves CS-specific decreases in threshold or bandwidth without affecting frequency tuning. Future research needs to focus on the factors that determine ARP and their functions in hearing and in auditory memory. PMID:17344002
McCreery, Ryan W; Stelmachowicz, Patricia G
2013-09-01
Understanding speech in acoustically degraded environments can place significant cognitive demands on school-age children who are developing the cognitive and linguistic skills needed to support this process. Previous studies suggest that speech understanding, word learning, and academic performance can be negatively impacted by background noise, but the effect of limited audibility on cognitive processes in children has not been directly studied. The aim of the present study was to evaluate the impact of limited audibility on speech understanding and working memory tasks in school-age children with normal hearing. Seventeen children with normal hearing between 6 and 12 years of age participated in the present study. Repetition of nonword consonant-vowel-consonant stimuli was measured under conditions with combinations of two different signal-to-noise ratios (SNRs; 3 and 9 dB) and two low-pass filter settings (3.2 and 5.6 kHz). Verbal processing time was calculated based on the time from the onset of the stimulus to the onset of the child's response. Monosyllabic word repetition and recall were also measured in conditions with a full bandwidth and a 5.6 kHz low-pass cutoff. Nonword repetition scores decreased as audibility decreased. Verbal processing time increased as audibility decreased, consistent with predictions based on increased listening effort. Although monosyllabic word repetition did not vary between the full-bandwidth and 5.6 kHz low-pass filter conditions, recall was significantly poorer in the condition with limited bandwidth (low-pass at 5.6 kHz). Age and expressive language scores predicted performance on word recall tasks, but did not predict nonword repetition accuracy or verbal processing time. Decreased audibility was associated with reduced accuracy for nonword repetition and increased verbal processing time in children with normal hearing. Deficits in free recall were observed even under conditions where word repetition was not affected. The negative effects of reduced audibility may occur even under conditions where speech repetition is not impacted. Limited stimulus audibility may result in greater cognitive effort for verbal rehearsal in working memory and may limit the availability of cognitive resources to allocate to working memory and other processes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murphy, Richard C.
2009-09-01
This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follows: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.
Gradient Echo Quantum Memory in Warm Atomic Vapor
Pinel, Olivier; Hosseini, Mahdi; Sparkes, Ben M.; Everett, Jesse L.; Higginbottom, Daniel; Campbell, Geoff T.; Lam, Ping Koy; Buchler, Ben C.
2013-01-01
Gradient echo memory (GEM) is a protocol for storing optical quantum states of light in atomic ensembles. The primary motivation for such a technology is that quantum key distribution (QKD), which uses Heisenberg uncertainty to guarantee security of cryptographic keys, is limited in transmission distance. The development of a quantum repeater is a possible path to extend QKD range, but a repeater will need a quantum memory. In our experiments we use a gas of rubidium 87 vapor that is contained in a warm gas cell. This makes the scheme particularly simple. It is also a highly versatile scheme that enables in-memory refinement of the stored state, such as frequency shifting and bandwidth manipulation. The basis of the GEM protocol is to absorb the light into an ensemble of atoms that has been prepared in a magnetic field gradient. The reversal of this gradient leads to rephasing of the atomic polarization and thus recall of the stored optical state. We will outline how we prepare the atoms and this gradient and also describe some of the pitfalls that need to be avoided, in particular four-wave mixing, which can give rise to optical gain. PMID:24300586
Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo
2018-02-01
Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
Gradient echo quantum memory in warm atomic vapor.
Pinel, Olivier; Hosseini, Mahdi; Sparkes, Ben M; Everett, Jesse L; Higginbottom, Daniel; Campbell, Geoff T; Lam, Ping Koy; Buchler, Ben C
2013-11-11
Gradient echo memory (GEM) is a protocol for storing optical quantum states of light in atomic ensembles. The primary motivation for such a technology is that quantum key distribution (QKD), which uses Heisenberg uncertainty to guarantee security of cryptographic keys, is limited in transmission distance. The development of a quantum repeater is a possible path to extend QKD range, but a repeater will need a quantum memory. In our experiments we use a gas of rubidium 87 vapor that is contained in a warm gas cell. This makes the scheme particularly simple. It is also a highly versatile scheme that enables in-memory refinement of the stored state, such as frequency shifting and bandwidth manipulation. The basis of the GEM protocol is to absorb the light into an ensemble of atoms that has been prepared in a magnetic field gradient. The reversal of this gradient leads to rephasing of the atomic polarization and thus recall of the stored optical state. We will outline how we prepare the atoms and this gradient and also describe some of the pitfalls that need to be avoided, in particular four-wave mixing, which can give rise to optical gain.
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency compared with state-of-the-art approaches.
Real-Time Data Processing in the muon system of the D0 detector.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neeti Parashar et al.
2001-07-03
This paper presents a real-time application of the 16-bit fixed point Digital Signal Processors (DSPs), in the Muon System of the D0 detector located at the Fermilab Tevatron, presently the world's highest-energy hadron collider. As part of the Upgrade for a run beginning in the year 2000, the system is required to process data at an input event rate of 10 kHz without incurring significant deadtime in readout. The ADSP21csp01 processor has high I/O bandwidth, single cycle instruction execution and fast task switching support to provide efficient multisignal processing. The processor's internal memory consists of 4K words of Program Memory and 4K words of Data Memory. In addition there is an external memory of 32K words for general event buffering and 16K words of Dual port Memory for input data queuing. This DSP fulfills the requirement of the Muon subdetector systems for data readout. All error handling, buffering, formatting and transferring of the data to the various trigger levels of the data acquisition system is done in software. The algorithms developed for the system complete these tasks in about 20 μs per event.
Novel memory architecture for video signal processor
NASA Astrophysics Data System (ADS)
Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei
1993-11-01
An on-chip memory architecture for a video signal processor (VSP) is proposed. This memory structure is a two-level design reflecting the different data localities in video applications. The upper level, Memory A, provides enough storage capacity to reduce the impact of the limited chip I/O bandwidth, and the lower level, Memory B, provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The required memory size is determined by a memory usage analysis of video algorithms and the number of function units. Both levels of memory adopt a dual-port memory scheme to sustain simultaneous read and write operations. In particular, Memory B uses multiple one-read-one-write memory banks to emulate a true multiport memory. Therefore, one can change the configuration of Memory B into several sets of memories with variable read/write ports by adjusting the bus switches. The numbers of read and write ports in the proposed memory can then meet the requirements of the data-flow patterns in different video coding algorithms. We have finished a prototype memory design using 1.2-μm SPDM SRAM technology and will fabricate it through TSMC in Taiwan.
A multiplexed light-matter interface for fibre-based quantum networks
Saglamyurek, Erhan; Grimau Puigibert, Marcelli; Zhou, Qiang; Giner, Lambert; Marsili, Francesco; Verma, Varun B.; Woo Nam, Sae; Oesterling, Lee; Nippa, David; Oblak, Daniel; Tittel, Wolfgang
2016-01-01
Processing and distributing quantum information using photons through fibre-optic or free-space links are essential for building future quantum networks. The scalability needed for such networks can be achieved by employing photonic quantum states that are multiplexed into time and/or frequency, and light-matter interfaces that are able to store and process such states with large time-bandwidth product and multimode capacities. Despite important progress in developing such devices, the demonstration of these capabilities using non-classical light remains challenging. Here, employing the atomic frequency comb quantum memory protocol in a cryogenically cooled erbium-doped optical fibre, we report the quantum storage of heralded single photons at a telecom-wavelength (1.53 μm) with a time-bandwidth product approaching 800. Furthermore, we demonstrate frequency-multimode storage and memory-based spectral-temporal photon manipulation. Notably, our demonstrations rely on fully integrated quantum technologies operating at telecommunication wavelengths. With improved storage efficiency, our light-matter interface may become a useful tool in future quantum networks. PMID:27046076
Video multiple watermarking technique based on image interlacing using DWT.
Ibrahim, Mohamed M; Abdel Kader, Neamat S; Zorkany, M
2014-01-01
Digital watermarking is one of the important techniques for securing digital media files in the domains of data authentication and copyright protection. In non-blind watermarking systems, the need for the original host file in the watermark recovery operation imposes an overhead on system resources, doubling the required memory capacity and communications bandwidth. In this paper, a robust video multiple-watermarking technique based on image interlacing is proposed to solve this problem. In this technique, a three-level discrete wavelet transform (DWT) is used as the watermark embedding/extraction domain, the Arnold transform is used as the watermark encryption/decryption method, and different types of media (gray image, color image, and video) are used as watermarks. The robustness of this technique is tested by applying different types of attacks, such as geometric, noise, format-compression, and image-processing attacks. The simulation results show the effectiveness and good performance of the proposed technique in saving system resources, memory capacity, and communications bandwidth.
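The Arnold transform named above as the encryption step is a simple modular permutation of pixel coordinates. The sketch below is a generic NumPy illustration of that scrambling for a square watermark (the DWT embedding stage is not shown); the iteration count is an arbitrary illustrative choice, since in practice it acts as part of the key shared between embedder and extractor.

```python
import numpy as np

def arnold_scramble(img, iterations=10):
    """Scramble a square N x N watermark with the Arnold cat map.
    Pixel (x, y) takes its value from ((x + y) mod N, (x + 2y) mod N)."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "Arnold transform needs a square image"
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        out = out[(x + y) % n, (x + 2 * y) % n]
    return out

def arnold_unscramble(img, iterations=10):
    """Invert the scrambling by gathering with the inverse map (2x - y, y - x) mod N."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        out = out[(2 * x - y) % n, (y - x) % n]
    return out

watermark = np.arange(64 * 64).reshape(64, 64)
assert np.array_equal(arnold_unscramble(arnold_scramble(watermark)), watermark)
```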
Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools
Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy
2016-01-01
This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing of next-generation sequencing (NGS) data. sam2bam is especially efficient on single-node, multi-core, large-memory systems: it can reduce the runtime of data pre-processing for marking duplicate reads on a single-node system by 156–186x compared with the de facto standard tools. sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. As a basic feature, sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM. Additional features such as analyzing, filtering, and converting input data are provided by plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could reduce the runtime of NGS data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory, and from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short-range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
Gpu Implementation of a Viscous Flow Solver on Unstructured Grids
NASA Astrophysics Data System (ADS)
Xu, Tianhao; Chen, Long
2016-06-01
Graphics processing units have gained popularity in scientific computing over the past several years due to their outstanding parallel computing capability. Computational fluid dynamics applications involve large amounts of calculation, so a recent GPU card, whose peak computing performance and memory bandwidth far exceed those of a contemporary high-end CPU, is preferable. We herein focus on the detailed implementation of our GPU-targeted Reynolds-averaged Navier-Stokes solver based on the finite-volume method. The solver employs a vertex-centered scheme on unstructured grids so that it can handle complex topologies. Multiple optimizations are carried out to improve the memory-access performance and kernel utilization. Both steady and unsteady flow simulation cases are carried out using an explicit Runge-Kutta scheme. The GPU-accelerated solver is demonstrated to have competitive advantages over its CPU-targeted counterpart.
Using a Cray Y-MP as an array processor for a RISC Workstation
NASA Technical Reports Server (NTRS)
Lamaster, Hugh; Rogallo, Sarah J.
1992-01-01
As microprocessors increase in power, the economics of centralized computing has changed dramatically. At the beginning of the 1980s, mainframes and supercomputers were often considered to be cost-effective machines for scalar computing. Today, microprocessor-based RISC (reduced-instruction-set computer) systems have displaced many uses of mainframes and supercomputers. Supercomputers are still cost competitive when processing jobs that require both large memory size and high memory bandwidth. One such application is array processing. Certain numerical operations are appropriate for a Remote Procedure Call (RPC)-based environment; matrix multiplication, for example, involves enough arithmetic operations to amortize the cost of an RPC call. An experiment is described which demonstrates that matrix multiplication can be executed remotely on a large system to speed its execution over that experienced on a workstation.
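As a toy illustration of the RPC offload pattern described above, the sketch below uses Python's standard-library XML-RPC modules: a remote server plays the role of the "array processor" and the client ships the operands over the wire. The host name and port are placeholders, and list-based serialization is far less efficient than a real RPC array service; the point is only that the O(n^3) arithmetic of a matrix product can amortize the O(n^2) transfer and call overhead.

```python
# Server side (the "array processor"): expose matrix multiply over XML-RPC.
from xmlrpc.server import SimpleXMLRPCServer
import numpy as np

def matmul(a, b):
    # Arguments arrive as nested lists; return the product as nested lists.
    return (np.array(a) @ np.array(b)).tolist()

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)  # placeholder port
server.register_function(matmul, "matmul")
# server.serve_forever()   # uncomment to run the server process

# Client side (the workstation): ship operands, receive the product.
from xmlrpc.client import ServerProxy

def remote_matmul(a, b, url="http://localhost:8000"):   # placeholder URL
    proxy = ServerProxy(url)
    return np.array(proxy.matmul(a.tolist(), b.tolist()))
```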
Implementing Molecular Dynamics on Hybrid High Performance Computers - Three-Body Potentials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Yamada, Masako
The use of coprocessors or accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, defined as machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. Although there has been extensive research into methods to efficiently use accelerators to improve the performance of molecular dynamics (MD) employing pairwise potential energy models, little is reported in the literature for models that include many-body effects. 3-body terms are required for many popular potentials such as MEAM, Tersoff, REBO, AIREBO, Stillinger-Weber, Bond-Order Potentials, and others. Because the per-atom simulation times are much higher for models incorporating 3-body terms, there is a clear need for efficient algorithms usable on hybrid high performance computers. Here, we report a shared-memory force-decomposition for 3-body potentials that avoids memory conflicts to allow for a deterministic code with substantial performance improvements on hybrid machines. We describe modifications necessary for use in distributed memory MD codes and show results for the simulation of water with Stillinger-Weber on the hybrid Titan supercomputer. We compare performance of the 3-body model to the SPC/E water model when using accelerators. Finally, we demonstrate that our approach can attain a speedup of 5.1 with acceleration on Titan for production simulations to study water droplet freezing on a surface.
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
Multicore Architecture-aware Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinivasa, Avinash
Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, like having strategies in place to override some of the default operating system policies, the main objective being to improve the computational performance of the application. The current work proposes two such capabilities with respect to multi-threaded scientific applications, in particular a large-scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilities, when included, were found to have a significant impact on application performance, resulting in average speedups of as much as two to four times.
NASA Astrophysics Data System (ADS)
Burnett, W.
2016-12-01
The Department of Defense's (DoD) High Performance Computing Modernization Program (HPCMP) provides high performance computing to address the most significant challenges in computational resources, software application support and nationwide research and engineering networks. Today, the HPCMP has a critical role in ensuring the National Earth System Prediction Capability (N-ESPC) achieves initial operational status in 2019. A 2015 study commissioned by the HPCMP found that N-ESPC computational requirements will exceed interconnect bandwidth capacity due to the additional load from data assimilation and from passing data between ensemble codes. Memory bandwidth and I/O bandwidth will continue to be significant bottlenecks for the Navy's Hybrid Coordinate Ocean Model (HYCOM) scalability - by far the major driver of computing resource requirements in the N-ESPC. The study also found that few of the N-ESPC model developers have detailed plans to ensure their respective codes scale through 2024. Three HPCMP initiatives are designed to directly address and support these issues: Productivity Enhancement, Technology Transfer and Training (PETTT), the HPCMP Applications Software Initiative (HASI), and Frontier Projects. PETTT supports code conversion by providing assistance, expertise and training in scalable and high-end computing architectures. HASI addresses the continuing need for modern application software that executes effectively and efficiently on next-generation high-performance computers. Frontier Projects enable research and development that could not be achieved using typical HPCMP resources by providing multi-disciplinary teams access to exceptional amounts of high performance computing resources. Finally, the Navy's DoD Supercomputing Resource Center (DSRC) currently operates a 6 Petabyte system, of which Naval Oceanography receives 15% of operational computational system use, or approximately 1 Petabyte of the processing capability. The DSRC will provide the DoD with future computing assets to initially operate the N-ESPC in 2019. This talk will further describe how DoD's HPCMP will ensure N-ESPC becomes operational, efficiently and effectively, using next-generation high performance computing.
Efficient Graph Based Assembly of Short-Read Sequences on Hybrid Core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sczyrba, Alex; Pratap, Abhishek; Canon, Shane
2011-03-22
Advanced architectures can deliver dramatically increased throughput for genomics and proteomics applications, reducing time-to-completion in some cases from days to minutes. One such architecture, hybrid-core computing, marries a traditional x86 environment with a reconfigurable coprocessor based on field programmable gate array (FPGA) technology. In addition to higher throughput, increased performance can fundamentally improve research quality by allowing more accurate, previously impractical approaches. We will discuss the approach used by Convey's de Bruijn graph constructor for short-read, de-novo assembly. Bioinformatics applications that have random access patterns into large memory spaces, such as graph-based algorithms, experience memory performance limitations on cache-based x86 servers. Convey's highly parallel memory subsystem allows application-specific logic to simultaneously access 8192 individual words in memory, significantly increasing effective memory bandwidth over cache-based memory systems. Many algorithms, such as Velvet and other de Bruijn graph based, short-read, de-novo assemblers, can greatly benefit from this type of memory architecture. Furthermore, small data type operations (four nucleotides can be represented in two bits) make more efficient use of logic gates than the data types dictated by conventional programming models. JGI is comparing the performance of Convey's graph constructor and Velvet on both synthetic and real data. We will present preliminary results on memory usage and run time metrics for various data sets of different sizes, from small microbial and fungal genomes to a very large cow rumen metagenome. For genomes with references we will also present assembly quality comparisons between the two assemblers.
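To make the two points above concrete, the small Python sketch below (not Convey's constructor, and far from a production assembler) packs nucleotides at two bits per base and builds a toy de Bruijn graph whose nodes are (k-1)-mers and whose edges come from the k-mers of the reads. The read strings and the value of k are illustrative.

```python
from collections import defaultdict

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq):
    """Pack a nucleotide string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def de_bruijn(reads, k):
    """Toy de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

reads = ["ACGTACGT", "CGTACGTT"]
print(pack("ACGT"))             # 0b00011011 == 27
print(dict(de_bruijn(reads, 4)))
```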
Dragas, Jelena; Jäckel, David; Hierlemann, Andreas; Franke, Felix
2017-01-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
Evaluation Metrics for the Paragon XP/S-15
NASA Technical Reports Server (NTRS)
Traversat, Bernard; McNab, David; Nitzberg, Bill; Fineberg, Sam; Blaylock, Bruce T. (Technical Monitor)
1993-01-01
On February 17th 1993, the Numerical Aerodynamic Simulation (NAS) facility located at the NASA Ames Research Center installed a 224-node Intel Paragon XP/S-15 system. After its installation, the Paragon was found to be in a very immature state and was unable to support a NAS users' workload, composed of a wide range of development and production activities. As a first step towards addressing this problem, we implemented a set of metrics to objectively monitor the system as operating system and hardware upgrades were installed. The metrics were designed to measure four aspects of the system that we consider essential to support our workload: availability, utilization, functionality, and performance. This report presents the metrics collected from February 1993 to August 1993. Since its installation, the Paragon availability has improved from a low of 15% uptime to a high of 80%, while its utilization has remained low. Functionality and performance have improved from merely running one of the NAS Parallel Benchmarks to running all of them faster (between 1 and 2 times) than on the iPSC/860. In spite of the progress accomplished, fundamental limitations of the Paragon operating system are restricting the Paragon from supporting the NAS workload. The maximum operating system message passing (NORMA IPC) bandwidth was measured at 11 Mbytes/s, well below the peak hardware bandwidth (175 Mbytes/s), limiting overall virtual memory and Unix services (i.e. disk and HiPPI I/O) performance. The high NX application message passing latency (184 microseconds), three times that of the iPSC/860, was found to significantly degrade the performance of applications relying on small message sizes. The amount of memory available for an application was found to be approximately 10 Mbytes per node, indicating that the OS is taking more space than anticipated (6 Mbytes per node).
3D Integration for Wireless Multimedia
NASA Astrophysics Data System (ADS)
Kimmich, Georg
The convergence of mobile phone, internet, mapping, gaming and office automation tools with high quality video and still imaging capture capability is becoming a strong market trend for portable devices. High-density video encode and decode, 3D graphics for gaming, increased application-software complexity and ultra-high-bandwidth 4G modem technologies are driving the CPU performance and memory bandwidth requirements close to the PC segment. These portable multimedia devices are battery operated, which requires the deployment of new low-power-optimized silicon process technologies and ultra-low-power design techniques at system, architecture and device level. Mobile devices also need to comply with stringent silicon-area and package-volume constraints. As for all consumer devices, low production cost and fast time-to-volume production is key for success. This chapter shows how 3D architectures can bring a possible breakthrough to meet the conflicting power, performance and area constraints. Multiple 3D die-stacking partitioning strategies are described and analyzed for their potential to improve the overall system power, performance and cost for specific application scenarios. Requirements and maturity of the basic process-technology bricks including through-silicon via (TSV) and die-to-die attachment techniques are reviewed. Finally, we highlight new challenges which will arise with 3D stacking and give an outlook on how they may be addressed: higher power density will require thermal design considerations, new EDA tools will need to be developed to cope with the integration of heterogeneous technologies and to guarantee signal and power integrity across the die stack, and the silicon/wafer test strategies will have to be adapted to handle high-density IO arrays and ultra-thin wafers and to provide built-in self-test of attached memories. New standards and business models have to be developed to allow cost-efficient assembly and testing of devices from different silicon and technology providers.
Generation, storage, and retrieval of nonclassical states of light using atomic ensembles
NASA Astrophysics Data System (ADS)
Eisaman, Matthew D.
This thesis presents the experimental demonstration of several novel methods for generating, storing, and retrieving nonclassical states of light using atomic ensembles, and describes applications of these methods to frequency-tunable single-photon generation, single-photon memory, quantum networks, and long-distance quantum communication. We first demonstrate emission of quantum-mechanically correlated pulses of light with a time delay between the pulses that is coherently controlled by utilizing 87Rb atoms. The experiment is based on Raman scattering, which produces correlated pairs of excited atoms and photons, followed by coherent conversion of the atomic states into a different photon field after a controllable delay. We then describe experiments demonstrating a novel approach for conditionally generating nonclassical pulses of light with controllable photon numbers, propagation direction, timing, and pulse shapes. We observe nonclassical correlations in relative photon number between correlated pairs of photons, and create few-photon light pulses with sub-Poissonian photon-number statistics via conditional detection on one field of the pair. Spatio-temporal control over the pulses is obtained by exploiting long-lived coherent memory for photon states and electromagnetically induced transparency (EIT) in an optically dense atomic medium. Finally, we demonstrate the use of EIT for the controllable generation, transmission, and storage of single photons with tunable frequency, timing, and bandwidth. To this end, we study the interaction of single photons produced in a "source" ensemble of 87Rb atoms at room temperature with another "target" ensemble. This allows us to simultaneously probe the spectral and quantum statistical properties of narrow-bandwidth single-photon pulses, revealing that their quantum nature is preserved under EIT propagation and storage. We measure the time delay associated with the reduced group velocity of the single-photon pulses and report observations of their storage and retrieval. Together these experiments utilize atomic ensembles to realize a narrow-bandwidth single-photon source, single-photon memory that preserves the quantum nature of the single photons, and a primitive quantum network comprised of two atomic-ensemble quantum memories connected by a single photon in an optical fiber. Each of these experimental demonstrations represents an essential element for the realization of long-distance quantum communication.
Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures
NASA Astrophysics Data System (ADS)
Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.
2016-12-01
The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
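As a point of reference for what the accelerated implementation above is optimizing, here is a plain NumPy version of Lloyd's k-means. It is deliberately naive (no triangle-inequality acceleration, no explicit SIMD tuning, no high-bandwidth-memory placement); the data shape and cluster count are arbitrary illustrative choices.

```python
import numpy as np

def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's k-means; x has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance of every point to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

x = np.random.default_rng(1).normal(size=(10000, 8))
centers, labels = kmeans(x, k=16)
```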
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allada, Veerendra, Benjegerdes, Troy; Bode, Brett
Commodity clusters augmented with application accelerators are evolving into competitive high performance computing systems. The Graphics Processing Unit (GPU), with a very high arithmetic density and performance per price ratio, is a good platform for scientific application acceleration. In addition to the interconnect bottlenecks among the cluster compute nodes, the cost of memory copies between the host and the GPU device has to be carefully amortized to improve the overall efficiency of the application. Scientific applications also rely on efficient implementations of the Basic Linear Algebra Subroutines (BLAS), among which the General Matrix Multiply (GEMM) is considered the workhorse subroutine. In this paper, they study the performance of the memory copies and GEMM subroutines that are critical to port computational chemistry algorithms to GPU clusters. To that end, a benchmark based on the NetPIPE framework is developed to evaluate the latency and bandwidth of the memory copies between the host and the GPU device. The performance of the single and double precision GEMM subroutines from the NVIDIA CUBLAS 2.0 library is studied. The results have been compared with that of the BLAS routines from the Intel Math Kernel Library (MKL) to understand the computational trade-offs. The test bed is an Intel Xeon cluster equipped with NVIDIA Tesla GPUs.
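The copy-amortization trade-off that the benchmark quantifies can also be seen with a back-of-the-envelope model: a GEMM offload moves O(n^2) bytes across the host-device link but performs O(n^3) flops on the device. The sketch below is not the NetPIPE-based benchmark from the paper; the bandwidth and flop-rate figures are illustrative placeholders, not measured values.

```python
def gemm_offload_times(n, copy_gbps=6.0, gpu_gflops=300.0, dtype_bytes=8):
    """Rough time model for offloading C = A * B (n x n, double precision).

    copy_gbps  : host <-> device bandwidth in GB/s   (illustrative placeholder)
    gpu_gflops : sustained GEMM rate in Gflop/s      (illustrative placeholder)
    """
    bytes_moved = 3 * n * n * dtype_bytes          # copy A and B in, C out
    copy_s = bytes_moved / (copy_gbps * 1e9)
    flops = 2.0 * n ** 3                           # multiply-adds in GEMM
    compute_s = flops / (gpu_gflops * 1e9)
    return copy_s, compute_s

for n in (128, 512, 2048, 8192):
    c, g = gemm_offload_times(n)
    print(f"n={n:5d}  copy {c*1e3:8.2f} ms  gemm {g*1e3:9.2f} ms  gemm/copy {g/c:5.1f}")
```

The ratio grows linearly with n, which is why large matrices amortize the transfer cost while small ones remain copy-bound.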
Exascale Hardware Architectures Working Group
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hemmert, S; Ang, J; Chiang, P
2011-03-15
The ASC Exascale Hardware Architecture working group is challenged to provide input on the following areas impacting the future use and usability of potential exascale computer systems: processor, memory, and interconnect architectures, as well as the power and resilience of these systems. Going forward, there are many challenging issues that will need to be addressed. First, power constraints in processor technologies will lead to steady increases in parallelism within a socket. Additionally, all cores may not be fully independent nor fully general purpose. Second, there is a clear trend toward less balanced machines, in terms of compute capability compared to memory and interconnect performance. In order to mitigate the memory issues, memory technologies will introduce 3D stacking, eventually moving on-socket and likely on-die, providing greatly increased bandwidth but unfortunately also likely providing smaller memory capacity per core. Off-socket memory, possibly in the form of non-volatile memory, will create a complex memory hierarchy. Third, communication energy will dominate the energy required to compute, such that interconnect power and bandwidth will have a significant impact. All of the above changes are driven by the need for greatly increased energy efficiency, as current technology will prove unsuitable for exascale, due to unsustainable power requirements of such a system. These changes will have the most significant impact on programming models and algorithms, but they will be felt across all layers of the machine. There is clear need to engage all ASC working groups in planning for how to deal with technological changes of this magnitude. The primary function of the Hardware Architecture Working Group is to facilitate codesign with hardware vendors to ensure future exascale platforms are capable of efficiently supporting the ASC applications, which in turn need to meet the mission needs of the NNSA Stockpile Stewardship Program. This issue is relatively immediate, as there is only a small window of opportunity to influence hardware design for 2018 machines. Given the short timeline a firm co-design methodology with vendors is of prime importance.
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.
In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different levels of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp-level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block-level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread-level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions only support partial pivoting; the third one easily supports partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
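The CUDA variants themselves are not reproduced here, but the problem shape they target, many independent small dense systems solved in one call, can be sketched with NumPy's stacked (batched) solve. The batch size, matrix order, and diagonal shift used for conditioning are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 10_000, 32            # many independent small systems

# A has shape (batch, n, n); right-hand sides are column vectors (batch, n, 1).
A = rng.normal(size=(batch, n, n))
A += n * np.eye(n)               # diagonal shift keeps the systems well conditioned
b = rng.normal(size=(batch, n, 1))

# NumPy broadcasts the factorization and solve over the leading batch axis.
x = np.linalg.solve(A, b)

print(np.max(np.abs(A @ x - b)))   # residual check over the whole batch
```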
NASA Astrophysics Data System (ADS)
Zinke, Stephan
2017-02-01
Memory-sensitive applications for remote sensing data require memory-optimized data types in remote sensing products. Hierarchical Data Format version 5 (HDF5) offers user-defined floating point numbers and integers and the n-bit filter to create data types optimized for memory consumption. The European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) applies a compaction scheme to the disseminated products of the Day and Night Band (DNB) data of the Suomi National Polar-orbiting Partnership (S-NPP) satellite's instrument Visible Infrared Imager Radiometer Suite (VIIRS) through the EUMETSAT Advanced Retransmission Service, converting the original 32-bit floating point numbers to user-defined floating point numbers in combination with the n-bit filter for the radiance dataset of the product. The radiance dataset requires a floating point representation due to the high dynamic range of the DNB. A compression factor of 1.96 is reached by using an automatically determined exponent size and an 8-bit trailing significand, thus reducing the bandwidth requirements for dissemination. It is shown how the parameters needed for user-defined floating point numbers are derived or determined automatically based on the data present in a product.
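A rough NumPy sketch of how the exponent width could be determined automatically from the data, as described above, follows. The 8-bit trailing significand matches the abstract; everything else (the function name, the synthetic radiance data, the policy of sizing the exponent field to the observed exponent range) is an illustrative assumption, and the operational product uses the HDF5 user-defined float type together with the n-bit filter rather than any manual packing.

```python
import numpy as np

def custom_float_params(data, significand_bits=8):
    """Derive a minimal exponent width for a sign / exponent / significand
    layout that still covers the dynamic range of `data` (illustrative only)."""
    finite = data[np.isfinite(data) & (data != 0)]
    exponents = np.floor(np.log2(np.abs(finite))).astype(int)
    span = int(exponents.max() - exponents.min()) + 1   # distinct exponent values needed
    exponent_bits = max(1, int(np.ceil(np.log2(span))))
    bits_per_value = 1 + exponent_bits + significand_bits
    return exponent_bits, bits_per_value

# Synthetic stand-in for a high-dynamic-range DNB radiance field.
radiance = np.random.default_rng(0).lognormal(mean=-6, sigma=3, size=100_000).astype(np.float32)
e_bits, total = custom_float_params(radiance)
print(f"exponent bits: {e_bits}, bits per value: {total}, "
      f"compaction vs float32: {32 / total:.2f}x")
```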
RAID-2: Design and implementation of a large scale disk array controller
NASA Technical Reports Server (NTRS)
Katz, R. H.; Chen, P. M.; Drapeau, A. L.; Lee, E. K.; Lutz, K.; Miller, E. L.; Seshan, S.; Patterson, D. A.
1992-01-01
We describe the implementation of a large scale disk array controller and subsystem incorporating over 100 high performance 3.5 inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19 inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three year university implementation effort to construct a truly large scale system assembly.
A chip-integrated coherent photonic-phononic memory.
Merklein, Moritz; Stiller, Birgit; Vu, Khu; Madden, Stephen J; Eggleton, Benjamin J
2017-09-18
Controlling and manipulating quanta of coherent acoustic vibrations (phonons) in integrated circuits has recently drawn a lot of attention, since phonons can function as unique links between radiofrequency and optical signals, allow access to quantum regimes and offer advanced signal processing capabilities. Recent approaches based on optomechanical resonators have achieved impressive quality factors allowing for storage of optical signals. However, so far these techniques have been limited in bandwidth and are incompatible with multi-wavelength operation. In this work, we experimentally demonstrate a coherent buffer in an integrated planar optical waveguide by transferring the optical information coherently to an acoustic hypersound wave. Optical information is extracted using the reverse process. These hypersound phonons have similar wavelengths to the optical photons but travel at five orders of magnitude lower velocity. We demonstrate the storage of phase and amplitude of optical information with gigahertz bandwidth and show operation at separate wavelengths with negligible cross-talk. Optical storage implementations based on optomechanical resonators are limited to one wavelength; here, exploiting stimulated Brillouin scattering, the authors demonstrate a coherent optical memory based on a planar integrated waveguide, which can operate at different wavelengths without cross-talk.
Design of a Telescopic Linear Actuator Based on Hollow Shape Memory Springs
NASA Astrophysics Data System (ADS)
Spaggiari, Andrea; Spinella, Igor; Dragoni, Eugenio
2011-07-01
Shape memory alloys (SMAs) are smart materials exploited in many applications to build actuators with a high power-to-mass ratio. Typical SMA drawbacks are that wires show poor stroke and excessive length, while helical springs have limited mechanical bandwidth and high power consumption. This study is focused on the design of a large-scale linear SMA actuator conceived to maximize the stroke while limiting the overall size and the electric consumption. This result is achieved by adopting a telescopic multi-stage architecture for the actuator and using SMA helical springs with a hollow cross section to power the stages. The hollow geometry leads to reduced axial size and mass of the actuator and to an enhanced working frequency, while the telescopic design confers to the actuator an indexable motion, with a number of different displacements being achieved through simple on-off control strategies. An analytical thermo-electro-mechanical model is developed to optimize the device. Output stroke and force are maximized while total size and power consumption are simultaneously minimized. Finally, the optimized actuator, showing good performance from all these points of view, is designed in detail.
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, and knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than is available on single shared-memory systems. Additionally, these data structures are difficult to partition on distributed-memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared-memory abstraction across distributed-memory clusters. However, current PGAS languages and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine-grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present GMT (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per-node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and to tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realizes the overall functionality. We compare our approach with other PGAS models, such as UPC running over GASNet, and with hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.
Knee implant imaging at 3 Tesla using high-bandwidth radiofrequency pulses.
Bachschmidt, Theresa J; Sutter, Reto; Jakob, Peter M; Pfirrmann, Christian W A; Nittka, Mathias
2015-06-01
To investigate the impact of high-bandwidth radiofrequency (RF) pulses used in turbo spin echo (TSE) sequences, or combined with slice encoding for metal artifact correction (SEMAC), on artifact reduction at 3 Tesla in the knee in the presence of metal. Local transmit/receive coils feature increased maximum B1 amplitude and reduced SAR exposure and thus enable the application of high-bandwidth RF pulses. Susceptibility-induced through-plane distortion scales inversely with the RF bandwidth, while the view angle, and hence blurring, increases for higher RF bandwidths when SEMAC is used. These effects were assessed for a phantom containing a total knee arthroplasty. TSE and SEMAC sequences with conventional and high RF bandwidths and different contrasts were tested on eight patients with different types of implants. To realize scan times of 7 to 9 min, SEMAC was always applied with eight slice-encoding steps, and distortion was rated by two radiologists. A local transmit/receive knee coil enables the use of an RF bandwidth of 4 kHz compared with 850 Hz in conventional sequences. Phantom scans confirm the relation between RF bandwidth and through-plane distortion, which can be reduced by up to 79%, and demonstrate the increased blurring for high-bandwidth RF pulses. On average, artifacts in this RF mode are rated hardly visible for patients with joint arthroplasties, when eight SEMAC slice-encoding steps are applied, and for patients with titanium fixtures, when TSE is used. The application of high-bandwidth RF pulses by local transmit coils substantially reduces through-plane distortion artifacts at 3 Tesla. © 2014 Wiley Periodicals, Inc.
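The quoted 79% figure follows directly from the inverse scaling of through-plane displacement with RF bandwidth; a one-line check under that assumption:

```python
# Through-plane displacement is proportional to (off-resonance / RF bandwidth),
# so raising the refocusing bandwidth from 850 Hz to 4 kHz shrinks it by:
reduction = 1 - 850 / 4000
print(f"{reduction:.0%}")   # about 79%, consistent with the phantom result above
```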
NASA Astrophysics Data System (ADS)
Saponara, Sergio; Donati, Massimiliano; Fanucci, Luca; Odendahl, Maximilian; Leupers, Reiner; Errico, Walter
2013-02-01
On-board data processing is a vital task for any satellite or spacecraft because the sensing data must be processed before being sent to Earth, in order to exploit effectively the bandwidth to the ground station. In recent years the amount of sensing data collected by scientific and commercial space missions has increased significantly, while the available downlink bandwidth is comparatively stable. The increasing demand for on-board real-time processing capabilities represents one of the critical issues in forthcoming European missions. Ever faster signal and image processing algorithms are required to accomplish planetary observation, surveillance, Synthetic Aperture Radar imaging and telecommunications. The only available space-qualified Digital Signal Processor (DSP) free of International Traffic in Arms Regulations (ITAR) restrictions offers inadequate performance, so the need for a next-generation European DSP is well recognized by the space community. The DSPACE space-qualified DSP architecture fills the gap between the computational requirements and the available devices. It leverages a pipelined and massively parallel core based on the Very Long Instruction Word (VLIW) paradigm, with 64 registers and 8 operational units, along with cache memories, memory controllers and SpaceWire interfaces. Both the synthesizable VHDL and the software development tools are generated from the LISA high-level model. A Xilinx XC7K325T FPGA is chosen to realize a compact PCI demonstrator board. Finally, first synthesis results on CMOS standard cell technology (ASIC 180 nm) show an area of around 380 kgates and a peak performance of 1000 MIPS and 750 MFLOPS at 125 MHz.
High bandwidth electro-optic technology for intersatellite optical communications
NASA Technical Reports Server (NTRS)
Krainak, Michael A.
1992-01-01
The research and development of electronic and electro-optic components for geosynchronous and low earth orbiting satellite optical high bandwidth communications at the NASA-Goddard Space Flight Center is reviewed. Intersatellite optical communications retains a strong reliance on microwave circuit technology in several areas - the microwave to optical interface, the laser transmitter modulation driver and the optical receiver. A microwave to optical interface is described requiring high bandwidth electronic downconverters and demodulators. Electrical bandwidth and current drive requirements for the laser modulation driver for three laser alternatives are discussed. Bandwidth and noise requirements are presented for optical receiver architectures.
SU-E-J-60: Efficient Monte Carlo Dose Calculation On CPU-GPU Heterogeneous Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiao, K; Chen, D. Z; Hu, X. S
Purpose: It is well known that the performance of GPU-based Monte Carlo dose calculation implementations is bounded by memory bandwidth. One major cause of this bottleneck is the random memory writing pattern in dose deposition, which leads to several memory efficiency issues on the GPU such as un-coalesced writing and atomic operations. We propose a new method to alleviate such issues on CPU-GPU heterogeneous systems, which achieves an overall performance improvement for Monte Carlo dose calculation. Methods: Dose deposition accumulates dose into the voxels of a dose volume along the trajectories of radiation rays. Our idea is to partition this procedure into the following three steps, which are fine-tuned for the CPU or the GPU: (1) each GPU thread writes dose results with location information to a buffer in GPU memory, which achieves fully-coalesced and atomic-free memory transactions; (2) the dose results in the buffer are transferred to CPU memory; (3) the dose volume is constructed from the dose buffer on the CPU. We organize the processing of all radiation rays into streams. Since the steps within a stream use different hardware resources (i.e., GPU, DMA, CPU), we can overlap the execution of these steps for different streams by pipelining. Results: We evaluated our method using a Monte Carlo Convolution Superposition (MCCS) program and tested our implementation for various clinical cases on a heterogeneous system containing an Intel i7 quad-core CPU and an NVIDIA TITAN GPU. Compared with a straightforward MCCS implementation on the same system (using both CPU and GPU for radiation ray tracing), our method gained a 2-5X speedup without losing dose calculation accuracy. Conclusion: The results show that our new method improves the effective memory bandwidth and overall performance of MCCS on CPU-GPU systems. Our proposed method can also be applied to accelerate other Monte Carlo dose calculation approaches. This research was supported in part by NSF under Grants CCF-1217906, and also in part by a research contract from the Sandia National Laboratories.
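A minimal NumPy sketch of the buffered dose-deposition idea in steps (1) and (3) above is given below; the GPU kernels, the host-device transfer of step (2), and the stream pipelining are omitted, and the volume size and hit counts are arbitrary. The point is that scatter updates are first recorded as (voxel index, dose) pairs and only later reduced into the volume deterministically.

```python
import numpy as np

def deposit_buffered(volume_shape, voxel_indices, doses):
    """Step 1 (on the GPU in the paper): each ray appends (voxel index, dose)
    pairs to a flat buffer instead of updating the volume directly, so the
    writes are sequential and atomic-free.
    Step 3 (on the CPU): the buffer is reduced into the dose volume."""
    buffer_idx = np.asarray(voxel_indices, dtype=np.int64)    # buffered writes
    buffer_dose = np.asarray(doses, dtype=np.float32)
    volume = np.zeros(volume_shape, dtype=np.float32).ravel()
    np.add.at(volume, buffer_idx, buffer_dose)                # deterministic reduction
    return volume.reshape(volume_shape)

rng = np.random.default_rng(0)
shape = (64, 64, 64)
hits = rng.integers(0, np.prod(shape), size=100_000)   # voxel hits along toy rays
dose = rng.random(100_000).astype(np.float32)
vol = deposit_buffered(shape, hits, dose)
```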
NASA Astrophysics Data System (ADS)
Wilby, W. A.; Brett, A. R. H.
Frequency set-on techniques used in ECM applications include repeater jammers, frequency memory loops (RF and optical), coherent digital RF memories, and closed-loop VCO set-on systems. Closed-loop frequency set-on systems using analog phase and frequency locking are considered to have a number of cost and performance advantages. Their performance is discussed in terms of frequency accuracy, bandwidth, locking time, stability, and simultaneous signals. Some experimental results are presented which show typical locking performance. Future ECM systems might require a response to very short pulses. Acousto-optic and fiber-optic pulse stretching techniques can be used to meet such requirements.
Dynamically programmable cache
NASA Astrophysics Data System (ADS)
Nakkar, Mouna; Harding, John A.; Schwartz, David A.; Franzon, Paul D.; Conte, Thomas
1998-10-01
Reconfigurable machines have recently been used as co-processors to accelerate the execution of certain algorithms or program subroutines. The problems with this approach include high reconfiguration time and limited partial reconfiguration. By far the most critical problems are: (1) the small on-chip memory, which results in slower execution time, and (2) small FPGA areas that cannot implement large subroutines. Dynamically Programmable Cache (DPC) is a novel architecture for embedded processors which offers solutions to the above problems. To solve the memory access problem, DPC processors merge reconfigurable arrays with the data cache at various cache levels to create multi-level reconfigurable machines. As a result, DPC machines have both higher data accessibility and higher FPGA memory bandwidth. To solve the limited FPGA resource problem, DPC processors implement the multi-context switching (virtualization) concept. Virtualization allows implementation of large subroutines with fewer FPGA cells. Additionally, DPC processors can parallelize the execution of several operations, resulting in faster execution time. In this paper, DPC machines are shown to be 5X faster than an Altera FLEX10K FPGA chip and 2X faster than a Sun Ultra 1 SPARC station for two different algorithms (convolution and motion estimation).
Light-Stimulated Synaptic Devices Utilizing Interfacial Effect of Organic Field-Effect Transistors.
Dai, Shilei; Wu, Xiaohan; Liu, Dapeng; Chu, Yingli; Wang, Kai; Yang, Ben; Huang, Jia
2018-06-14
Synaptic transistors stimulated by light waves or photons may offer advantages such as wide bandwidth, ultrafast signal transmission, and robustness. However, previously reported light-stimulated synaptic devices generally require special photoelectric properties from the semiconductors and sophisticated device architectures. In this work, a simple and effective strategy for fabricating light-stimulated synaptic transistors is provided by utilizing the interface charge-trapping effect of organic field-effect transistors (OFETs). Significantly, our devices exhibited highly synapse-like behaviors, such as excitatory postsynaptic current (EPSC) and paired-pulse facilitation (PPF), and presented memory and learning ability. The EPSC decay, PPF curves, and forgetting behavior can be well expressed by mathematical equations for synaptic devices, indicating that the interfacial charge-trapping effect of OFETs can be utilized as a reliable strategy to realize organic light-stimulated synapses. Therefore, this work provides a simple and effective strategy for fabricating light-stimulated synaptic transistors with both memory and learning ability, which opens a new direction for developing neuromorphic devices.
Germanium:gallium photoconductors for far infrared heterodyne detection
NASA Technical Reports Server (NTRS)
Park, I. S.; Haller, E. E.; Grossman, E. N.; Watson, Dan M.
1988-01-01
Highly compensated Ge:Ga photoconductors for high bandwidth heterodyne detection have been fabricated and evaluated. Bandwidths up to 60 MHz have been achieved with a corresponding current responsivity of 0.01 A/W. The expected dependence of bandwidth on bias field is obtained. It is noted that increased bandwidth is obtained at the price of greater required local oscillator power.
Tactical Decision Aids High Bandwidth Links Using Autonomous Vehicles
2004-01-01
Healey, A. J.; Horner, D. P.; Center for Autonomous Underwater Vehicle ...
Multichannel heterodyning for wideband interferometry, correlation and signal processing
Erskine, David J.
1999-01-01
A method of signal processing a high bandwidth signal by coherently subdividing it into many narrow bandwidth channels which are individually processed at lower frequencies in a parallel manner. Autocorrelation and correlations can be performed using reference frequencies which may drift slowly with time, reducing cost of device. Coordinated adjustment of channel phases alters temporal and spectral behavior of net signal process more precisely than a channel used individually. This is a method of implementing precision long coherent delays, interferometers, and filters for high bandwidth optical or microwave signals using low bandwidth electronics. High bandwidth signals can be recorded, mathematically manipulated, and synthesized.
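The channelization idea in the abstract above, splitting one high-bandwidth record into many narrow channels that low-bandwidth electronics can handle and then recombining them, can be mimicked digitally. The NumPy sketch below is only a toy frequency-domain illustration (the analog heterodyne hardware, reference-frequency drift, and per-channel phase coordination are not modeled); the channel count and record length are arbitrary.

```python
import numpy as np

def channelize(signal, n_channels):
    """Split a real record into contiguous narrow frequency channels and
    return the per-channel time-domain waveforms (their sum equals the input)."""
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(0, len(spectrum), n_channels + 1, dtype=int)
    channels = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spectrum)
        band[lo:hi] = spectrum[lo:hi]            # keep only this channel's bins
        channels.append(np.fft.irfft(band, n=len(signal)))
    return channels

rng = np.random.default_rng(0)
x = rng.normal(size=4096)                        # stand-in for a wideband record
parts = channelize(x, n_channels=16)
print(np.allclose(sum(parts), x))                # True: channels add back to the input
```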
Multichannel heterodyning for wideband interferometry, correlation and signal processing
Erskine, D.J.
1999-08-24
A method is disclosed of signal processing a high bandwidth signal by coherently subdividing it into many narrow bandwidth channels which are individually processed at lower frequencies in a parallel manner. Autocorrelation and correlations can be performed using reference frequencies which may drift slowly with time, reducing cost of device. Coordinated adjustment of channel phases alters temporal and spectral behavior of net signal process more precisely than a channel used individually. This is a method of implementing precision long coherent delays, interferometers, and filters for high bandwidth optical or microwave signals using low bandwidth electronics. High bandwidth signals can be recorded, mathematically manipulated, and synthesized. 50 figs.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Highly efficient on-chip direct electronic-plasmonic transducers
NASA Astrophysics Data System (ADS)
Du, Wei; Wang, Tao; Chu, Hong-Son; Nijhuis, Christian A.
2017-10-01
Photonic elements can carry information with a capacity exceeding 1,000 times that of electronic components, but, due to the optical diffraction limit, these elements are large and difficult to integrate with modern-day nanoelectronics or upcoming packages, such as three-dimensional integrated circuits or stacked high-bandwidth memories [1-3]. Surface plasmon polaritons can be confined to subwavelength dimensions and can carry information at high speeds (>100 THz) [4-6]. To combine the small dimensions of nanoelectronics with the fast operating speed of optics via plasmonics, on-chip electronic-plasmonic transducers that directly convert electrical signals into plasmonic signals (and vice versa) are required. Here, we report electronic-plasmonic transducers based on metal-insulator-metal tunnel junctions coupled to plasmonic waveguides with high-efficiency on-chip generation, manipulation and readout of plasmons. These junctions can be readily integrated into existing technologies, and we thus believe that they are promising for applications in on-chip integrated plasmonic circuits.
GPU-computing in econophysics and statistical physics
NASA Astrophysics Data System (ADS)
Preis, T.
2011-03-01
A recent trend in computer science and related fields is general-purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples. In particular, computationally expensive analyses employed in a financial market context are coded for a graphics card architecture, which leads to a significant reduction in computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics - the Ising model - is ported to a graphics card architecture as well, resulting in large speedup values.
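To illustrate why the Ising model maps naturally onto such data-parallel hardware, the sketch below performs a checkerboard Metropolis sweep with NumPy array operations standing in for a GPU kernel: every site of one sub-lattice can be updated in the same step because all its neighbours belong to the other sub-lattice. The lattice size, temperature and seed are arbitrary assumptions, and this is not the code used in the article.

import numpy as np

rng = np.random.default_rng(0)
L, T = 64, 2.27                      # lattice size and temperature (assumed)
spins = rng.choice([-1, 1], size=(L, L))

# Checkerboard masks: sites of one colour have no neighbours of the same
# colour, so all of them can be updated simultaneously (data-parallel).
ii, jj = np.indices((L, L))
black = (ii + jj) % 2 == 0

def sweep(spins):
    for mask in (black, ~black):
        nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
               np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nbr                      # energy change if flipped
        accept = rng.random((L, L)) < np.exp(-dE / T)   # Metropolis rule
        spins = np.where(mask & accept, -spins, spins)
    return spins

for _ in range(100):
    spins = sweep(spins)
print("magnetisation per site:", spins.mean())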
PACE: Power-Aware Computing Engines
2005-02-01
more costly than computation on our test platform, and it is memory access that dominates most lossless data compression algorithms. In fact, even...Performance and implementation concerns A compression algorithm may be implemented with many different, yet reasonable, data structures (including...Related work This section discusses data compression for low-bandwidth devices and optimizing algorithms for low energy. Though much work has gone
Li, Yajie; Zhao, Yongli; Zhang, Jie; Yu, Xiaosong; Jing, Ruiquan
2017-11-27
Network operators generally provide dedicated lightpaths for customers to meet the demand for high-quality transmission. Considering the variation of traffic load, customers usually rent peak bandwidth that exceeds the practical average traffic requirement. In this case, bandwidth provisioning is unmetered and customers have to pay according to peak bandwidth. If network operators could keep track of the traffic load and allocate bandwidth dynamically, bandwidth could be provided as a metered service and customers would pay only for the bandwidth they actually use. To achieve cost-effective bandwidth provisioning, this paper proposes an autonomic bandwidth adjustment scheme based on data analysis of traffic load. The scheme is implemented in a software defined networking (SDN) controller and is demonstrated in a field trial on multi-vendor optical transport networks. The field trial shows that the proposed scheme can track traffic load and realize autonomic bandwidth adjustment. In addition, a simulation experiment is conducted to evaluate the performance of the proposed scheme. We also investigate the impact of different parameters on autonomic bandwidth adjustment. Simulation results show that the step size and adjustment period have significant influences on bandwidth savings and packet loss. A small step size and a short adjustment period can bring more benefits by tracking traffic variation with high accuracy. For network operators, the scheme can serve as technical support for realizing bandwidth as a metered service in the future.
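A minimal sketch of the general idea behind such an autonomic adjustment loop is given below. The function name, step size, adjustment period, headroom margin and traffic trace are hypothetical placeholders and do not reproduce the algorithm or parameters of the field trial; the sketch only illustrates how a smaller step and a shorter period let the allocation track demand more closely.

import math

def autonomic_bandwidth(traffic, step=0.5, period=5, margin=1.1, floor=1.0):
    """Step-wise bandwidth adjustment driven by monitored traffic.

    traffic : measured loads (Gb/s), one sample per monitoring interval
    step    : granularity of each adjustment (Gb/s)
    period  : number of intervals between adjustments
    margin  : headroom factor kept above the recent peak load
    Returns the allocated bandwidth per interval."""
    alloc, out = max(floor, traffic[0] * margin), []
    for i, load in enumerate(traffic):
        if i % period == 0:
            recent = traffic[max(0, i - period):i + 1]
            target = max(recent) * margin
            # move to the target, rounded up to a multiple of the step size
            alloc = max(floor, math.ceil(target / step) * step)
        out.append(alloc)
    return out

demand = [2.1, 2.4, 3.0, 5.5, 5.2, 4.0, 2.5, 2.2, 2.0, 6.1, 6.3, 5.8]
print(autonomic_bandwidth(demand))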
High Performance Programming Using Explicit Shared Memory Model on Cray T3D1
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Saini, Subhash; Grassi, Charles
1994-01-01
The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained with the explicit shared memory model. A similar degradation is seen on the CM-5, where applications using the native message-passing library CMMD run about 4 to 5 times slower than their data parallel counterparts. The issues involved in programming with the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, and aligning the data cache) are discussed. The performance of the NPB under the explicit shared memory programming model on the Cray T3D is compared with that of other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, and IBM SP1.
Low-Power Architectures for Large Radio Astronomy Correlators
NASA Technical Reports Server (NTRS)
D'Addario, Larry R.
2011-01-01
The architecture of a cross-correlator for a synthesis radio telescope with N greater than 1000 antennas is studied with the objective of minimizing power consumption. It is found that the optimum architecture minimizes memory operations, and this implies preference for a matrix structure over a pipeline structure and avoiding the use of memory banks as accumulation registers when sharing multiply-accumulators among baselines. A straw-man design for N = 2000 and bandwidth of 1 GHz, based on ASICs fabricated in a 90 nm CMOS process, is presented. The cross-correlator proper (excluding per-antenna processing) is estimated to consume less than 35 kW.
Atom-Resonant Heralded Single Photons by Interaction-Free Measurement
NASA Astrophysics Data System (ADS)
Wolfgramm, Florian; de Icaza Astiz, Yannick A.; Beduini, Federica A.; Cerè, Alessandro; Mitchell, Morgan W.
2011-02-01
We demonstrate the generation of rubidium-resonant heralded single photons for quantum memories. Photon pairs are created by cavity-enhanced down-conversion and narrowed in bandwidth to 7 MHz with a novel atom-based filter operating by “interaction-free measurement” principles. At least 94% of the heralded photons are atom-resonant as demonstrated by a direct absorption measurement with rubidium vapor. A heralded autocorrelation measurement shows g_c^(2)(0) = 0.040 ± 0.012, i.e., suppression of multiphoton contributions by a factor of 25 relative to a coherent state. The generated heralded photons can readily be used in quantum memories and quantum networks.
Highly linear dual ring resonator modulator for wide bandwidth microwave photonic links.
Hosseinzadeh, Arash; Middlebrook, Christopher T
2016-11-28
A highly linear dual ring resonator modulator (DRRM) design is demonstrated to provide high spur-free dynamic range (SFDR) in a wide operational bandwidth. Harmonic and intermodulation distortions are theoretically analyzed in a single ring resonator modulator (RRM) with a Lorentzian-shape transfer function, and a strategy is proposed to enhance modulator linearity for wide-bandwidth applications by utilizing the DRRM. Third-order intermodulation distortion is suppressed in a frequency-independent process with a proper splitting ratio of optical and RF power and proper DC biasing of the ring resonators. Operational bandwidth limits of the DRRM are compared to those of the RRM, showing the capability of the DRRM to provide higher SFDR over an unlimited operational bandwidth. DRRM bandwidth limitations are a result of the modulation index from each RRM and their resonance characteristics, which limit the gain and noise figure of the microwave photonic link. The impact of the modulator on the microwave photonic link figures of merit is analyzed and compared to RRM and Mach-Zehnder interferometer (MZI) modulators. Considering a ±5 GHz operational bandwidth around the resonance frequency, imposed by the modulation index requirement, the DRRM is capable of a ~15 dB SFDR improvement (1 Hz instantaneous bandwidth) versus the RRM and MZI.
Improvement of multiprocessing performance by using optical centralized shared bus
NASA Astrophysics Data System (ADS)
Han, Xuliang; Chen, Ray T.
2004-06-01
With the ever-increasing need to solve larger and more complex problems, multiprocessing is attracting more and more research effort. One of the challenges facing multiprocessor designers is to support, in an effective manner, the communication among processes running in parallel on multiple processors. The conventional electrical backplane bus provides narrow bandwidth as restricted by the physical limitations of electrical interconnects. In the electrical domain, in order to operate at high frequency, the backplane topology has been changed from the simple shared bus to the complicated switched medium. However, the switched medium is an indirect network. It cannot support multicast/broadcast as effectively as the shared bus. Besides the additional latency of going through the intermediate switching nodes, signal routing introduces substantial delay and considerable system complexity. Alternatively, optics has been well known for its interconnect capability. Therefore, it has become imperative to investigate how to improve multiprocessing performance by utilizing optical interconnects. From the implementation standpoint, the existing optical technologies still cannot fulfill the intelligent functions that a switch fabric should provide as effectively as their electronic counterparts. Thus, an innovative optical technology that can provide sufficient bandwidth capacity while retaining the essential merits of the shared bus topology is highly desirable for multiprocessing performance improvement. In this paper, the optical centralized shared bus is proposed for use in multiprocessing systems. This novel optical interconnect architecture not only utilizes the beneficial characteristics of optics, but also retains the desirable properties of the shared bus topology. Meanwhile, from the architecture standpoint, it fits well in the centralized shared-memory multiprocessing scheme. Therefore, a smooth migration with substantial multiprocessing performance improvement is expected. To prove the technical feasibility from the architecture standpoint, a conceptual emulation of the centralized shared-memory multiprocessing scheme is demonstrated on a generic PCI subsystem with an optical centralized shared bus.
NASA Technical Reports Server (NTRS)
Peach, Robert; Malarky, Alastair
1990-01-01
Currently proposed mobile satellite communications systems require a high degree of flexibility in assignment of spectral capacity to different geographic locations. Conventionally this results in poor spectral efficiency which may be overcome by the use of bandwidth switchable filtering. Surface acoustic wave (SAW) technology makes it possible to provide banks of filters whose responses may be contiguously combined to form variable bandwidth filters with constant amplitude and phase responses across the entire band. The high selectivity possible with SAW filters, combined with the variable bandwidth capability, makes it possible to achieve spectral efficiencies over the allocated bandwidths of greater than 90 percent, while retaining full system flexibility. Bandwidth switchable SAW filtering (BSSF) achieves these gains with a negligible increase in hardware complexity.
Chen, Sen; Luo, Sheng Nian
2018-03-01
Polychromatic X-ray sources can be useful for photon-starved small-angle X-ray scattering given their high spectral fluxes. Their bandwidths, however, are 10-100 times larger than those using monochromators. To explore the feasibility, ideal scattering curves of homogeneous spherical particles for polychromatic X-rays are calculated and analyzed using the Guinier approach, maximum entropy and regularization methods. Monodisperse and polydisperse systems are explored. The influence of bandwidth and asymmetric spectral shape is explored via Gaussian and half-Gaussian spectra. Synchrotron undulator spectra represented by two undulator sources of the Advanced Photon Source are examined as an example, as regards the influence of asymmetric harmonic shape, fundamental harmonic bandwidth and high harmonics. The effects of bandwidth, spectral shape and high harmonics on particle size determination are evaluated quantitatively.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Sen; Luo, Sheng-Nian
Polychromatic X-ray sources can be useful for photon-starved small-angle X-ray scattering given their high spectral fluxes. Their bandwidths, however, are 10-100 times larger than those using monochromators. To explore the feasibility, ideal scattering curves of homogeneous spherical particles for polychromatic X-rays are calculated and analyzed using the Guinier approach, maximum entropy and regularization methods. Monodisperse and polydisperse systems are explored. The influence of bandwidth and asymmetric spectral shape is explored via Gaussian and half-Gaussian spectra. Synchrotron undulator spectra represented by two undulator sources of the Advanced Photon Source are examined as an example, as regards the influence of asymmetric harmonic shape, fundamental harmonic bandwidth and high harmonics. The effects of bandwidth, spectral shape and high harmonics on particle size determination are evaluated quantitatively.
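For reference, the Guinier approach mentioned above reduces to fitting the low-q part of the curve to I(q) ≈ I(0)·exp(-q²Rg²/3). The sketch below uses an assumed 5 nm sphere, an assumed q-range and the usual qRg < 1.3 cutoff, and recovers the radius of gyration from an ideal monochromatic sphere curve; the bandwidth-smearing analysis of the paper is not reproduced.

import numpy as np

R = 5.0                                   # sphere radius, nm (assumed)
Rg_true = np.sqrt(3.0 / 5.0) * R          # radius of gyration of a sphere
q = np.linspace(0.01, 0.5, 200)           # scattering vector, 1/nm (assumed)

# Ideal sphere form factor (monochromatic beam)
qR = q * R
I = (3.0 * (np.sin(qR) - qR * np.cos(qR)) / qR**3) ** 2

# Guinier fit: ln I is linear in q^2 with slope -Rg^2/3 at small q
mask = q * Rg_true < 1.3                  # usual validity range q*Rg < ~1.3
slope, intercept = np.polyfit(q[mask] ** 2, np.log(I[mask]), 1)
Rg_fit = np.sqrt(-3.0 * slope)
print(f"true Rg = {Rg_true:.3f} nm, Guinier fit Rg = {Rg_fit:.3f} nm")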
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.
Automatic Adaptation of Tunable Distributed Applications
2001-01-01
size, weight, and battery life, with a single CPU, less memory, smaller hard disk, and lower bandwidth network connectivity. The power of PDAs is...wireless, and Bluetooth [32] facilities; thus achieving different rates of data transmission. With the trend of “write once, run everywhere...applications, a single component can execute on multiple processors (or machines) in parallel. These parallel applications, written in a specialized language
Seeto, Angeline; Searchfield, Grant D
2018-03-01
Advances in digital signal processing have made it possible to provide a wide-band frequency response with smooth, precise spectral shaping. Several manufacturers have introduced hearing aids that are claimed to provide gain for frequencies up to 10-12 kHz. However, there is currently limited evidence and very few independent studies evaluating the performance of the extended bandwidth hearing aids that have recently become available. This study investigated an extended bandwidth hearing aid using measures of speech intelligibility and sound quality to find out whether there was a significant benefit of extended bandwidth amplification over standard amplification. Repeated measures study designed to examine the efficacy of extended bandwidth amplification compared to standard bandwidth amplification. Sixteen adult participants with mild-to-moderate sensorineural hearing loss. Participants were bilaterally fit with a pair of Widex Mind 440 behind-the-ear hearing aids programmed with a standard bandwidth fitting and an extended bandwidth fitting; the latter provided gain up to 10 kHz. For each fitting, and an unaided condition, participants completed two speech measures of aided benefit, the Quick Speech-in-Noise test (QuickSIN™) and the Phonak Phoneme Perception Test (PPT; high-frequency perception in quiet), and a measure of sound quality rating. There were no significant differences found between unaided and aided conditions for QuickSIN™ scores. For the PPT, there were statistically significantly lower (improved) detection thresholds at high frequencies (6 and 9 kHz) with the extended bandwidth fitting. Although not statistically significant, participants were able to distinguish between 6 and 9 kHz 50% better with extended bandwidth. No significant difference was found in ability to recognize phonemes in quiet between the unaided and aided conditions when phonemes only contained frequency content <6 kHz. However significant benefit was found with the extended bandwidth fitting for recognition of 9-kHz phonemes. No significant difference in sound quality preference was found between the standard bandwidth and extended bandwidth fittings. This study demonstrated that a pair of currently available extended bandwidth hearing aids was technically capable of delivering high-frequency amplification that was both audible and useable to listeners with mild-to-moderate hearing loss. This amplification was of acceptable sound quality. Further research, particularly field trials, is required to ascertain the real-world benefit of high-frequency amplification. American Academy of Audiology
The effects of long-term stress exposure on aging cognition: a behavioral and EEG investigation.
Marshall, Amanda C; Cooper, Nicholas R; Segrave, Rebecca; Geeraert, Nicolas
2015-06-01
A large field of research seeks to explore and understand the factors that may cause different rates of age-related cognitive decline within the general population. However, the impact of experienced stress on the human aging process has remained an under-researched possibility. This study explored the association between cumulative stressful experiences and cognitive aging, addressing whether higher levels of experienced stress correlate with impaired performance on 2 working memory tasks. Behavioral performance was paired with electroencephalographic recordings to enable insight into the underlying neural processes impacted on by cumulative stress. Thus, the electroencephalogram was recorded while both young and elderly performed 2 different working memory tasks (a Sternberg and N-back paradigm), and cortical oscillatory activity in the theta, alpha, and gamma bandwidths was measured. Behavioral data indicated that a higher stress score among elderly participants related to impaired performance on both tasks. Electrophysiological findings revealed a reduction in alpha and gamma event-related synchronization among high-stress-group elderly participants, indicating that higher levels of experienced stress may impact on their ability to actively maintain a stimulus in working memory and inhibit extraneous information interfering with successful maintenance. Findings provide evidence that cumulative experienced stress adversely affects cognitive aging. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Fuchs, Gregory
2011-03-01
Nitrogen vacancy (NV) center spins in diamond have emerged as a promising solid-state system for quantum information processing and precision metrology at room temperature. Understanding and developing the built-in resources of this defect center for quantum logic and memory is critical to achieving these goals. In the first case, we use nanosecond duration microwave manipulation to study the electronic spin of single NV centers in their orbital excited-state (ES). We demonstrate ES Rabi oscillations and use multi-pulse resonant control to differentiate between phonon-induced dephasing, orbital relaxation, and coherent electron-nuclear interactions. A second resource, the nuclear spin of the intrinsic nitrogen atom, may be an ideal candidate for a quantum memory due to both the long coherence of nuclear spins and their deterministic presence. We investigate coherent swaps between the NV center electronic spin state and the nuclear spin state of nitrogen using Landau-Zener transitions performed outside the asymptotic regime. The swap gates are generated using lithographically fabricated waveguides that form a high-bandwidth, two-axis vector magnet on the diamond substrate. These experiments provide tools for coherently manipulating and storing quantum information in a scalable solid-state system at room temperature. We gratefully acknowledge support from AFOSR, ARO, and DARPA.
Systems and Methods for Radar Data Communication
NASA Technical Reports Server (NTRS)
Bunch, Brian (Inventor); Szeto, Roland (Inventor); Miller, Brad (Inventor)
2013-01-01
A radar information processing system is operable to process high-bandwidth radar information received from a radar system into low-bandwidth radar information that may be communicated to a low-bandwidth connection coupled to an electronic flight bag (EFB). An exemplary embodiment receives radar information from a radar system, the radar information communicated from the radar system at a first bandwidth; processes the received radar information into processed radar information, the processed radar information configured for communication over a connection operable at a second bandwidth, the second bandwidth lower than the first bandwidth; and communicates the processed radar information over the connection operable at the second bandwidth.
Observation of Brownian motion in liquids at short times: instantaneous velocity and memory loss.
Kheifets, Simon; Simha, Akarsh; Melin, Kevin; Li, Tongcang; Raizen, Mark G
2014-03-28
Measurement of the instantaneous velocity of Brownian motion of suspended particles in liquid probes the microscopic foundations of statistical mechanics in soft condensed matter. However, instantaneous velocity has eluded experimental observation for more than a century since Einstein's prediction of the small length and time scales involved. We report shot-noise-limited, high-bandwidth measurements of Brownian motion of micrometer-sized beads suspended in water and acetone by an optical tweezer. We observe the hydrodynamic instantaneous velocity of Brownian motion in a liquid, which follows a modified energy equipartition theorem that accounts for the kinetic energy of the fluid displaced by the moving bead. We also observe an anticorrelated thermal force, which is conventionally assumed to be uncorrelated.
Simple Atomic Quantum Memory Suitable for Semiconductor Quantum Dot Single Photons
NASA Astrophysics Data System (ADS)
Wolters, Janik; Buser, Gianni; Horsley, Andrew; Béguin, Lucas; Jöckel, Andreas; Jahn, Jan-Philipp; Warburton, Richard J.; Treutlein, Philipp
2017-08-01
Quantum memories matched to single photon sources will form an important cornerstone of future quantum network technology. We demonstrate such a memory in warm Rb vapor with on-demand storage and retrieval, based on electromagnetically induced transparency. With an acceptance bandwidth of δf = 0.66 GHz, the memory is suitable for single photons emitted by semiconductor quantum dots. In this regime, vapor cell memories offer an excellent compromise between storage efficiency, storage time, noise level, and experimental complexity, and atomic collisions have negligible influence on the optical coherences. Operation of the memory is demonstrated using attenuated laser pulses on the single photon level. For a 50 ns storage time, we measure η_e2e^(50 ns) = 3.4(3)% end-to-end efficiency of the fiber-coupled memory, with a total intrinsic efficiency η_int = 17(3)%. Straightforward technological improvements can boost the end-to-end efficiency to η_e2e ≈ 35%; beyond that, increasing the optical depth and exploiting the Zeeman substructure of the atoms will allow such a memory to approach near unity efficiency. In the present memory, the unconditional read-out noise level of 9×10^-3 photons is dominated by atomic fluorescence, and for input pulses containing on average μ_1 = 0.27(4) photons, the signal to noise level would be unity.
Simple Atomic Quantum Memory Suitable for Semiconductor Quantum Dot Single Photons.
Wolters, Janik; Buser, Gianni; Horsley, Andrew; Béguin, Lucas; Jöckel, Andreas; Jahn, Jan-Philipp; Warburton, Richard J; Treutlein, Philipp
2017-08-11
Quantum memories matched to single photon sources will form an important cornerstone of future quantum network technology. We demonstrate such a memory in warm Rb vapor with on-demand storage and retrieval, based on electromagnetically induced transparency. With an acceptance bandwidth of δf = 0.66 GHz, the memory is suitable for single photons emitted by semiconductor quantum dots. In this regime, vapor cell memories offer an excellent compromise between storage efficiency, storage time, noise level, and experimental complexity, and atomic collisions have negligible influence on the optical coherences. Operation of the memory is demonstrated using attenuated laser pulses on the single photon level. For a 50 ns storage time, we measure η_e2e^(50 ns) = 3.4(3)% end-to-end efficiency of the fiber-coupled memory, with a total intrinsic efficiency η_int = 17(3)%. Straightforward technological improvements can boost the end-to-end efficiency to η_e2e ≈ 35%; beyond that, increasing the optical depth and exploiting the Zeeman substructure of the atoms will allow such a memory to approach near unity efficiency. In the present memory, the unconditional read-out noise level of 9×10^-3 photons is dominated by atomic fluorescence, and for input pulses containing on average μ_1 = 0.27(4) photons, the signal to noise level would be unity.
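As a quick consistency check of the quoted figures (a back-of-the-envelope estimate, not the analysis in the paper): with about 0.27 input photons, 3.4% end-to-end efficiency and 9×10^-3 noise photons, the retrieved signal is roughly 0.27 × 0.034 ≈ 9×10^-3 photons, i.e. comparable to the noise floor, which is what the unity signal-to-noise statement expresses.

mu_in = 0.27          # mean input photon number (from the abstract)
eta_e2e = 0.034       # end-to-end efficiency of the fiber-coupled memory
noise = 9e-3          # unconditional read-out noise photons

signal = mu_in * eta_e2e
print(f"retrieved signal ~ {signal:.3e} photons, noise ~ {noise:.1e}")
print(f"signal-to-noise ~ {signal / noise:.2f}")   # ~1, as stated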
Field, Ella Suzanne; Bellum, John Curtis; Kletecka, Damon E.
2016-09-21
Broad bandwidth coatings allow angle of incidence flexibility and accommodate spectral shifts due to aging and water absorption. Higher refractive index materials in optical coatings, such as TiO2, Nb2O5, and Ta2O5, can be used to achieve broader bandwidths compared to coatings that contain HfO2 high index layers. We have identified the deposition settings that lead to the highest index, lowest absorption layers of TiO2, Nb2O5, and Ta2O5, via e-beam evaporation using ion-assisted deposition. We paired these high index materials with SiO2 as the low index material to create broad bandwidth high reflection coatings centered at 1054 nm for 45 deg angle of incidence and P polarization. Furthermore, high reflection bandwidths as large as 231 nm were realized. Laser damage tests of these coatings using the ISO 11254 and NIF-MEL protocols are presented, which revealed that the Ta2O5/SiO2 coating exhibits the highest resistance to laser damage, at the expense of lower bandwidth compared to the TiO2/SiO2 and Nb2O5/SiO2 coatings.
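The link between index contrast and reflection bandwidth can be sketched with the textbook quarter-wave-stack estimate Δλ/λ0 = (4/π)·arcsin[(nH - nL)/(nH + nL)] at normal incidence. The refractive indices below are rough assumed literature values, and the estimate ignores the 45-degree incidence and dispersion of the actual coatings, so the numbers are only indicative of the trend reported above.

import numpy as np

def qw_stack_bandwidth(n_high, n_low, center_nm):
    """Fractional high-reflection band of an ideal quarter-wave stack at
    normal incidence: delta_lambda/lambda0 = (4/pi)*arcsin((nH-nL)/(nH+nL))."""
    frac = (4.0 / np.pi) * np.arcsin((n_high - n_low) / (n_high + n_low))
    return frac * center_nm

n_SiO2 = 1.45                                     # assumed low-index value
for name, n_hi in [("HfO2", 1.95), ("Ta2O5", 2.10), ("Nb2O5", 2.25), ("TiO2", 2.40)]:
    bw = qw_stack_bandwidth(n_hi, n_SiO2, 1054.0)  # assumed high-index values
    print(f"{name}/SiO2: ~{bw:.0f} nm stop band around 1054 nm")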
Intelligent Memory Module Overcomes Harsh Environments
NASA Technical Reports Server (NTRS)
2008-01-01
Solar cells, integrated circuits, and sensors are essential to manned and unmanned space flight and exploration, but such systems are highly susceptible to damage from radiation. Especially problematic, the Van Allen radiation belts encircle Earth in concentric radioactive tori at distances from about 6,300 to 38,000 km, though the inner radiation belt can dip as low as 700 km, posing a severe hazard to craft and humans leaving Earth's atmosphere. To avoid this radiation, the International Space Station and space shuttles orbit at altitudes between 275 and 460 km, below the belts' range, and Apollo astronauts skirted the edge of the belts to minimize exposure, passing swiftly through thinner sections of the belts and thereby avoiding significant side effects. This radiation can, however, prove detrimental to improperly protected electronics on satellites that spend the majority of their service life in the harsh environment of the belts. Compact, high-performance electronics that can withstand extreme environmental and radiation stress are thus critical to future space missions. Increasing miniaturization of electronics addresses the need for lighter weight in launch payloads, as launch costs put weight at a premium. Likewise, improved memory technologies have reduced size, cost, mass, power demand, and system complexity, and improved high-bandwidth communication to meet the data volume needs of the next-generation high-resolution sensors. This very miniaturization, however, has exacerbated system susceptibility to radiation, as the charge of ions may meet or exceed that of circuitry, overwhelming the circuit and disrupting operation of a satellite. The Hubble Space Telescope, for example, must turn off its sensors when passing through intense radiation to maintain reliable operation. To address the need for improved data quality, additional capacity for raw and processed data, ever-increasing resolution, and radiation tolerance, NASA spurred the development of the Radiation Tolerant Intelligent Memory Stack (RTIMS).
NASA Technical Reports Server (NTRS)
Nessel, James A.; Zaman, Afroz; Lee, Richard Q.; Lambert, Kevin
2005-01-01
The feasibility of obtaining large bandwidth and high directivity from a multilayer Yagi-like microstrip patch antenna at 10 GHz is investigated. A measured 10-dB bandwidth of approximately 20 percent and directivity of approximately 11 dBi is demonstrated through the implementation of a vertically-stacked structure with three parasitic directors, above the driven patch, and a single reflector underneath the driven patch. Simulated and measured results are compared and show fairly close agreement. This antenna offers the advantages of large bandwidth, high directivity, and symmetrical broadside patterns, and could be applicable to satellite as well as terrestrial communications.
Impact of crystal orientation on the modulation bandwidth of InGaN/GaN light-emitting diodes
NASA Astrophysics Data System (ADS)
Monavarian, M.; Rashidi, A.; Aragon, A. A.; Oh, S. H.; Rishinaramangalam, A. K.; DenBaars, S. P.; Feezell, D.
2018-01-01
High-speed InGaN/GaN blue light-emitting diodes (LEDs) are needed for future gigabit-per-second visible-light communication systems. Large LED modulation bandwidths are typically achieved at high current densities, with reports close to 1 GHz bandwidth at current densities ranging from 5 to 10 kA/cm2. However, the internal quantum efficiency (IQE) of InGaN/GaN LEDs is quite low at high current densities due to the well-known efficiency droop phenomenon. Here, we show experimentally that nonpolar and semipolar orientations of GaN enable higher modulation bandwidths at low current densities where the IQE is expected to be higher and power dissipation is lower. We experimentally compare the modulation bandwidth vs. current density for LEDs on nonpolar (10-10), semipolar (20-2-1), and polar (0001) orientations. In agreement with wavefunction overlap considerations, the experimental results indicate a higher modulation bandwidth for the nonpolar and semipolar LEDs, especially at relatively low current densities. At 500 A/cm2, the nonpolar LED has a 3 dB bandwidth of ~1 GHz, while the semipolar and polar LEDs exhibit bandwidths of 260 MHz and 75 MHz, respectively. A lower carrier density for a given current density is extracted from the RF measurements for the nonpolar and semipolar LEDs, consistent with the higher wavefunction overlaps in these orientations. At large current densities, the bandwidth of the polar LED approaches that of the nonpolar and semipolar LEDs due to Coulomb screening of the polarization field. The results support using nonpolar and semipolar orientations to achieve high-speed LEDs at low current densities.
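One hedged way to read these bandwidths is through a single-pole small-signal model in which f_3dB ≈ 1/(2πτ), with τ the differential carrier lifetime; parasitic RC effects and the full recombination model are ignored. Under that assumption the measured bandwidths at 500 A/cm² correspond to effective lifetimes of roughly 0.16 ns (nonpolar), 0.6 ns (semipolar) and 2 ns (polar), consistent with the faster recombination expected from larger wavefunction overlap.

import math

def lifetime_from_f3db(f3db_hz):
    """Single-pole LED response |H(f)|^2 = 1/(1+(2*pi*f*tau)^2)
    => differential carrier lifetime tau = 1/(2*pi*f3dB)."""
    return 1.0 / (2.0 * math.pi * f3db_hz)

# Bandwidths quoted in the abstract at 500 A/cm^2
for label, f3db in [("nonpolar", 1.0e9), ("semipolar", 260e6), ("polar", 75e6)]:
    tau_ns = lifetime_from_f3db(f3db) * 1e9
    print(f"{label}: f3dB = {f3db/1e6:.0f} MHz -> tau ~ {tau_ns:.2f} ns")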
Hardware accelerator of convolution with exponential function for image processing applications
NASA Astrophysics Data System (ADS)
Panchenko, Ivan; Bucha, Victor
2015-12-01
In this paper we describe a Hardware Accelerator (HWA) for fast recursive approximation of separable convolution with an exponential function. This filter can be used in many Image Processing (IP) applications, e.g., depth-dependent image blur, image enhancement and disparity estimation. We have adapted the filter's RTL implementation to provide maximum throughput within the constraints of the required memory bandwidth and hardware resources, yielding a power-efficient VLSI implementation.
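The recursive idea behind such a filter can be sketched as follows (a generic illustration, not the RTL of the HWA above): a convolution with a decaying exponential kernel reduces to the first-order recursion y[n] = a·x[n] + (1 - a)·y[n-1], and running it forward and backward along rows and then columns gives a separable, symmetric 2-D exponential blur at a constant cost per pixel, independent of the effective kernel width. The alpha value and image size below are assumptions.

import numpy as np

def exp_filter_1d(x, alpha):
    """Forward + backward first-order recursion ~ symmetric exponential kernel."""
    y = np.empty_like(x, dtype=np.float64)
    acc = x[0]
    for i, v in enumerate(x):            # causal pass
        acc = alpha * v + (1.0 - alpha) * acc
        y[i] = acc
    acc = y[-1]
    for i in range(len(x) - 1, -1, -1):  # anti-causal pass
        acc = alpha * y[i] + (1.0 - alpha) * acc
        y[i] = acc
    return y

def exp_filter_2d(img, alpha):
    """Separable application: filter along rows, then along columns."""
    tmp = np.apply_along_axis(exp_filter_1d, 1, img.astype(np.float64), alpha)
    return np.apply_along_axis(exp_filter_1d, 0, tmp, alpha)

img = np.random.rand(64, 64)
blurred = exp_filter_2d(img, alpha=0.2)   # smaller alpha -> wider effective kernel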
Chavez, Candice M.; McGaugh, James L.; Weinberger, Norman M.
2013-01-01
The basolateral amygdala (BLA) modulates memory, particularly for arousing or emotional events, during post-training periods of consolidation. It strengthens memories whose substrates in part or whole are stored remotely, in structures such as the hippocampus, striatum and cerebral cortex. However, the mechanisms by which the BLA influences distant memory traces are unknown, largely because of the need for identifiable target mnemonic representations. Associative tuning plasticity in the primary auditory cortex (A1) constitutes a well-characterized candidate specific memory substrate that is ubiquitous across species, tasks and motivational states. When tone predicts reinforcement, the tuning of cells in A1 shifts toward or to the signal frequency within its tonotopic map, producing an over-representation of behaviorally important sounds. Tuning shifts have the cardinal attributes of forms of memory, including associativity, specificity, rapid induction, consolidation and long-term retention and are therefore likely memory representations. We hypothesized that the BLA strengthens memories by increasing their cortical representations. We recorded multiple unit activity from A1 of rats that received a single discrimination training session in which two tones (2.0 s) separated by 1.25 octaves were either paired with brief electrical stimulation (400 ms) of the BLA (CS+) or not (CS−). Frequency response areas generated by presenting a matrix of test tones (0.5–53.82 kHz, 0–70 dB) were obtained before training and daily for three weeks post-training. Tuning both at threshold and above threshold shifted predominantly toward the CS+ beginning on Day 1. Tuning shifts were maintained for the entire three weeks. Absolute threshold and bandwidth decreased, producing less enduring increases in sensitivity and selectivity. BLA-induced tuning shifts were associative, highly specific and long-lasting. We propose that the BLA strengthens memory for important experiences by increasing the number of neurons that come to best represent that event. Traumatic, intrusive memories might reflect abnormally extensive representational networks due to hyper-activity of the BLA consequent to the release of excessive amounts of stress hormones. PMID:23266792
NASA Astrophysics Data System (ADS)
Zhang, Wei; Ding, Dong-Sheng; Shi, Shuai; Li, Yan; Zhou, Zhi-Yuan; Shi, Bao-Sen; Guo, Guang-Can
2016-02-01
Quantum memory is an essential building block for quantum communication and scalable linear quantum computation. Storing two-color entangled photons with one photon being at the telecommunication (telecom) wavelength while the other photon is compatible with quantum memory has great advantages toward the realization of the fiber-based long-distance quantum communication with the aid of quantum repeaters. Here, we report an experimental realization of storing a photon entangled with a telecom photon in polarization as an atomic spin wave in a cold atomic ensemble, thus establishing the entanglement between the telecom-band photon and the atomic-ensemble memory in a polarization degree of freedom. The reconstructed density matrix and the violation of the Clauser-Horne-Shimony-Holt inequality clearly show the preservation of quantum entanglement during storage. Our result is very promising for establishing a long-distance quantum network based on cold atomic ensembles.
Latest generation interconnect technologies in APEnet+ networking infrastructure
NASA Astrophysics Data System (ADS)
Ammendola, Roberto; Biagioni, Andrea; Cretaro, Paolo; Frezza, Ottorino; Lo Cicero, Francesca; Lonardo, Alessandro; Martinelli, Michele; Stanislao Paolucci, Pier; Pastorelli, Elena; Rossetti, Davide; Simula, Francesco; Vicini, Piero
2017-10-01
In this paper we present the status of the 3rd-generation design of the APEnet board (V5), built upon the 28 nm Altera Stratix V FPGA; it features a PCIe Gen3 x8 interface and enhanced embedded transceivers with a maximum capability of 12.5 Gbps each. The network architecture is designed in accordance with the Remote DMA paradigm. The APEnet+ V5 prototype is built upon the Stratix V DevKit with the addition of a proprietary, third-party IP core implementing multiple DMA engines. Support for zero-copy communication is assured by the possibility of DMA-accessing either host or GPU memory, offloading the CPU from the chore of data copying. The current implementation plateaus at a memory-read bandwidth of 4.8 GB/s. Here we describe the hardware optimization of the memory write process, which relies on the use of two independent DMA engines and an improved TLB.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1996-01-01
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1997-01-01
We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Bandwidth controller for phase-locked-loop
NASA Technical Reports Server (NTRS)
Brockman, Milton H. (Inventor)
1992-01-01
A phase locked loop utilizing digital techniques to control the closed loop bandwidth of the RF carrier phase locked loop in a receiver provides high sensitivity and a wide dynamic range for signal reception. After analog to digital conversion, a digital phase locked loop bandwidth controller provides phase error detection with automatic RF carrier closed loop tracking bandwidth control to accommodate several modes of transmission.
NASA Astrophysics Data System (ADS)
Wang, Fu; Liu, Bo; Zhang, Lijia; Jin, Feifei; Zhang, Qi; Tian, Qinghua; Tian, Feng; Rao, Lan; Xin, Xiangjun
2017-03-01
The wavelength-division multiplexing passive optical network (WDM-PON) is a potential technology to carry multiple services in an optical access network. However, it suffers from high cost and immature technology for users. A software-defined WDM/time-division multiplexing PON was proposed to meet the requirements of high bandwidth, high performance, and multiple services. A reasonable and effective uplink dynamic bandwidth allocation algorithm was proposed. A controller with dynamic wavelength and slot assignment was introduced, and a different optical dynamic bandwidth management strategy was formulated flexibly for services of different priorities according to the network loading. The simulation compares the proposed algorithm with the interleaved polling with adaptive cycle time algorithm. The proposed algorithm shows better performance in average delay, throughput, and bandwidth utilization. The results show that the delay is reduced to 62% and the throughput is improved by 35%.
Network bandwidth utilization forecast model on high bandwidth networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Wuchert; Sim, Alex
With the increasing number of geographically distributed scientific collaborations and the growth in data size, it has become more challenging for users to achieve the best possible network performance on a shared network. We have developed a forecast model to predict expected bandwidth utilization for high-bandwidth wide area networks. The forecast model can improve the efficiency of resource utilization and the scheduling of data movements on high-bandwidth networks to accommodate the ever-increasing data volume of large-scale scientific data applications. A univariate model is developed with STL and ARIMA on SNMP path utilization data. Compared with a traditional approach such as the Box-Jenkins methodology, our forecast model reduces computation time by 83.2%. It also shows resilience against abrupt changes in network usage. The accuracy of the forecast model is within the standard deviation of the monitored measurements.
Network Bandwidth Utilization Forecast Model on High Bandwidth Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Wucherl; Sim, Alex
With the increasing number of geographically distributed scientific collaborations and the growth in data size, it has become more challenging for users to achieve the best possible network performance on a shared network. We have developed a forecast model to predict expected bandwidth utilization for high-bandwidth wide area networks. The forecast model can improve the efficiency of resource utilization and the scheduling of data movements on high-bandwidth networks to accommodate the ever-increasing data volume of large-scale scientific data applications. A univariate model is developed with STL and ARIMA on SNMP path utilization data. Compared with a traditional approach such as the Box-Jenkins methodology, our forecast model reduces computation time by 83.2%. It also shows resilience against abrupt changes in network usage. The accuracy of the forecast model is within the standard deviation of the monitored measurements.
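A minimal sketch of the univariate STL-plus-ARIMA pipeline is shown below, assuming the statsmodels library; the synthetic series, STL period and ARIMA order are placeholder choices rather than the settings used on the SNMP data. The idea is to remove the seasonal component with STL, fit an ARIMA model to the deseasonalised series, and add the seasonal pattern back onto the forecast.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.arima.model import ARIMA

# Synthetic hourly path-utilisation series with a daily cycle (placeholder data)
rng = np.random.default_rng(1)
t = np.arange(24 * 28)
util = 40 + 15 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 3, t.size)
series = pd.Series(util, index=pd.date_range("2016-01-01", periods=t.size, freq="h"))

stl = STL(series, period=24).fit()               # trend + seasonal + remainder
deseasonalised = series - stl.seasonal

model = ARIMA(deseasonalised, order=(1, 0, 1)).fit()   # placeholder order
horizon = 24
forecast = model.forecast(steps=horizon)

# Re-apply the last observed daily seasonal pattern to the forecast horizon
seasonal_next = stl.seasonal.iloc[-24:].to_numpy()[:horizon]
prediction = forecast.to_numpy() + seasonal_next
print(prediction[:6])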
A 10 micron heterodyne receiver for ultra high resolution astronomical spectroscopy
NASA Technical Reports Server (NTRS)
Buhl, D.; Chin, G.; Faris, J.; Kostiuk, T.; Mumma, M. J.; Zipoy, D.
1980-01-01
An improved CO2 laser heterodyne spectrometer is examined. The present system uses reflective optics to eliminate refocusing at different wavelengths, and the local oscillator is a line-center-stabilized isotopic CO2 laser. A tunable diffraction grating makes possible easy and rapid selection of over 50 transitions per isotope of CO2. The IF (0 to 1.6 GHz) from the HgCdTe photomixer is analyzed by a 128-channel filter bank, consisting of 64 tunable 5-MHz filters and 64 fixed 25-MHz RF filters. These filters provide resolving powers of about 1,000,000 to 10,000,000 and velocity resolution of 50 to 250 m/sec; their output is synchronously detected, integrated, multiplexed and stored in a buffer memory for the desired integration period. Kitt Peak observations show the wide spectral coverage, wide mixer and electronics bandwidth, and high sensitivity of the system.
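The quoted resolving powers follow directly from the channel widths (a quick check, not new analysis): at λ = 10 μm the carrier frequency is ν = c/λ ≈ 30 THz, so a 5 MHz channel gives R = ν/Δν ≈ 6×10^6 and a velocity resolution Δv = c/R ≈ 50 m/s, while a 25 MHz channel gives R ≈ 1.2×10^6 and Δv ≈ 250 m/s.

c = 2.998e8                      # speed of light, m/s
nu = c / 10e-6                   # carrier frequency at 10 micron, ~3e13 Hz
for dnu in (5e6, 25e6):          # filter-bank channel widths from the abstract
    R = nu / dnu                 # resolving power
    dv = c / R                   # velocity resolution
    print(f"{dnu/1e6:.0f} MHz channel: R ~ {R:.2e}, dv ~ {dv:.0f} m/s")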
Reducing I/O variability using dynamic I/O path characterization in petascale storage systems
Son, Seung Woo; Sehrish, Saba; Liao, Wei-keng; ...
2016-11-01
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. This I/O variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We also implement the proposed mechanism in the high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages file access to them, allowing the data to be treated as a single, normal file by users. Here, we demonstrate that our bandwidth probing mechanism can successfully identify temporally slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC's systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
Radiation Hardened, Modulator ASIC for High Data Rate Communications
NASA Technical Reports Server (NTRS)
McCallister, Ron; Putnam, Robert; Andro, Monty; Fujikawa, Gene
2000-01-01
Satellite-based telecommunication services are challenged by the need to generate down-link power levels adequate to support high-quality (BER ≈ 10^-12) links required for modern broadband data services. Bandwidth-efficient Nyquist signaling, using low values of excess bandwidth (alpha), can exhibit large peak-to-average-power ratio (PAPR) values. High PAPR values necessitate high-power amplifier (HPA) backoff greater than the PAPR, resulting in unacceptably low HPA efficiency. Given the high cost of on-board prime power, this inefficiency represents both an economic burden and a constraint on the rates and quality of data services supportable from satellite platforms. Constant-envelope signals offer improved power efficiency, but only by imposing a severe bandwidth-efficiency penalty. This paper describes a radiation-hardened modulator which can improve satellite-based broadband data services by combining the bandwidth efficiency of low-alpha Nyquist signals with high power efficiency (negligible HPA backoff).
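The PAPR problem can be made concrete with a short numerical sketch (illustrative assumptions only: QPSK symbols, 8x oversampling and a truncated sinc pulse as the zero-excess-bandwidth limit, rather than any specific modem filter). Filtering with a low-alpha Nyquist pulse produces envelope peaks several dB above the average power, which is the back-off the HPA would otherwise have to absorb.

import numpy as np

rng = np.random.default_rng(7)
nsym, osf = 4096, 8                       # symbol count and oversampling (assumed)
symbols = (rng.choice([-1, 1], nsym) + 1j * rng.choice([-1, 1], nsym)) / np.sqrt(2)

# Upsample and shape with a truncated sinc pulse (ideal Nyquist, alpha -> 0)
x = np.zeros(nsym * osf, dtype=complex)
x[::osf] = symbols
t = np.arange(-16 * osf, 16 * osf + 1) / osf   # pulse spanning +/-16 symbols
pulse = np.sinc(t)                             # zero excess-bandwidth limit
s = np.convolve(x, pulse, mode="same")

papr_db = 10 * np.log10(np.max(np.abs(s) ** 2) / np.mean(np.abs(s) ** 2))
print(f"PAPR of low-alpha Nyquist QPSK ~ {papr_db:.1f} dB")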
Internet Protocol Handbook. Volume 4. The Domain Name System (DNS) handbook
1989-08-01
Mockapetris, RFC 1034, Domain Names - Concepts and Facilities, November 1987: bandwidth consumed in distributing...Queries contain a bit called recursion desired...during periodic sweeps to reclaim the memory consumed by old RRs.
Scheins, J J; Vahedipour, K; Pietrzyk, U; Shah, N J
2015-12-21
For high-resolution, iterative 3D PET image reconstruction the efficient implementation of forward-backward projectors is essential to minimise the calculation time. Mathematically, the projectors are summarised as a system response matrix (SRM) whose elements define the contribution of image voxels to lines-of-response (LORs). In fact, the SRM easily comprises billions of non-zero matrix elements to evaluate the tremendous number of LORs as provided by state-of-the-art PET scanners. Hence, the performance of iterative algorithms, e.g. maximum-likelihood-expectation-maximisation (MLEM), suffers from severe computational problems due to the intensive memory access and huge number of floating point operations. Here, symmetries occupy a key role in terms of efficient implementation. They reduce the amount of independent SRM elements, thus allowing for a significant matrix compression according to the number of exploitable symmetries. With our previous work, the PET REconstruction Software TOolkit (PRESTO), very high compression factors (>300) are demonstrated by using specific non-Cartesian voxel patterns involving discrete polar symmetries. In this way, a pre-calculated memory-resident SRM using complex volume-of-intersection calculations can be achieved. However, our original ray-driven implementation suffers from addressing voxels, projection data and SRM elements in disfavoured memory access patterns. As a consequence, a rather limited numerical throughput is observed due to the massive waste of memory bandwidth and inefficient usage of cache respectively. In this work, an advantageous symmetry-driven evaluation of the forward-backward projectors is proposed to overcome these inefficiencies. The polar symmetries applied in PRESTO suggest a novel organisation of image data and LOR projection data in memory to enable an efficient single instruction multiple data vectorisation, i.e. simultaneous use of any SRM element for symmetric LORs. In addition, the calculation time is further reduced by using simultaneous multi-threading (SMT). A global speedup factor of 11 without SMT and above 100 with SMT has been achieved for the improved CPU-based implementation while obtaining equivalent numerical results.
NASA Astrophysics Data System (ADS)
Scheins, J. J.; Vahedipour, K.; Pietrzyk, U.; Shah, N. J.
2015-12-01
For high-resolution, iterative 3D PET image reconstruction the efficient implementation of forward-backward projectors is essential to minimise the calculation time. Mathematically, the projectors are summarised as a system response matrix (SRM) whose elements define the contribution of image voxels to lines-of-response (LORs). In fact, the SRM easily comprises billions of non-zero matrix elements to evaluate the tremendous number of LORs as provided by state-of-the-art PET scanners. Hence, the performance of iterative algorithms, e.g. maximum-likelihood-expectation-maximisation (MLEM), suffers from severe computational problems due to the intensive memory access and huge number of floating point operations. Here, symmetries occupy a key role in terms of efficient implementation. They reduce the amount of independent SRM elements, thus allowing for a significant matrix compression according to the number of exploitable symmetries. With our previous work, the PET REconstruction Software TOolkit (PRESTO), very high compression factors (>300) are demonstrated by using specific non-Cartesian voxel patterns involving discrete polar symmetries. In this way, a pre-calculated memory-resident SRM using complex volume-of-intersection calculations can be achieved. However, our original ray-driven implementation suffers from addressing voxels, projection data and SRM elements in disfavoured memory access patterns. As a consequence, a rather limited numerical throughput is observed due to the massive waste of memory bandwidth and inefficient usage of cache respectively. In this work, an advantageous symmetry-driven evaluation of the forward-backward projectors is proposed to overcome these inefficiencies. The polar symmetries applied in PRESTO suggest a novel organisation of image data and LOR projection data in memory to enable an efficient single instruction multiple data vectorisation, i.e. simultaneous use of any SRM element for symmetric LORs. In addition, the calculation time is further reduced by using simultaneous multi-threading (SMT). A global speedup factor of 11 without SMT and above 100 with SMT has been achieved for the improved CPU-based implementation while obtaining equivalent numerical results.
Two-dimensional real-time imaging system for subtraction angiography using an iodine filter
NASA Astrophysics Data System (ADS)
Umetani, Keiji; Ueda, Ken; Takeda, Tohoru; Anno, Izumi; Itai, Yuji; Akisada, Masayoshi; Nakajima, Teiichi
1992-01-01
A new type of subtraction imaging system was developed using an iodine filter and a single-energy broad bandwidth monochromatized x ray. The x-ray images of coronary arteries made after intravenous injection of a contrast agent are enhanced by an energy-subtraction technique. Filter chopping of the x-ray beam switches energies rapidly, so that a nearly simultaneous pair of filtered and nonfiltered images can be made. By using a high-speed video camera, a pair of two 512 × 512 pixel images can be obtained within 9 ms. Three hundred eighty-four images (raw data) are stored in a 144-Mbyte frame memory. After phantom studies, in vivo subtracted images of coronary arteries in dogs were obtained at a rate of 15 images/s.
Application of LC and LCoS in Multispectral Polarized Scene Projector (MPSP)
NASA Astrophysics Data System (ADS)
Yu, Haiping; Guo, Lei; Wang, Shenggang; Lippert, Jack; Li, Le
2017-02-01
A Multispectral Polarized Scene Projector (MPSP) has been developed in the short-wave infrared (SWIR) regime for the test and evaluation (T&E) of spectro-polarimetric imaging sensors. The MPSP generates multispectral and hyperspectral video images (up to 200 Hz) at 512×512 spatial resolution, with active spatial, spectral, and polarization modulation and controlled bandwidth. It projects input SWIR radiant intensity scenes from stored memory with user-selectable wavelength and bandwidth, as well as polarization states (six different states) controllable at the pixel level. The spectral content is implemented by a tunable filter with variable bandpass based on liquid crystal (LC) material, together with one passive visible and one passive SWIR cholesteric liquid crystal (CLC) notch filter, and one switchable CLC notch filter. The core of the MPSP hardware is a set of liquid-crystal-on-silicon (LCoS) spatial light modulators (SLMs) for intensity control and polarization modulation.
Pseudo-differential CMOS analog front-end circuit for wide-bandwidth optical probe current sensor
NASA Astrophysics Data System (ADS)
Uekura, Takaharu; Oyanagi, Kousuke; Sonehara, Makoto; Sato, Toshiro; Miyaji, Kousuke
2018-04-01
In this paper, we present a pseudo-differential analog front-end (AFE) circuit for a novel optical probe current sensor (OPCS) aimed at high-frequency power electronics. It employs a regulated cascode transimpedance amplifier (RGC-TIA) to achieve a high gain and a large bandwidth without using an extremely high-performance operational amplifier. The AFE circuit is designed in a 0.18 µm standard CMOS technology, achieving a high transimpedance gain of 120 dBΩ and a high cut-off frequency of 16 MHz. The measured slew rate is 70 V/µs and the input-referred current noise is 1.02 pA/√Hz. The magnetic resolution and bandwidth of the OPCS are estimated to be 1.29 mTrms and 16 MHz, respectively; the bandwidth is higher than that of reported Hall-effect current sensors.
ASA-FTL: An adaptive separation aware flash translation layer for solid state drives
Xie, Wei; Chen, Yong; Roth, Philip C
2016-11-03
Here, the flash-memory based Solid State Drive (SSD) presents a promising storage solution for increasingly critical data-intensive applications due to its low latency (high throughput), high bandwidth, and low power consumption. Within an SSD, its Flash Translation Layer (FTL) is responsible for exposing the SSD’s flash memory storage to the computer system as a simple block device. The FTL design is one of the dominant factors determining an SSD’s lifespan and performance. To reduce the garbage collection overhead and deliver better performance, we propose a new, low-cost, adaptive separation-aware flash translation layer (ASA-FTL) that combines sampling, data clustering and selective caching of recency information to accurately identify and separate hot/cold data while incurring minimal overhead. We use sampling for light-weight identification of separation criteria, and our dedicated selective caching mechanism is designed to save the limited RAM resource in contemporary SSDs. Using simulations of ASA-FTL with both real-world and synthetic workloads, we have shown that our proposed approach reduces the garbage collection overhead by up to 28% and the overall response time by 15% compared to one of the most advanced existing FTLs. We find that data clustering using a small sample size provides significant performance benefit while only incurring a very small computation and memory cost. In addition, our evaluation shows that ASA-FTL is able to adapt to changes in the access pattern of workloads, which is a major advantage compared to existing fixed data separation methods.
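As a rough illustration of the sampling-based separation idea (an invented sketch, not the ASA-FTL implementation), a hot/cold threshold can be estimated from a small sample of the recent write stream and then used to route incoming writes:

    import random
    from collections import Counter

    def estimate_threshold(recent_lpns, sample_size=256):
        # Sample logical page numbers from the recent write stream and take the
        # median update count as the hot/cold split point.
        sample = random.sample(recent_lpns, min(sample_size, len(recent_lpns)))
        counts = sorted(Counter(sample).values())
        return counts[len(counts) // 2]

    def route_write(update_count, threshold):
        # Pages updated more often than the threshold go to hot blocks.
        return "hot_block" if update_count > threshold else "cold_block"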
Ultra-high-speed optical transmission using digital-preprocessed analog-multiplexed DAC
NASA Astrophysics Data System (ADS)
Yamazaki, Hiroshi; Nagatani, Munehiko; Hamaoka, Fukutaro; Horikoshi, Kengo; Nakamura, Masanori; Matsushita, Asuka; Kanazawa, Shigeru; Hashimoto, Toshikazu; Nosaka, Hideyuki; Miyamoto, Yutaka
2018-02-01
In advanced fiber transmission systems with digital signal processors (DSPs), analog bandwidths of digital-to-analog converters (DACs), which interface the DSPs and optics, are the major factors limiting the data rates. We have developed a technology to extend the DACs' bandwidth using a digital preprocessor, two sub-DACs, and an analog multiplexer. This technology enables us to generate baseband signals with bandwidths of up to around 60 GHz, which is almost twice that of signals generated by typical CMOS DACs. In this paper, we describe the principle of the bandwidth extension and review high-speed transmission experiments enabled by this technology.
Measuring Memory and Attention to Preview in Motion.
Jagacinski, Richard J; Hammond, Gordon M; Rizzi, Emanuele
2017-08-01
Objective: Use perceptual-motor responses to perturbations to reveal the spatio-temporal detail of memory for the recent past and attention to preview when participants track a winding roadway. Background: Memory of the recently passed roadway can be inferred from feedback control models of the participants' manual movement patterns. Similarly, attention to preview of the upcoming roadway can be inferred from feedforward control models of manual movement patterns. Method: Perturbation techniques were used to measure these memory and attention functions. Results: In a laboratory tracking task, the bandwidth of lateral roadway deviations was found to primarily influence memory for the past roadway rather than attention to preview. A secondary auditory/verbal/vocal memory task resulted in higher velocity error and acceleration error in the tracking task but did not affect attention to preview. Attention to preview was affected by the frequency pattern of sinusoidal perturbations of the roadway. Conclusion: Perturbation techniques permit measurement of the spatio-temporal span of memory and attention to preview that affect tracking a winding roadway. They also provide new ways to explore goal-directed forgetting and spatially distributed attention in the context of movement. More generally, these techniques provide sensitive measures of individual differences in cognitive aspects of action. Application: Models of driving behavior and assessment of driving skill may benefit from more detailed spatio-temporal measurement of attention to preview.
NASA Astrophysics Data System (ADS)
Haron, Adib; Mahdzair, Fazren; Luqman, Anas; Osman, Nazmie; Junid, Syed Abdul Mutalib Al
2018-03-01
One of the most significant constraints of the Von Neumann architecture is the limited bandwidth between memory and processor. The cost of moving data back and forth between memory and processor is considerably higher than that of the computation in the processor itself. This particularly affects Big Data and data-intensive applications such as DNA comparison analysis, which spend most of their processing time moving data. Recently, the in-memory processing concept was proposed, based on the capability to perform logic operations on the physical memory structure using a crossbar topology and non-volatile resistive-switching memristor technology. This paper proposes a scheme to map a digital equality comparator circuit onto a memristive memory crossbar array. Equality comparator circuits of 2-bit, 4-bit, 8-bit, 16-bit, 32-bit, and 64-bit word sizes are mapped onto the crossbar array using material implication logic in both sequential and parallel methods. The simulation results show that, for the 64-bit word size, the parallel mapping exhibits 2.8× better performance in total execution time than the sequential mapping but has a trade-off in terms of energy consumption and area utilization. Meanwhile, the total crossbar area can be reduced by 1.2× for sequential mapping and 1.5× for parallel mapping, both by using the overlapping technique.
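Functionally, the mapped circuit computes a word-wide equality test: per-bit XNOR followed by an AND reduction. The sketch below (plain Python, showing only the behaviour, not the memristive IMPLY-gate mapping itself) illustrates what the sequential and parallel crossbar mappings both realise:

    def equal_nbit(a_bits, b_bits):
        # Per-bit XNOR: 1 where the bits agree, 0 where they differ.
        xnor = [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]
        result = 1
        for bit in xnor:      # a sequential mapping walks this chain one IMPLY step at a time;
            result &= bit     # a parallel mapping reduces groups of bits concurrently
        return result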
DOE Office of Scientific and Technical Information (OSTI.GOV)
McGoldrick, P.R.
1981-01-01
The Mirror Fusion Test Facility (MFTF) is a complex facility requiring a highly-computerized Supervisory Control and Diagnostics System (SCDS) to monitor and provide control over ten subsystems, three of which require true process control. SCDS will provide physicists with a method of studying machine and plasma behavior by acquiring and processing up to four megabytes of plasma diagnostic information every five minutes. A high degree of availability and throughput is provided by a distributed computer system (nine 32-bit minicomputers on shared memory). Data, distributed across SCDS, is managed by a high-bandwidth Distributed Database Management System. The MFTF operators' control room consoles use color television monitors with touch sensitive screens; this is a totally new approach. The method of handling deviations to normal machine operation and how the operator should be notified and assisted in the resolution of problems has been studied and a system designed.
Orthorectification by Using Gpgpu Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power exceeding a teraflop. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors, with much faster computation and higher memory bandwidth than central processing units (CPUs). Data-parallel computation can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capability has attracted the attention of researchers dealing with complex problems that need heavy computation. This interest has given rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful yet cheap and affordable hardware, and have therefore become an alternative to conventional processors. Graphics chips, once fixed-function application hardware, have been transformed into modern, powerful and programmable processors to meet general-purpose needs. Especially in recent years, the use of graphics processing units for general-purpose computation has drawn researchers and developers to this field. The biggest problem is that graphics processing units use programming models that differ from conventional programming methods; an efficient GPU program therefore requires re-coding the existing algorithm with the limitations and structure of the graphics hardware in mind. Multi-core graphics processors cannot be programmed effectively with traditional sequential or event-driven programming methods. GPUs are especially effective when the same computing steps must be repeated for many data elements and high accuracy is needed, delivering the result both quickly and accurately. In comparison, CPUs, which perform one computation at a time according to the flow control, are slower for such workloads. This structure can be exploited in various applications of computer technology. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language, and the results were compared with a traditional CPU implementation. In a second application, projective rectification was coded using the GPGPU method and CUDA; sample images of various sizes were processed and the results evaluated. The GPGPU method is especially useful for repeating the same computations on large, dense data sets, and thus yields the solution quickly.
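The per-pixel nature of projective rectification is what makes it GPU-friendly. The CPU sketch below (assuming a known 3×3 homography H; illustrative only) maps every output pixel through H and samples the source image by nearest neighbour, exactly the work a CUDA kernel would assign to one thread per pixel:

    import numpy as np

    def rectify(src, H, out_shape):
        h, w = out_shape
        ys, xs = np.mgrid[0:h, 0:w]
        coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
        mapped = H @ coords                                   # project all output pixels at once
        u = (mapped[0] / mapped[2]).round().astype(int).clip(0, src.shape[1] - 1)
        v = (mapped[1] / mapped[2]).round().astype(int).clip(0, src.shape[0] - 1)
        return src[v, u].reshape(h, w)                        # nearest-neighbour resampling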
Information Processing Techniques Program. Volume 1. Packet Speech Systems Technology
1980-03-31
DMA transfer is enabled from the 2652 serial I/O device to the buffer memory. This enables automatic reception of an incoming packet without CPU...conference speaker. Producing multiple copies at the source wastes network bandwidth and is likely to cause local overload conditions for a large... wasted. If the setup fails because ST can find no route with sufficient capacity, the phone will have rung and possibly been answered but the call will
Integrated Short Range, Low Bandwidth, Wearable Communications Networking Technologies
2012-04-30
Table of contents excerpt: Introduction; Research Discussions; Specifications; SAN Radio; R.F. Design Improvements (LNA... Characterization and Verification Testing); Digital Design Improvements (Improve Processor Access to Memory Resources)... integrated and tested. A hybrid architecture of the automatic gain control (AGC) was designed to...
Limited Bandwidth Recognition of Collective Behaviors in Bio-Inspired Swarms
2014-05-01
Feng, Zhao; Ling, Jie; Ming, Min; Xiao, Xiao-Hui
2017-08-01
For precision motion, high-bandwidth and flexible tracking are the two key issues for significant performance improvement. Iterative learning control (ILC) is an effective feedforward control method only for systems that operate strictly repetitively. Although projection ILC can track varying references, its performance is still limited by the fixed-bandwidth Q-filter, especially for the triangular-wave tracking commonly used in a piezo nanopositioner. In this paper, a wavelet transform-based linear time-varying (LTV) Q-filter design for projection ILC is proposed to compensate high-frequency errors and improve the ability to track varying references simultaneously. The LTV Q-filter is designed based on the modulus maxima of the wavelet detail coefficients calculated by the wavelet transform to determine the high-frequency locations in each iteration, with the advantages of avoiding cross-terms and manual segmentation. The proposed approach was verified on a piezo nanopositioner. Experimental results indicate that the proposed approach can locate the high-frequency regions accurately and achieves the best performance under varying references compared with traditional frequency-domain ILC and projection ILC with a fixed-bandwidth Q-filter, which validates that, by implementing the LTV filter in projection ILC, high-bandwidth and flexible tracking can be achieved simultaneously.
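A minimal sketch of the detection step (using PyWavelets, with a simple threshold rule rather than the paper's exact modulus-maximum criterion) locates high-frequency regions of a tracking error from the magnitude of first-level detail coefficients:

    import numpy as np
    import pywt

    def high_freq_regions(error, wavelet="db4", ratio=0.5):
        _, detail = pywt.dwt(error, wavelet)      # first-level wavelet detail coefficients
        mag = np.abs(detail)
        hits = np.where(mag > ratio * mag.max())[0]
        return hits * 2                           # approximate mapping back to sample indices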
Linearity optimizations of analog ring resonator modulators through bias voltage adjustments
NASA Astrophysics Data System (ADS)
Hosseinzadeh, Arash; Middlebrook, Christopher T.
2018-03-01
The linearity of ring resonator modulator (RRM) in microwave photonic links is studied in terms of instantaneous bandwidth, fabrication tolerances, and operational bandwidth. A proposed bias voltage adjustment method is shown to maximize spur-free dynamic range (SFDR) at instantaneous bandwidths required by microwave photonic link (MPL) applications while also mitigating RRM fabrication tolerances effects. The proposed bias voltage adjustment method shows RRM SFDR improvement of ∼5.8 dB versus common Mach-Zehnder modulators at 500 MHz instantaneous bandwidth. Analyzing operational bandwidth effects on SFDR shows RRMs can be promising electro-optic modulators for MPL applications which require high operational frequencies while in a limited bandwidth such as radio-over-fiber 60 GHz wireless network access.
GPU-based Parallel Application Design for Emerging Mobile Devices
NASA Astrophysics Data System (ADS)
Gupta, Kshitij
A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as compute and communication capabilities of mobile devices improve, we analyze energy implications of processing speech recognition locally (on-chip) and offloading it to servers (in-cloud).
Exploiting graphics processing units for computational biology and bioinformatics.
Payne, Joshua L; Sinnott-Armstrong, Nicholas A; Moore, Jason H
2010-09-01
Advances in the video gaming industry have led to the production of low-cost, high-performance graphics processing units (GPUs) that possess more memory bandwidth and computational capability than central processing units (CPUs), the standard workhorses of scientific computing. With the recent release of general-purpose GPUs and NVIDIA's GPU programming language, CUDA, graphics engines are being adopted widely in scientific computing applications, particularly in the fields of computational biology and bioinformatics. The goal of this article is to concisely present an introduction to GPU hardware and programming, aimed at the computational biologist or bioinformaticist. To this end, we discuss the primary differences between GPU and CPU architecture, introduce the basics of the CUDA programming language, and discuss important CUDA programming practices, such as the proper use of coalesced reads, data types, and memory hierarchies. We highlight each of these topics in the context of computing the all-pairs distance between instances in a dataset, a common procedure in numerous disciplines of scientific computing. We conclude with a runtime analysis of the GPU and CPU implementations of the all-pairs distance calculation. We show our final GPU implementation to outperform the CPU implementation by a factor of 1700.
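For reference, the all-pairs Euclidean distance used as the running example can be written compactly on the CPU as below (NumPy sketch); the GPU version discussed in the article tiles this computation into blocks so that reads from global memory are coalesced:

    import numpy as np

    def all_pairs_distance(X):
        # X: (n_instances, n_features). Uses ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b.
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.sqrt(np.maximum(d2, 0.0))   # clamp tiny negatives from round-off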
Remote quantum entanglement between two micromechanical oscillators.
Riedinger, Ralf; Wallucks, Andreas; Marinković, Igor; Löschnauer, Clemens; Aspelmeyer, Markus; Hong, Sungkun; Gröblacher, Simon
2018-04-01
Entanglement, an essential feature of quantum theory that allows for inseparable quantum correlations to be shared between distant parties, is a crucial resource for quantum networks [1]. Of particular importance is the ability to distribute entanglement between remote objects that can also serve as quantum memories. This has been previously realized using systems such as warm [2,3] and cold atomic vapours [4,5], individual atoms [6] and ions [7,8], and defects in solid-state systems [9-11]. Practical communication applications require a combination of several advantageous features, such as a particular operating wavelength, high bandwidth and long memory lifetimes. Here we introduce a purely micromachined solid-state platform in the form of chip-based optomechanical resonators made of nanostructured silicon beams. We create and demonstrate entanglement between two micromechanical oscillators across two chips that are separated by 20 centimetres. The entangled quantum state is distributed by an optical field at a designed wavelength near 1,550 nanometres. Therefore, our system can be directly incorporated in a realistic fibre-optic quantum network operating in the conventional optical telecommunication band. Our results are an important step towards the development of large-area quantum networks based on silicon photonics.
An Accurately Controlled Antagonistic Shape Memory Alloy Actuator with Self-Sensing
Wang, Tian-Miao; Shi, Zhen-Yun; Liu, Da; Ma, Chen; Zhang, Zhen-Hua
2012-01-01
With the progress of miniaturization, shape memory alloy (SMA) actuators exhibit high energy density, self-sensing ability and ease of fabrication, which make them well suited for practical applications. This paper presents a self-sensing controlled actuator drive that was designed using antagonistic pairs of SMA wires. Under a certain pre-strain and duty cycle, the stress between the two wires becomes constant. Meanwhile, the strain-to-resistance curve can minimize the hysteresis gap between the heating and the cooling paths. The curves of both wires are then modeled by fitting polynomials such that the measured resistance can be used directly to determine the difference between the measured values and the target strain. A hysteresis model relating strain to the duty-cycle difference is used for compensation. Accurate control is demonstrated through step response and sinusoidal tracking. The experimental results show that, under a combination control program, the root-mean-square error can be reduced to 1.093%. The frequency bandwidth limit is estimated to be 0.15 Hz. Two sets of instruments with three degrees of freedom are illustrated to show how this type of actuator could potentially be implemented. PMID:22969368
Parallel design of JPEG-LS encoder on graphics processing units
NASA Astrophysics Data System (ADS)
Duan, Hao; Fang, Yong; Huang, Bormin
2012-01-01
With recent technical advances in graphic processing units (GPUs), GPUs have outperformed CPUs in terms of compute capability and memory bandwidth. Many successful GPU applications to high performance computing have been reported. JPEG-LS is an ISO/IEC standard for lossless image compression which utilizes adaptive context modeling and run-length coding to improve compression ratio. However, adaptive context modeling causes data dependency among adjacent pixels and the run-length coding has to be performed in a sequential way. Hence, using JPEG-LS to compress large-volume hyperspectral image data is quite time-consuming. We implement an efficient parallel JPEG-LS encoder for lossless hyperspectral compression on a NVIDIA GPU using the compute unified device architecture (CUDA) programming technology. We use the block parallel strategy, as well as such CUDA techniques as coalesced global memory access, parallel prefix sum, and asynchronous data transfer. We also show the relation between GPU speedup and AVIRIS block size, as well as the relation between compression ratio and AVIRIS block size. When AVIRIS images are divided into blocks, each with 64×64 pixels, we gain the best GPU performance with 26.3x speedup over its original CPU code.
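One of the listed building blocks, the parallel prefix sum, gives each independently encoded block its write offset into the output bitstream. A CPU stand-in (NumPy, illustrative only; on the GPU this would be a work-efficient block scan in shared memory) is:

    import numpy as np

    def exclusive_prefix_sum(block_lengths):
        # block_lengths: compressed size of each block; result: offset where each block writes.
        offsets = np.zeros_like(block_lengths)
        offsets[1:] = np.cumsum(block_lengths)[:-1]
        return offsets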
Intelligent bandwidth compression
NASA Astrophysics Data System (ADS)
Tseng, D. Y.; Bullock, B. L.; Olin, K. E.; Kandt, R. K.; Olsen, J. D.
1980-02-01
The feasibility of a 1000:1 bandwidth compression ratio for image transmission has been demonstrated using image-analysis algorithms and a rule-based controller. Such a high compression ratio was achieved by first analyzing scene content using auto-cueing and feature-extraction algorithms, and then transmitting only the pertinent information consistent with mission requirements. A rule-based controller directs the flow of analysis and performs priority allocations on the extracted scene content. The reconstructed bandwidth-compressed image consists of an edge map of the scene background, with primary and secondary target windows embedded in the edge map. The bandwidth-compressed images are updated at a basic rate of 1 frame per second, with the high-priority target window updated at 7.5 frames per second. The scene-analysis algorithms used in this system together with the adaptive priority controller are described. Results of simulated 1000:1 bandwidth-compressed images are presented.
High bandwidth deflection readout for atomic force microscopes.
Steininger, Juergen; Bibl, Matthias; Yoo, Han Woong; Schitter, Georg
2015-10-01
This contribution presents the systematic design of a high bandwidth deflection readout mechanism for atomic force microscopes. The widely used optical beam deflection method is revised by adding a focusing lens between the cantilever and the quadrant photodetector (QPD). This allows the utilization of QPDs with a small active area resulting in an increased detection bandwidth due to the reduced junction capacitance. Furthermore the additional lens can compensate a cross talk between a compensating z-movement of the cantilever and the deflection readout. Scaling effects are analyzed to get the optimal spot size for the given geometry of the QPD. The laser power is tuned to maximize the signal to noise ratio without limiting the bandwidth by local saturation effects. The systematic approach results in a measured -3 dB detection bandwidth of 64.5 MHz at a deflection noise density of 62 fm/√Hz.
High bandwidth vapor density diagnostic system
Globig, Michael A.; Story, Thomas W.
1992-01-01
A high bandwidth vapor density diagnostic system for measuring the density of an atomic vapor during one or more photoionization events. The system translates the measurements from a low frequency region to a high frequency, relatively noise-free region in the spectrum to provide improved signal to noise ratio.
Development and Operation of a Material Identification and Discrimination Imaging Spectroradiometer
NASA Technical Reports Server (NTRS)
Dombrowski, Mark; Willson, paul; LaBaw, Clayton
1997-01-01
Many imaging applications require quantitative determination of a scene's spectral radiance. This paper describes a new system capable of real-time spectroradiometric imagery. Operating at a full-spectrum update rate of 30 Hz, this imager is capable of collecting a 30-point spectrum from each of three imaging heads: the first operates from 400 nm to 950 nm, with a 2% bandwidth; the second operates from 1.5 μm to 5.5 μm with a 1.5% bandwidth; the third operates from 5 μm to 12 μm, also at a 1.5% bandwidth. Standard image format is 256 x 256, with 512 x 512 possible in the VIS/NIR head. Spectra of up to 256 points are available at proportionately lower frame rates. In order to make such a tremendous amount of data more manageable, internal processing electronics perform four important operations on the spectral imagery data in real-time. First, all data in the spatial/spectral cube of data is spectro-radiometrically calibrated as it is collected. Second, to allow the imager to simulate sensors with arbitrary spectral response, any set of three spectral response functions may be loaded into the imager including delta functions to allow single wavelength viewing; the instrument then evaluates the integral of the product of the scene spectral radiances and the response function. Third, more powerful exploitation of the gathered spectral radiances can be effected by application of various spectral-matched filtering algorithms to identify pixels whose relative spectral radiance distribution matches a sought-after spectral radiance distribution, allowing materials-based identification and discrimination. Fourth, the instrument allows determination of spectral reflectance, surface temperature, and spectral emissivity, also in real-time. The spectral imaging technique used in the instrument allows tailoring of the frame rate and/or the spectral bandwidth to suit the scene radiance levels, i.e., frame rate can be reduced, or bandwidth increased to improve SNR when viewing low radiance scenes. The unique challenges of design and calibration are described. Pixel readout rates of 160 MHz, for full frame readout rates of 1000 Hz (512 x 512 image) present the first challenge; processing rates of nearly 600 million integer operations per second for sensor emulation, or over 2 billion per second for matched filtering, present the second. Spatial and spectral calibration of 65,536 pixels (262,144 for the 512 x 512 version) and up to 1,000 spectral positions mandate novel decoupling methods to keep the required calibration memory to a reasonable size. Large radiometric dynamic range also requires care to maintain precision operation with minimum memory size.
In-camera video-stream processing for bandwidth reduction in web inspection
NASA Astrophysics Data System (ADS)
Jullien, Graham A.; Li, QiuPing; Hajimowlana, S. Hossain; Morvay, J.; Conflitti, D.; Roberts, James W.; Doody, Brian C.
1996-02-01
Automated machine vision systems are now widely used for industrial inspection tasks where video-stream data is taken in by the camera and then sent out to the inspection system for further processing. In this paper we describe a prototype system for on-line programming of arbitrary real-time video data stream bandwidth reduction algorithms; the output of the camera only contains information that has to be further processed by a host computer. The processing system is built into a DALSA CCD camera and uses a microcontroller interface to download bit-stream data to a XILINX™ FPGA. The FPGA is directly connected to the video data-stream and outputs data to a low bandwidth output bus. The camera communicates with a host computer via an RS-232 link to the microcontroller. Static memory is used both to provide a FIFO interface for buffering defect burst data and for off-line examination of defect detection data. In addition to providing arbitrary FPGA architectures, the internal program of the microcontroller can also be changed via the host computer and a ROM monitor. This paper describes a prototype system board, mounted inside a DALSA camera, and discusses some of the algorithms currently being implemented for web inspection applications.
NASA Astrophysics Data System (ADS)
Yahampath, Pradeepa
2017-12-01
Consider communicating a correlated Gaussian source over a Rayleigh fading channel with no knowledge of the channel signal-to-noise ratio (CSNR) at the transmitter. In this case, a digital system cannot be optimal for a range of CSNRs. Analog transmission, however, is optimal at all CSNRs, if the source and channel are memoryless and bandwidth matched. This paper presents new hybrid digital-analog (HDA) systems for sources with memory and channels with bandwidth expansion, which outperform both digital-only and analog-only systems over a wide range of CSNRs. The digital part is either a predictive quantizer or a transform code, used to achieve a coding gain. The analog part uses linear encoding to transmit the quantization error, which improves performance under CSNR variations. The hybrid encoder is optimized to achieve the minimum AMMSE (average minimum mean square error) over the CSNR distribution. To this end, analytical expressions are derived for the AMMSE of asymptotically optimal systems. It is shown that the outage CSNR of the channel code and the analog-digital power allocation must be jointly optimized to achieve the minimum AMMSE. In the case of HDA predictive quantization, a simple algorithm is presented to solve the optimization problem. Experimental results are presented for both Gauss-Markov sources and speech signals.
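In notation suggested by the abstract (symbols assumed here, not taken from the paper), the design objective can be written as the expectation of the per-CSNR distortion over the fading distribution,

    \mathrm{AMMSE} \;=\; \mathbb{E}_{\gamma}\bigl[D_{\min}(\gamma)\bigr] \;=\; \int_{0}^{\infty} D_{\min}(\gamma)\, p(\gamma)\, \mathrm{d}\gamma ,

where \gamma is the instantaneous CSNR, p(\gamma) its Rayleigh-fading density, and D_{\min}(\gamma) the minimum mean square error achieved at that CSNR; the outage CSNR of the channel code and the digital/analog power split are the variables jointly optimised to minimise this quantity.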
Optical interconnects for satellite payloads: overview of the state-of-the-art
NASA Astrophysics Data System (ADS)
Vervaeke, Michael; Debaes, Christof; Van Erps, Jürgen; Karppinen, Mikko; Tanskanen, Antti; Aalto, Timo; Harjanne, Mikko; Thienpont, Hugo
2010-05-01
The increased demand for broadband communication services such as High Definition Television, Video on Demand and Triple Play fuels the technologies that enhance the bandwidth of individual users towards service providers and hence the increase of aggregate bandwidths on terrestrial networks. Optical solutions readily satisfy this bandwidth appetite, whereas electrical interconnection schemes require an ever-increasing effort to counteract signal distortions at higher bitrates. Dense wavelength division multiplexing and all-optical signal regeneration and switching solve the bandwidth demands of network trunks. Fiber-to-the-home and fiber-to-the-desk are trends towards providing individual users with greatly increased bandwidth. Operators in the satellite telecommunication sector face similar challenges, fuelled by the same demands as their terrestrial counterparts. Moreover, the limited number of orbital positions for new satellites sets the trend for an increase in payload data communication capacity using an ever-increasing number of complex multi-beam active antennas and a larger aggregate bandwidth. Only satellites with very large capacity, high computational density and flexible, transparent, fully digital payload solutions achieve affordable communication prices. To keep pace with the bandwidth and flexibility requirements, designers have to come up with systems requiring a total digital throughput of a few Tb/s, resulting in a high-power-consuming satellite payload. An estimated 90% of the total power consumption per chip is used for the off-chip communication lines. We have undertaken a study to assess the viability of optical data communication solutions to alleviate the demands regarding power consumption and aggregate bandwidth imposed on future satellite communication payloads. The review of optical interconnects given here is especially focussed on the demands of the satellite communication business and the particular environment in which the optics have to perform their functionality: space.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Field, Ella Suzanne; Bellum, John Curtis; Kletecka, Damon E.
Broad bandwidth coatings allow angle of incidence flexibility and accommodate spectral shifts due to aging and water absorption. Higher refractive index materials in optical coatings, such as TiO2, Nb2O5, and Ta2O5, can be used to achieve broader bandwidths compared to coatings that contain HfO2 high index layers. We have identified the deposition settings that lead to the highest index, lowest absorption layers of TiO2, Nb2O5, and Ta2O5, via e-beam evaporation using ion-assisted deposition. We paired these high index materials with SiO2 as the low index material to create broad bandwidth high reflection coatings centered at 1054 nm for 45 deg angle of incidence and P polarization. Furthermore, high reflection bandwidths as large as 231 nm were realized. Laser damage tests of these coatings using the ISO 11254 and NIF-MEL protocols are presented, which revealed that the Ta2O5/SiO2 coating exhibits the highest resistance to laser damage, at the expense of lower bandwidth compared to the TiO2/SiO2 and Nb2O5/SiO2 coatings.
NASA Astrophysics Data System (ADS)
Ko, Wai Son; Bhattacharya, Indrasen; Tran, Thai-Truong D.; Ng, Kar Wei; Adair Gerke, Stephen; Chang-Hasnain, Connie
2016-09-01
Highly sensitive and fast photodetectors can enable low power, high bandwidth on-chip optical interconnects for silicon integrated electronics. III-V compound semiconductor direct-bandgap materials with high absorption coefficients are particularly promising for photodetection in energy-efficient optical links because of the potential to scale down the absorber size, and the resulting capacitance and dark current, while maintaining high quantum efficiency. We demonstrate a compact bipolar junction phototransistor with a high current gain (53.6), bandwidth (7 GHz) and responsivity (9.5 A/W) using a single crystalline indium phosphide nanopillar directly grown on a silicon substrate. Transistor gain is obtained at sub-picowatt optical power and collector bias close to the CMOS line voltage. The quantum efficiency-bandwidth product of 105 GHz is the highest for photodetectors on silicon. The bipolar junction phototransistor combines the receiver front end circuit and absorber into a monolithic integrated device, eliminating the wire capacitance between the detector and first amplifier stage.
NASA Astrophysics Data System (ADS)
Lin, Chun-Han; Tu, Charng-Gan; Yao, Yu-Feng; Chen, Sheng-Hung; Su, Chia-Ying; Chen, Hao-Tsung; Kiang, Yean-Woei; Yang, Chih-Chung
2017-02-01
Besides lighting, LEDs can be used for indoor data transmission. Therefore, a large modulation bandwidth becomes an important target in the development of visible LEDs. In this regard, enhancing the radiative recombination rate of carriers in the quantum wells of an LED is a useful method, since the modulation bandwidth of an LED is related to the carrier decay rate besides the device RC time constant. To increase the carrier decay rate in an LED without sacrificing its output power, the technique of surface plasmon (SP) coupling in an LED is useful. In this paper, the increases of modulation bandwidth obtained by reducing mesa size, decreasing active layer thickness, and inducing SP coupling in blue- and green-emitting LEDs are illustrated. The results are demonstrated by comparing three different LED surface structures, including bare p-type surface, GaZnO current spreading layer, and Ag nanoparticles (NPs) for inducing SP coupling. In a single-quantum-well, blue-emitting LED with a circular mesa of 10 microns in radius, SP coupling results in a modulation bandwidth of 528.8 MHz, which is believed to be the record-high level. A smaller RC time constant can lead to a higher modulation bandwidth. However, when the RC time constant is smaller than 0.2 ns, its effect on modulation bandwidth saturates. The dependencies of modulation bandwidth on injected current density and carrier decay time confirm that the modulation bandwidth is essentially inversely proportional to a time constant, which is inversely proportional to the square root of the carrier decay rate and injected current density.
Compact antenna arrays with wide bandwidth and low sidelobe levels
Strassner, II, Bernd H.
2014-09-09
Highly efficient, low cost, easily manufactured SAR antenna arrays with lightweight low profiles, large instantaneous bandwidths and low SLL are disclosed. The array topology provides all necessary circuitry within the available antenna aperture space and between the layers of material that comprise the aperture. Bandwidths of 15.2 GHz to 18.2 GHz, with 30 dB SLLs azimuthally and elevationally, and radiation efficiencies above 40% may be achieved. Operation over much larger bandwidths is possible as well.
A potassium Faraday anomalous dispersion optical filter
NASA Technical Reports Server (NTRS)
Yin, B.; Shay, T. M.
1992-01-01
The characteristics of a potassium Faraday anomalous dispersion optical filter operating on the blue and near infrared transitions are calculated. The results show that the filter can be designed to provide high transmission, very narrow pass bandwidth, and low equivalent noise bandwidth. The Faraday anomalous dispersion optical filter (FADOF) provides a narrow-pass-bandwidth (of the order of a gigahertz) optical filter for laser communications, remote sensing, and lidar. The general theoretical model for the FADOF has been established in our previous paper. In this paper, we have identified the optimum operational conditions for a potassium FADOF operating on the blue and infrared transitions. The signal transmission, bandwidth, and equivalent noise bandwidth (ENBW) are also calculated.
NASA Astrophysics Data System (ADS)
Zheng, Jun; Ansari, Nirwan
2005-02-01
Call for Papers: Optical Access Networks. With the wide deployment of fiber-optic technology over the past two decades, we have witnessed a tremendous growth of bandwidth capacity in the backbone networks of today's telecommunications infrastructure. However, access networks, which cover the "last-mile" areas and serve numerous residential and small business users, have not been scaled up commensurately. The local subscriber lines for telephone and cable television are still using twisted pairs and coaxial cables. Most residential connections to the Internet are still through dial-up modems operating at a low speed on twisted pairs. As the demand for access bandwidth increases with emerging high-bandwidth applications, such as distance learning, high-definition television (HDTV), and video on demand (VoD), the last-mile access networks have become a bandwidth bottleneck in today's telecommunications infrastructure. To ease this bottleneck, it is imperative to provide sufficient bandwidth capacity in the access networks to open the bottleneck and thus present more opportunities for the provisioning of multiservices. Optical access solutions promise huge bandwidth to service providers and low-cost high-bandwidth services to end users and are therefore widely considered the technology of choice for next-generation access networks. To realize the vision of optical access networks, however, many key issues still need to be addressed, such as network architectures, signaling protocols, and implementation standards. The major challenges lie in the fact that an optical solution must be not only robust, scalable, and flexible, but also implemented at a low cost comparable to that of existing access solutions in order to increase the economic viability of many potential high-bandwidth applications. In recent years, optical access networks have been receiving tremendous attention from both academia and industry. A large number of research activities have been carried out or are now underway in this hot area. The purpose of this feature issue is to expose the networking community to the latest research breakthroughs and progress in the area of optical access networks.
Measurement of DNA translocation dynamics in a solid-state nanopore at 100-ns temporal resolution
Shekar, Siddharth; Niedzwiecki, David J.; Chien, Chen-Chi; Ong, Peijie; Fleischer, Daniel A.; Lin, Jianxun; Rosenstein, Jacob K.; Drndic, Marija; Shepard, Kenneth L.
2017-01-01
Despite the potential for nanopores to be a platform for high-bandwidth study of single-molecule systems, ionic current measurements through nanopores have been limited in their temporal resolution by noise arising from poorly optimized measurement electronics and large parasitic capacitances in the nanopore membranes. Here, we present a complementary metal-oxide-semiconductor (CMOS) nanopore (CNP) amplifier capable of low noise recordings at an unprecedented 10 MHz bandwidth. When integrated with state-of-the-art solid-state nanopores in silicon nitride membranes, we achieve an SNR of greater than 10 for ssDNA translocations at a measurement bandwidth of 5 MHz, which represents the fastest ion current recordings through nanopores reported to date. We observe transient features in ssDNA translocation events that are as short as 200 ns, which are hidden even at bandwidths as high as 1 MHz. These features offer further insights into the translocation kinetics of molecules entering and exiting the pore. This platform highlights the advantages of high-bandwidth translocation measurements made possible by integrating nanopores and custom-designed electronics. PMID:27332998
Acoustic transient classification with a template correlation processor.
Edwards, R T
1999-10-01
I present an architecture for acoustic pattern classification using trinary-trinary template correlation. In spite of its computational simplicity, the algorithm and architecture represent a method which greatly reduces bandwidth of the input, storage requirements of the classifier memory, and power consumption of the system without compromising classification accuracy. The linear system should be amenable to training using recently-developed methods such as Independent Component Analysis (ICA), and we predict that behavior will be qualitatively similar to that of structures in the auditory cortex.
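A toy sketch of the trinary-trinary correlation (invented shapes, illustration only): both the feature stream and the stored templates take values in {-1, 0, +1}, so each correlation reduces to additions and subtractions and the classifier memory holds only two bits per template entry:

    import numpy as np

    def classify(frame, templates):
        # frame: trinary feature vector; templates: one trinary row per class.
        scores = templates @ frame        # entries are -1/0/+1, so no real multiplies are needed
        return int(np.argmax(scores)), scores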
The Effects of Block Size on the Performance of Coherent Caches in Shared-Memory Multiprocessors
1993-05-01
increase with the bandwidth and latency. For those applications with poor spatial locality, the best choice of cache line size is determined by the...observation was used in the design of two schemes: LimitLESS directories and Tag caches. LimitLESS directories [15] were designed for the ALEWIFE...small packets may be used to avoid network congestion. The most important factor influencing the choice of cache line size for a multiprocessor is the
Intersatellite communications optoelectronics research at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Krainak, Michael A.
1992-01-01
A review is presented of current optoelectronics research and development at the NASA Goddard Space Flight Center for high-power, high-bandwidth laser transmitters; high-bandwidth, high-sensitivity optical receivers; pointing, acquisition, and tracking components; and experimental and theoretical system modeling at the NASA Goddard Space Flight Center. Program hardware and space flight opportunities are presented.
Quick Vegas: Improving Performance of TCP Vegas for High Bandwidth-Delay Product Networks
NASA Astrophysics Data System (ADS)
Chan, Yi-Cheng; Lin, Chia-Liang; Ho, Cheng-Yuan
An important issue in designing a TCP congestion control algorithm is that it should allow the protocol to quickly adjust the end-to-end communication rate to the bandwidth on the bottleneck link. However, the TCP congestion control may function poorly in high bandwidth-delay product networks because of its slow response with large congestion windows. In this paper, we propose an enhanced version of TCP Vegas called Quick Vegas, in which we present an efficient congestion window control algorithm for a TCP source. Our algorithm improves the slow-start and congestion avoidance techniques of original Vegas. Simulation results show that Quick Vegas significantly improves the performance of connections as well as remaining fair when the bandwidth-delay product increases.
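The baseline Vegas adjustment that Quick Vegas refines compares the expected and actual sending rates once per round-trip time; a schematic version (standard Vegas constants, not the enhanced Quick Vegas rules) looks like this:

    def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
        expected = cwnd / base_rtt          # rate the window could deliver with empty queues
        actual = cwnd / rtt                 # rate actually delivered this round trip
        diff = (expected - actual) * base_rtt   # estimated packets queued at the bottleneck
        if diff < alpha:
            return cwnd + 1                 # path under-utilised: grow the window
        if diff > beta:
            return cwnd - 1                 # queue building up: shrink the window
        return cwnd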
Development of Next Generation Memory Test Experiment for Deployment on a Small Satellite
NASA Technical Reports Server (NTRS)
MacLeod, Todd; Ho, Fat D.
2012-01-01
The original Memory Test Experiment successfully flew on the FASTSAT satellite launched in November 2010. It contained a single Ramtron 512K ferroelectric memory. The memory device went through many thousands of read/write cycles and recorded any errors that were encountered. The original mission was scheduled to last 6 months but was extended to 18 months. New opportunities exist to launch a similar satellite and considerations for a new memory test experiment should be examined. The original experiment had to be designed and integrated in less than two months, so the experiment was a simple design using readily available parts. The follow-on experiment needs to be more sophisticated and encompass more technologies. This paper lays out the considerations for the design and development of this follow-on flight memory experiment. It also details the results from the original Memory Test Experiment that flew on board FASTSAT. Some of the design considerations for the new experiment include the number and type of memory devices to be used, the kinds of tests that will be performed, other data needed to analyze the results, and best use of limited resources on a small satellite. The memory technologies that are considered are FRAM, FLASH, SONOS, Resistive Memory, Phase Change Memory, Nano-wire Memory, Magneto-resistive Memory, Standard DRAM, and Standard SRAM. The kinds of tests that could be performed are read/write operations, non-volatile memory retention, write cycle endurance, power measurements, and testing Error Detection and Correction schemes. Other data that may help analyze the results are GPS location of recorded errors, time stamp of all data recorded, radiation measurements, temperature, and other activities being performed by the satellite. The resources of power, volume, mass, temperature, processing power, and telemetry bandwidth are extremely limited on a small satellite. Design considerations must be made to allow the experiment to not interfere with the satellite's primary mission.
Achieving High Performance With TCP Over 40 GbE on NUMA Architectures for CMS Data Acquisition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bawej, Tomasz; et al.
2014-01-01
TCP and the socket abstraction have barely changed over the last two decades, but at the network layer there has been a giant leap from a few megabits to 100 gigabits in bandwidth. At the same time, CPU architectures have evolved into the multicore era and applications are expected to make full use of all available resources. Applications in the data acquisition domain based on the standard socket library running in a Non-Uniform Memory Access (NUMA) architecture are unable to reach full efficiency and scalability without the software being adequately aware of the IRQ (Interrupt Request), CPU and memory affinities. During the first long shutdown of LHC, the CMS DAQ system is going to be upgraded for operation from 2015 onwards and a new software component has been designed and developed in the CMS online framework for transferring data with sockets. This software attempts to wrap the low-level socket library to ease higher-level programming with an API based on an asynchronous event driven model similar to the DAT uDAPL API. It is an event-based application with NUMA optimizations that allows for a high throughput of data across a large distributed system. This paper describes the architecture, the technologies involved and the performance measurements of the software in the context of the CMS distributed event building.
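The affinity awareness described above amounts to keeping a receiving thread, its interrupts and its buffers on one NUMA node; on Linux the thread side of this can be expressed as simply as the following sketch (illustrative only, not the CMS online framework API):

    import os

    def pin_current_thread(cpus):
        # Restrict the calling thread/process to the CPUs of the chosen NUMA node,
        # so that socket buffers and receive processing stay in local memory.
        os.sched_setaffinity(0, set(cpus))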
ERIC Educational Resources Information Center
Villano, Matt
2009-01-01
Video-heavy distance learning programs can put a strain on the campus network. This article describes how three institutions are managing bandwidth to ensure high-quality service for eLearning students.
Liu, Chuanbao; Bai, Yang; Zhao, Qian; Yang, Yihao; Chen, Hongsheng; Zhou, Ji; Qiao, Lijie
2016-01-01
Metasurfaces have powerful abilities to manipulate the properties of electromagnetic waves flexibly, especially the modulation of the polarization state for both linearly polarized (LP) and circularly polarized (CP) waves. However, the transmission efficiency of cross-polarization conversion by a single-layer metasurface has a low theoretical upper limit of 25% and the bandwidth is usually narrow, limitations which cannot be resolved by simply combining such layers. Here, we efficiently manipulate polarization coupling in a multilayer metasurface to promote the transmission of cross-polarization by Fabry-Perot resonance, so that a high conversion coefficient of 80–90% for the CP wave is achieved within a broad bandwidth in a metasurface with C-shaped scatterers, as shown by theoretical calculation, numerical simulation and experiments. Further, fully controlling the Pancharatnam-Berry phase enables the realization of a polarized beam splitter, which is demonstrated to produce abnormal transmission with high conversion efficiency and broad bandwidth. PMID:27703254
NASA Astrophysics Data System (ADS)
Li, Hao; Liu, Wenzhong; Zhang, Hao F.
2015-10-01
Rodent models are indispensable in studying various retinal diseases. Noninvasive, high-resolution retinal imaging of rodent models is highly desired for longitudinally investigating the pathogenesis and therapeutic strategies. However, due to severe aberrations, the retinal image quality in rodents can be much worse than that in humans. We numerically and experimentally investigated the influence of chromatic aberration and optical illumination bandwidth on retinal imaging. We confirmed that the rat retinal image quality decreased with increasing illumination bandwidth. We achieved the retinal image resolution of 10 μm using a 19 nm illumination bandwidth centered at 580 nm in a home-built fundus camera. Furthermore, we observed higher chromatic aberration in albino rat eyes than in pigmented rat eyes. This study provides a design guide for high-resolution fundus camera for rodents. Our method is also beneficial to dispersion compensation in multiwavelength retinal imaging applications.
Power and Efficiency Optimized in Traveling-Wave Tubes Over a Broad Frequency Bandwidth
NASA Technical Reports Server (NTRS)
Wilson, Jeffrey D.
2001-01-01
A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWT's are critical components in deep space probes, communication satellites, and high-power radar systems. Power conversion efficiency is of paramount importance for TWT's employed in deep space probes and communication satellites. A previous effort was very successful in increasing efficiency and power at a single frequency (ref. 1). Such an algorithm is sufficient for narrow bandwidth designs, but for optimal designs in applications that require high radiofrequency power over a wide bandwidth, such as high-density communications or high-resolution radar, the variation of the circuit response with respect to frequency must be considered. This work at the NASA Glenn Research Center is the first to develop techniques for optimizing TWT efficiency and output power over a broad frequency bandwidth (ref. 2). The techniques are based on simulated annealing, which has the advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 3). Two new broadband simulated annealing algorithms were developed that optimize (1) minimum saturated power efficiency over a frequency bandwidth and (2) simultaneous bandwidth and minimum power efficiency over the frequency band with constant input power. The algorithms were incorporated into the NASA coupled-cavity TWT computer model (ref. 4) and used to design optimal phase velocity tapers using the 59- to 64-GHz Hughes 961HA coupled-cavity TWT as a baseline model. In comparison to the baseline design, the computational results of the first broad-band design algorithm show an improvement of 73.9 percent in minimum saturated efficiency (see the top graph). The second broadband design algorithm (see the bottom graph) improves minimum radiofrequency efficiency with constant input power drive by a factor of 2.7 at the high band edge (64 GHz) and increases simultaneous bandwidth by 500 MHz.
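The optimization follows the usual simulated-annealing pattern of accepting occasional downhill moves with a temperature-dependent probability; a generic skeleton of this kind of maximization (the TWT circuit model and the efficiency objective are of course not reproduced here) is:

    import math
    import random

    def anneal(initial, objective, neighbour, t0=1.0, cooling=0.95, steps=1000):
        x, fx = initial, objective(initial)
        best, fbest = x, fx
        t = t0
        for _ in range(steps):
            y = neighbour(x)                # e.g. perturb one phase-velocity taper parameter
            fy = objective(y)               # e.g. minimum efficiency across the frequency band
            if fy > fx or random.random() < math.exp((fy - fx) / t):
                x, fx = y, fy               # accept improvements, and some worse moves early on
                if fx > fbest:
                    best, fbest = x, fx
            t *= cooling                    # lower the temperature each step
        return best, fbest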
Lossy Wavefield Compression for Full-Waveform Inversion
NASA Astrophysics Data System (ADS)
Boehm, C.; Fichtner, A.; de la Puente, J.; Hanzich, M.
2015-12-01
We present lossy compression techniques, tailored to the inexact computation of sensitivity kernels, that significantly reduce the memory requirements of adjoint-based minimization schemes. Adjoint methods are a powerful tool to solve tomography problems in full-waveform inversion (FWI). Yet they face the challenge of massive memory requirements caused by the opposite directions of forward and adjoint simulations and the necessity to access both wavefields simultaneously during the computation of the sensitivity kernel. Thus, storage, I/O operations, and memory bandwidth become key topics in FWI. In this talk, we present strategies for the temporal and spatial compression of the forward wavefield. This comprises re-interpolation with coarse time steps and an adaptive polynomial degree of the spectral-element shape functions. In addition, we predict the projection errors on a hierarchy of grids and re-quantize the residuals with an adaptive floating-point accuracy to improve the approximation. Furthermore, we use the first arrivals of adjoint waves to identify "shadow zones" that do not contribute to the sensitivity kernel at all. Updating and storing the wavefield within these shadow zones is skipped, which reduces memory requirements and computational costs at the same time. Compared to check-pointing, our approach has only a negligible computational overhead, exploiting the fact that a sufficiently accurate sensitivity kernel does not require a fully resolved forward wavefield. Furthermore, we use adaptive compression thresholds during the FWI iterations to ensure convergence. Numerical experiments on the reservoir scale and for the Western Mediterranean demonstrate the high potential of this approach, with an effective compression factor of 500-1000. The approach is computationally cheap and easy to integrate into both finite-difference and finite-element wave propagation codes.
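As a rough illustration of two of the ideas above (coarser time sampling of the stored forward wavefield and re-quantization to reduced floating-point precision), the following NumPy sketch decimates snapshots in time, stores them as float16, and linearly re-interpolates on decompression. The decimation factor, float16 precision, and linear interpolation are assumptions for illustration; the paper's adaptive spectral-element and error-prediction machinery is not reproduced.

import numpy as np

def compress_wavefield(u, time_decimation=4, dtype=np.float16):
    # u: wavefield snapshots with shape (n_timesteps, n_points)
    coarse = u[::time_decimation]            # keep every k-th snapshot
    return coarse.astype(dtype)              # lossy re-quantization

def decompress_wavefield(u_c, n_timesteps, time_decimation=4):
    coarse = u_c.astype(np.float64)
    t_coarse = np.arange(coarse.shape[0]) * time_decimation
    t_full = np.arange(n_timesteps)
    # Linear re-interpolation back to the fine time axis, point by point in space.
    out = np.empty((n_timesteps, coarse.shape[1]))
    for j in range(coarse.shape[1]):
        out[:, j] = np.interp(t_full, t_coarse, coarse[:, j])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = np.cumsum(rng.standard_normal((400, 100)), axis=0)  # smooth-ish toy field
    packed = compress_wavefield(wave)
    recon = decompress_wavefield(packed, wave.shape[0])
    print("compression factor:", wave.nbytes / packed.nbytes)
    print("relative error:", np.linalg.norm(recon - wave) / np.linalg.norm(wave))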
Reducing the computational footprint for real-time BCPNN learning
Vogginger, Bernhard; Schüffny, René; Lansner, Anders; Cederström, Love; Partzsch, Johannes; Höppner, Sebastian
2015-01-01
The implementation of synaptic plasticity in neural simulation or neuromorphic hardware is usually very resource-intensive, often requiring a compromise between efficiency and flexibility. A versatile but computationally expensive plasticity mechanism is provided by the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm. Building upon Bayesian statistics, and having clear links to biological plasticity processes, the BCPNN learning rule has been applied in many fields, ranging from data classification, associative memory, reward-based learning, and probabilistic inference to cortical attractor memory networks. In the spike-based version of this learning rule the pre-, postsynaptic, and coincident activity is traced in three low-pass-filtering stages, requiring a total of eight state variables, whose dynamics are typically simulated with the fixed-step-size Euler method. We derive analytic solutions allowing an efficient event-driven implementation of this learning rule. Further speedup is achieved first by rewriting the model, which halves the number of basic arithmetic operations per update, and second by using look-up tables for the frequently calculated exponential decay. Ultimately, in a typical use case, the simulation using our approach is more than one order of magnitude faster than with the fixed-step-size Euler method. Aiming for a small memory footprint per BCPNN synapse, we also evaluate the use of fixed-point numbers for the state variables, and assess the number of bits required to achieve the same or better accuracy than with the conventional explicit Euler method. All of this will allow real-time simulation of a reduced cortex model based on BCPNN in high performance computing. More importantly, with the analytic solution at hand and due to the reduced memory bandwidth, the learning rule can be efficiently implemented in dedicated or existing digital neuromorphic hardware. PMID:25657618
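A minimal sketch of the event-driven idea mentioned in the abstract: a low-pass-filtered spike trace obeys z(t) = z(t0) * exp(-(t - t0)/tau) between events, so it can be updated analytically only when a spike arrives instead of being stepped with fixed-step Euler integration. The single trace, parameter values, and unit jump are illustrative assumptions, not the full eight-variable BCPNN synapse model.

import math

class EventDrivenTrace:
    def __init__(self, tau=20.0):
        self.tau = tau
        self.value = 0.0
        self.last_t = 0.0

    def read(self, t):
        # Analytic decay from the last update time; no intermediate Euler steps needed.
        return self.value * math.exp(-(t - self.last_t) / self.tau)

    def spike(self, t, increment=1.0):
        # Decay to the spike time, then add the jump caused by the spike.
        self.value = self.read(t) + increment
        self.last_t = t

if __name__ == "__main__":
    z = EventDrivenTrace(tau=20.0)
    for t_spike in [5.0, 12.0, 40.0]:
        z.spike(t_spike)
    print(z.read(60.0))  # trace value 20 time units after the last spike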
Flight control system design factors for applying automated testing techniques
NASA Technical Reports Server (NTRS)
Sitz, Joel R.; Vernon, Todd H.
1990-01-01
The principal design features and operational experiences of the X-29 forward-swept-wing aircraft and F-18 high alpha research vehicle (HARV) automated test systems are discussed. It is noted that operational experiences in developing and using these automated testing techniques have highlighted the need for incorporating target system features to improve testability. Improved target system testability can be accomplished with the addition of nonreal-time and real-time features. Online access to target system implementation details, unobtrusive real-time access to internal user-selectable variables, and proper software instrumentation are all desirable features of the target system. Also, test system and target system design issues must be addressed during the early stages of the target system development. Processing speeds of up to 20 million instructions/s and the development of high-bandwidth reflective memory systems have improved the ability to integrate the target system and test system for the application of automated testing techniques. It is concluded that new methods of designing testability into the target systems are required.
Hur, M. S.; Ersfeld, B.; Noble, A.; Suk, H.; Jaroszynski, D. A.
2017-01-01
Ultra-intense, narrow-bandwidth, electromagnetic pulses have become important tools for exploring the characteristics of matter. Modern tuneable high-power light sources, such as free-electron lasers and vacuum tubes, rely on bunching of relativistic or near-relativistic electrons in vacuum. Here we present a fundamentally different method for producing narrow-bandwidth radiation from a broad spectral bandwidth current source, which takes advantage of the inflated radiation impedance close to cut-off in a medium with a plasma-like permittivity. We find that by embedding a current source in this cut-off region, more than an order of magnitude enhancement of the radiation intensity is obtained compared with emission directly into free space. The method suggests a simple and general way to flexibly use broadband current sources to produce broad or narrow bandwidth pulses. As an example, we demonstrate, using particle-in-cell simulations, enhanced monochromatic emission of terahertz radiation using a two-colour pumped current source enclosed by a tapered waveguide. PMID:28071681
Performance of a High-Concentration Erbium-Doped Fiber Amplifier with 100 nm Amplification Bandwidth
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hajireza, P.; Shahabuddin, N. S.; Abbasi-Zargaleh, S.
2010-07-07
Increasing demand for higher bandwidth has driven the need for more Wavelength Division Multiplexing (WDM) channels. One of the requirements to achieve this is a broadband amplifier. This paper reports the performance of a broadband, compact, high-concentration, silica-based erbium-doped fiber amplifier. The amplifier was optimized with a 2.15 m long erbium-doped fiber with an erbium ion concentration of 2000 ppm. The gain spectrum of the amplifier has a measured amplification bandwidth of 100 nm using a 980 nm laser diode with a power of 150 mW. This silica-based EDFA shows lower noise figure, higher gain, and wider bandwidth at shorter wavelengths compared to a Bismuth-based EDFA with a higher erbium ion concentration of 3250 ppm at equivalent EDF length. The silica-based EDF shows a peak gain of 22 dB and an amplification bandwidth between 1520 nm and 1620 nm. The lowest noise figure is 5 dB. The gain is further improved with the implementation of enhanced EDFA configurations.
170 GHz Uni-Traveling Carrier Photodiodes for InP-based photonic integrated circuits.
Rouvalis, E; Chtioui, M; van Dijk, F; Lelarge, F; Fice, M J; Renaud, C C; Carpintero, G; Seeds, A J
2012-08-27
We demonstrate the capability of fabricating extremely high-bandwidth Uni-Traveling Carrier Photodiodes (UTC-PDs) using techniques that are suitable for active-passive monolithic integration with Multiple Quantum Well (MQW)-based photonic devices. The devices achieved a responsivity of 0.27 A/W, a 3-dB bandwidth of 170 GHz, and an output power of -9 dBm at 200 GHz. We anticipate that this work will deliver Photonic Integrated Circuits with extremely high bandwidth for optical communications and millimetre-wave applications.
Design of a 0.13 µm SiGe Limiting Amplifier with 14.6 THz Gain-Bandwidth-Product
NASA Astrophysics Data System (ADS)
Park, Sehoon; Du, Xuan-Quang; Grözing, Markus; Berroth, Manfred
2017-09-01
This paper presents the design of a limiting amplifier with 1-to-3 fan-out implementation in a 0.13 µm SiGe BiCMOS technology and gives a detailed guideline to determine the circuit parameters of the amplifier for optimum high-frequency performance based on simplified gain estimations. The proposed design uses a Cherry-Hooper topology for bandwidth enhancement and is optimized for maximum group delay flatness to minimize phase distortion of the input signal. With regard to a high integration density and a small chip area, the design employs no passive inductors which might be used to boost the circuit bandwidth with inductive peaking. On a RLC-extracted post-layout simulation level, the limiting amplifier exhibits a gain-bandwidth-product of 14.6 THz with 56.6 dB voltage gain and 21.5 GHz 3 dB bandwidth at a peak-to-peak input voltage of 1.5 mV. The group delay variation within the 3 dB bandwidth is less than 0.5 ps and the power dissipation at a power supply voltage of 3 V including output drivers is 837 mW.
NASA Astrophysics Data System (ADS)
Hartman, Richard V.
1987-02-01
Advances in sophisticated algorithms and parallel VLSI processing have resulted in the capability for near real-time transmission of television pictures (optical and FLIR) via existing telephone lines, tactical radios, and military satellite channels. Concepts have been field demonstrated with production-ready engineering development models using transform compression techniques. Preliminary design has been completed for packaging an existing command post version into a 20 pound 1/2 ATR enclosure for use on jeeps, backpacks, RPVs, helicopters, and reconnaissance aircraft. The system will also have a built-in error correction code 2 (ECC) unit, allowing operation via communications media exhibiting a bit error rate of 1 X 10- or better. In the past several years, two nearly simultaneous developments show promise of allowing the breakthrough needed to give the operational commander a practical means for obtaining pictorial information from the battlefield. And, he can obtain this information in near real time using available communications channels--his long sought after pictorial force multiplier: • high-speed digital integrated circuitry that is affordable, and • an understanding of the practical applications of information theory. High-speed digital integrated circuits allow an analog television picture to be nearly instantaneously converted to a digital serial bit stream so that it can be transmitted as rapidly or slowly as desired, depending on the available transmission channel bandwidth. Perhaps more importantly, digitizing the picture allows it to be stored and processed in a number of ways. Most typically, processing is performed to reduce the amount of data that must be transmitted, while still maintaining maximum picture quality. Reducing the amount of data that must be transmitted is important since it allows a narrower bandwidth in the scarce frequency spectrum to be used for transmission of pictures, or, if only a narrow bandwidth is available, it takes less time for the picture to be transmitted. This process of reducing the amount of data that must be transmitted to represent a picture is called compression, truncation, or, most typically, video compression. Keep in mind that the pictures you see on your home TV are nothing more than a series of still pictures displayed at a rate of 30 frames per second. If you grabbed one of those frames, digitized it, stored it in memory, and then transmitted it at the most rapid rate the bandwidth of your communications channel would allow, you would be using the so-called slow scan techniques.
Megahertz-resolution programmable microwave shaper.
Li, Jilong; Dai, Yitang; Yin, Feifei; Li, Wei; Li, Ming; Chen, Hongwei; Xu, Kun
2018-04-15
A novel microwave shaper is proposed and demonstrated, whose microwave spectral transfer function is fully programmable with high resolution. We achieve this by bandwidth-compressed mapping of a programmable optical wave-shaper, which has a coarse frequency resolution of tens of gigahertz, onto a microwave one with a resolution of tens of megahertz. This is based on a novel technique of "bandwidth scaling," which employs bandwidth-stretched electronic-to-optical conversion and bandwidth-compressed optical-to-electronic conversion. We demonstrate the high resolution and full reconfigurability experimentally. Furthermore, we show that the group delay variation can be greatly enlarged after mapping; this is verified experimentally with an enlargement of 194 times. The resolution improvement and group delay magnification significantly distinguish our proposal from previous optics-to-microwave spectrum mapping.
Two-dimensional priority-based dynamic resource allocation algorithm for QoS in WDM/TDM PON networks
NASA Astrophysics Data System (ADS)
Sun, Yixin; Liu, Bo; Zhang, Lijia; Xin, Xiangjun; Zhang, Qi; Rao, Lan
2018-01-01
Wavelength division multiplexing/time division multiplexing (WDM/TDM) passive optical networks (PON) are viewed as a promising solution for delivering multiple services and applications. A hybrid WDM/TDM PON uses a wavelength and bandwidth allocation strategy to control the distribution of the wavelength channels in the uplink direction, so that it can meet the high bandwidth requirements of multiple Optical Network Units (ONUs) while improving wavelength resource utilization. An investigation of existing dynamic bandwidth allocation algorithms shows that they cannot satisfy the requirements of different levels of service well while adapting to the structural characteristics of a hybrid WDM/TDM PON system. This paper introduces a novel wavelength and bandwidth allocation algorithm to efficiently utilize the bandwidth and support QoS (Quality of Service) guarantees in WDM/TDM PON. Two priority-based polling subcycles are introduced in order to increase system efficiency and improve system performance: the fixed-priority polling subcycle and the dynamic-priority polling subcycle follow different principles to implement wavelength and bandwidth allocation according to the priority of different levels of service. A simulation was conducted to study the performance of the priority-based polling dynamic resource allocation algorithm in WDM/TDM PON. The results show that the performance of delay-sensitive services is greatly improved without degrading QoS guarantees for other services. Compared with traditional dynamic bandwidth allocation algorithms, this algorithm can meet the bandwidth needs of different priority traffic classes, achieve low loss rate, and ensure real-time delivery for the high-priority traffic class in terms of overall traffic on the network.
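As a hedged illustration of the two-subcycle idea (not the paper's algorithm), the sketch below grants delay-sensitive classes first in a fixed-priority pass and then shares the residual capacity proportionally among best-effort requests; the class names, capacity value, and proportional-sharing rule are assumptions for illustration.

def allocate(requests, capacity, high_priority_classes=("voice", "video")):
    # requests: list of (onu_id, traffic_class, requested_bytes)
    grants = {}
    # Subcycle 1: grant delay-sensitive classes in full, up to the remaining capacity.
    for onu, cls, req in requests:
        if cls in high_priority_classes:
            g = min(req, capacity)
            grants[(onu, cls)] = g
            capacity -= g
    # Subcycle 2: share the residual bandwidth proportionally among best-effort requests.
    rest = [(onu, cls, req) for onu, cls, req in requests if cls not in high_priority_classes]
    total = sum(req for _, _, req in rest)
    for onu, cls, req in rest:
        grants[(onu, cls)] = 0 if total == 0 else int(capacity * req / total)
    return grants

if __name__ == "__main__":
    reqs = [(1, "voice", 200), (1, "data", 900), (2, "video", 500), (2, "data", 600)]
    print(allocate(reqs, capacity=1500))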
NASA Astrophysics Data System (ADS)
Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico
2018-04-01
Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of GPUs to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
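The key property such algorithms exploit is that a circulant matrix is diagonalized by the discrete Fourier transform, so a matrix-vector product needs only the matrix's first column and an FFT, costing O(N log N) instead of O(N^2). A small NumPy sketch (the CPU FFT standing in for the GPU implementation) is shown below; the explicit matrix is built only to verify the result.

import numpy as np

def circulant_matvec(first_col, x):
    # y = C x, where C is the circulant matrix generated by first_col.
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    c = rng.standard_normal(8)
    x = rng.standard_normal(8)
    # Explicit circulant matrix for verification only; large problems never form it.
    C = np.array([np.roll(c, k) for k in range(8)]).T
    print(np.allclose(C @ x, circulant_matvec(c, x)))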
2008-12-01
In future network-centric warfare environments, teams of autonomous vehicles will be deployed in a cooperative manner to conduct wide-area...of data back to the command station, autonomous vehicles configured with a high-bandwidth communication system are positioned between the command
Hennig, Georg; Brittenham, Gary M; Sroka, Ronald; Kniebühler, Gesa; Vogeser, Michael; Stepp, Herbert
2013-04-01
An optical filter unit is demonstrated, which uses two successively arranged tunable thin-film optical band-pass filters and allows for simultaneous adjustment of the central wavelength in the spectral range 522-555 nm and of the spectral bandwidth in the range 3-16 nm, with a wavelength switching time of 8 ms/nm. Different spectral filter combinations can cover the complete visible spectral range. The transmitted intensity was found to decrease only linearly with the spectral bandwidth for bandwidths >6 nm, allowing a high maximum transmission efficiency of >75%. The image of a fiber bundle was spectrally filtered and analyzed in terms of position dependency of the transmitted bandwidth and central wavelength.
NASA Technical Reports Server (NTRS)
Ivancic, William D.
2002-01-01
Transmission control protocol (TCP) was conceived and designed to run over a variety of communication links, including wireless and high-bandwidth links. However, with recent technological advances in satellite and fiber-optic networks, researchers are reevaluating the flexibility of TCP. The TCP pacing and packet pair probing implementation may help overcome two of the major obstacles identified for efficient bandwidth utilization over communication links with large delay-bandwidth products.
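A small worked example (not taken from the report) of the quantities involved: the delay-bandwidth product sets how much data must be in flight to keep a long, high-bandwidth link full, and a paced sender spreads its congestion window evenly over one round-trip time instead of emitting it in a burst. The 155 Mb/s, 550 ms GEO-like link below is an illustrative assumption.

def delay_bandwidth_product(bandwidth_bps, rtt_s):
    return bandwidth_bps * rtt_s / 8.0          # bytes that must be in flight to fill the pipe

def pacing_rate(cwnd_bytes, rtt_s):
    return cwnd_bytes / rtt_s                    # bytes per second, evenly spaced over one RTT

if __name__ == "__main__":
    bdp = delay_bandwidth_product(155e6, 0.550)
    print(f"window needed to fill the link: {bdp / 1e6:.1f} MB")
    print(f"pacing rate for that window: {pacing_rate(bdp, 0.550) * 8 / 1e6:.0f} Mb/s")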
A 0.4-2.3 GHz broadband power amplifier extended continuous class-F design technology
NASA Astrophysics Data System (ADS)
Chen, Peng; He, Songbai
2015-08-01
A 0.4-2.3 GHz broadband power amplifier (PA) extended continuous class-F design technology is proposed in this paper. A traditional continuous class-F PA performs with high efficiency only over one octave of bandwidth. With the continuing development of wireless communication, PAs are required to cover the mainstream communication standards' working frequencies from 0.4 GHz to 2.2 GHz. In order to achieve this objective, the bandwidths of class-F and continuous class-F PAs are analysed and discussed using Fourier series. Two criteria, which reduce the implementation complexity of the continuous class-F PA, are presented and explained by investigating the overlapping area of the transistor's current and voltage waveforms. The proposed PA design technology is based on the continuous class-F design method and divides the bandwidth into two parts: the first part covers the bandwidth from 1.3 GHz to 2.3 GHz, where the impedances are designed by the continuous class-F method; the other part covers the bandwidth from 0.4 GHz to 1.3 GHz, where the impedance that guarantees high-efficiency operation over this bandwidth is selected and controlled. An improved particle swarm optimisation is employed to realise the multiple impedances of the output and input networks. A PA based on a commercial 10 W GaN high electron mobility transistor is designed and fabricated to verify the proposed design method. The simulation and measurement results show that the proposed PA delivers 40-76% power-added efficiency and more than 11 dB power gain with more than 40 dBm output power over the 0.4-2.3 GHz bandwidth.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek
2016-10-06
We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends on both hand-tuned critical kernels and BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.
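A roofline study of the kind mentioned above bounds attainable performance by the lesser of peak compute and arithmetic intensity times memory bandwidth. The sketch below evaluates that bound; the Knights Landing peak and bandwidth figures are rough public numbers used only for illustration, not measurements from the paper.

def roofline_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    # arithmetic_intensity: floating-point operations per byte moved from memory
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)

if __name__ == "__main__":
    peak, mcdram_bw, ddr_bw = 3000.0, 400.0, 90.0   # GF/s, GB/s, GB/s (approximate, assumed)
    for ai in (0.5, 2.0, 10.0):
        print(f"AI={ai}: MCDRAM bound={roofline_gflops(ai, peak, mcdram_bw)} GF/s, "
              f"DDR bound={roofline_gflops(ai, peak, ddr_bw)} GF/s")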
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2014-04-01
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
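The following Python pseudocode is an illustrative reading of the claimed transfer pattern, not the patented implementation: stream eagerly from one end of the buffer with memory-FIFO packets until the RTS acknowledgement arrives, then hand the remainder to a direct put that the DMA engine completes without involving an origin processing core. All function names and the 512-byte chunk size are invented for illustration.

def transfer(buffer, send_rts, fifo_send, direct_put, ack_received):
    n = len(buffer)
    send_rts()                                   # tell the target a transfer is coming
    sent = 0
    # Phase 1: eagerly stream from the front of the buffer until the ACK arrives.
    while sent < n and not ack_received():
        chunk = buffer[sent:sent + 512]
        fifo_send(chunk)                         # memory FIFO operation, driven by the origin core
        sent += len(chunk)
    # Phase 2: the remaining portion is handed to a direct put that the DMA engine completes
    # without invoking an origin processing core (the claim describes starting this transfer
    # from the opposite end of the buffer).
    if sent < n:
        direct_put(buffer[sent:], offset=sent)
    return sent

if __name__ == "__main__":
    out = []
    data = bytes(2000)
    transfer(data,
             send_rts=lambda: None,
             fifo_send=out.append,
             direct_put=lambda chunk, offset: out.append(chunk),
             ack_received=lambda: len(out) >= 2)   # pretend the ACK arrives after two chunks
    print(sum(len(c) for c in out))                # all 2000 bytes delivered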
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2014-04-22
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
Low latency, high bandwidth data communications between compute nodes in a parallel computer
Blocksome, Michael A
2013-07-02
Methods, systems, and products are disclosed for data transfers between nodes in a parallel computer that include: receiving, by an origin DMA on an origin node, a buffer identifier for a buffer containing data for transfer to a target node; sending, by the origin DMA to the target node, a RTS message; transferring, by the origin DMA, a data portion to the target node using a memory FIFO operation that specifies one end of the buffer from which to begin transferring the data; receiving, by the origin DMA, an acknowledgement of the RTS message from the target node; and transferring, by the origin DMA in response to receiving the acknowledgement, any remaining data portion to the target node using a direct put operation that specifies the other end of the buffer from which to begin transferring the data, including initiating the direct put operation without invoking an origin processing core.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton III, Thomas J
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation, MPI_Comm_shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI_Comm_shrink operation requires a fault-tolerant failure detection and consensus algorithm. This paper presents and compares two novel failure detection and consensus algorithms. The proposed algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus.
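A toy simulation (not the paper's algorithms) of why gossip-style dissemination converges in O(log N) cycles: each informed process pushes the failure list to one random peer per cycle, so the informed set roughly doubles until global agreement is reached.

import math
import random

def gossip_cycles_to_consensus(n_procs, seed=0):
    rng = random.Random(seed)
    informed = {0}                              # process 0 detects the failure first
    cycles = 0
    while len(informed) < n_procs:
        newly = set()
        for p in informed:
            newly.add(rng.randrange(n_procs))   # push the failure list to one random peer
        informed |= newly
        cycles += 1
    return cycles

if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        print(n, gossip_cycles_to_consensus(n), "cycles vs log2(N) =", round(math.log2(n), 1))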
Brennan, Marc A.; McCreery, Ryan; Kopun, Judy; Hoover, Brenda; Alexander, Joshua; Lewis, Dawna; Stelmachowicz, Patricia G.
2014-01-01
Background: Preference for speech and music processed with nonlinear frequency compression and two controls (restricted and extended bandwidth hearing-aid processing) was examined in adults and children with hearing loss. Purpose: Determine if stimulus type (music, sentences), age (children, adults) and degree of hearing loss influence listener preference for nonlinear frequency compression, restricted bandwidth, and extended bandwidth. Research Design: Within-subject, quasi-experimental study. Using a round-robin procedure, participants listened to amplified stimuli that were 1) frequency-lowered using nonlinear frequency compression, 2) low-pass filtered at 5 kHz to simulate the restricted bandwidth of conventional hearing aid processing, or 3) low-pass filtered at 11 kHz to simulate extended bandwidth amplification. The examiner and participants were blinded to the type of processing. Using a two-alternative forced-choice task, participants selected the preferred music or sentence passage. Study Sample: Sixteen children (8–16 years) and 16 adults (19–65 years) with mild-to-severe sensorineural hearing loss. Intervention: All subjects listened to speech and music processed using a hearing-aid simulator fit to the Desired Sensation Level algorithm v.5.0a (Scollie et al, 2005). Results: Children and adults did not differ in their preferences. For speech, participants preferred extended bandwidth to both nonlinear frequency compression and restricted bandwidth. Participants also preferred nonlinear frequency compression to restricted bandwidth. Preference was not related to degree of hearing loss. For music, listeners did not show a preference. However, participants with greater hearing loss preferred nonlinear frequency compression to restricted bandwidth more than participants with less hearing loss. Conversely, participants with greater hearing loss were less likely to prefer extended bandwidth to restricted bandwidth. Conclusion: Both age groups preferred access to high-frequency sounds, as demonstrated by their preference for either the extended bandwidth or nonlinear frequency compression conditions over the restricted bandwidth condition. Preference for extended bandwidth can be limited for those with greater degrees of hearing loss, but participants with greater hearing loss may be more likely to prefer nonlinear frequency compression. Further investigation using participants with more severe hearing loss may be warranted. PMID:25514451
The effect of bandwidth on filter instrument total ozone accuracy
NASA Technical Reports Server (NTRS)
Basher, R. E.
1977-01-01
The effect of the width and shape of the New Zealand filter instrument's passbands on measured total-ozone accuracy is determined using a numerical model of the spectral measurement process. The model enables the calculation of corrections for the 'bandwidth-effect' error and shows that highly attenuating passband skirts and well-suppressed leakage bands are at least as important as narrow half-bandwidths. Over typical ranges of airmass and total ozone, the range in the bandwidth-effect correction is about 2% in total ozone for the filter instrument, compared with about 1% for the Dobson instrument.
Modulated method for efficient, narrow-bandwidth, laser Compton X-ray and gamma-ray sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barty, Christopher P. J.
A method of x-ray and gamma-ray generation via laser Compton scattering uses the interaction of a specially-formatted, highly modulated, long duration, laser pulse with a high-frequency train of high-brightness electron bunches to both create narrow bandwidth x-ray and gamma-ray sources and significantly increase the laser to Compton photon conversion efficiency.
Method for efficient, narrow-bandwidth, laser compton x-ray and gamma-ray sources
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barty, Christopher P. J.
A method of x-ray and gamma-ray generation via laser Compton scattering uses the interaction of a specially-formatted, highly modulated, long duration, laser pulse with a high-frequency train of high-brightness electron bunches to both create narrow bandwidth x-ray and gamma-ray sources and significantly increase the laser to Compton photon conversion efficiency.
NASA Astrophysics Data System (ADS)
Sheen, David M.; Fernandes, Justin L.; Tedeschi, Jonathan R.; McMakin, Douglas L.; Jones, A. Mark; Lechelt, Wayne M.; Severtsen, Ronald H.
2013-05-01
Active millimeter-wave imaging is currently being used for personnel screening at airports and other high-security facilities. The cylindrical imaging techniques used in the deployed systems are based on licensed technology developed at the Pacific Northwest National Laboratory. The cylindrical and a related planar imaging technique form three-dimensional images by scanning a diverging-beam swept-frequency transceiver over a two-dimensional aperture and mathematically focusing or reconstructing the data into three-dimensional images of the person being screened. The resolution, clothing penetration, and image illumination quality obtained with these techniques can be significantly enhanced through the selection of the aperture size, antenna beamwidth, center frequency, and bandwidth. The lateral resolution can be improved by increasing the center frequency, or it can be increased with a larger antenna beamwidth. The wide-beamwidth approach can significantly improve illumination quality relative to a higher-frequency system. Additionally, a wide antenna beamwidth allows for operation at a lower center frequency, resulting in less scattering and attenuation from the clothing. The depth resolution of the system can be improved by increasing the bandwidth. Utilization of extremely wide bandwidths of up to 30 GHz can result in depth resolution as fine as 5 mm. This wider bandwidth operation may allow for improved detection techniques based on high range resolution. In this paper, the results of an extensive imaging study that explored the advantages of using extremely wide beamwidth and bandwidth are presented, primarily for the 10-40 GHz frequency band.
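The 5 mm figure follows from the usual swept-frequency range-resolution relation, delta_R = c / (2B); a one-line check is shown below.

def range_resolution_m(bandwidth_hz, c=3.0e8):
    # Depth (range) resolution of a swept-frequency radar: c / (2 * bandwidth).
    return c / (2.0 * bandwidth_hz)

if __name__ == "__main__":
    print(range_resolution_m(30e9) * 1e3, "mm")   # 30 GHz sweep -> 5.0 mm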
Temperature and leakage aware techniques to improve cache reliability
NASA Astrophysics Data System (ADS)
Akaaboune, Adil
Decreasing power consumption in small devices such as handhelds and cell phones, as well as in high-performance processors, is now one of the most critical design concerns. On-chip cache memories dominate the chip area in microprocessors, so power-efficient cache memories are essential. Cache is the simplest, most cost-effective way to build a high-speed memory hierarchy, and its performance is critical for high-speed computers. The microprocessor uses the cache to bridge the performance gap between processor and main memory (RAM); hence memory bandwidth is frequently a bottleneck that can significantly limit peak throughput. In the design of any cache system, the trade-offs among area/cost, performance, power consumption, and thermal management must be taken into consideration. Previous work has mainly concentrated on performance and area/cost constraints. More recent work has focused on low-power design, especially for portable devices and media-processing systems, but less research has been done on the relationship between heat management, leakage power, and cost per die. Lately, the focus of power dissipation in new generations of microprocessors has shifted from dynamic power to idle power, a previously underestimated form of power loss that drains battery charge and forces early shutdown through wasted energy. The problem has been aggravated by aggressive process scaling, a device-level method originally used by designers to enhance performance, reduce dissipation, and shrink increasingly dense digital circuits. This dissertation studies the impact of hotspots in the cache memory on leakage consumption and on microprocessor reliability and durability. The work first shows that eliminating hotspots in the cache memory reduces leakage power and therefore improves reliability. The second technique studied is data quality management, which improves the quality of the data stored in the cache to reduce power consumption. The initial work on this subject focuses on the types of data that increase leakage consumption and on ways to manage them without impacting microprocessor performance; a second phase focuses on managing data storage across different blocks of the cache to smooth both leakage and dynamic power consumption. The last technique is a voltage-controlled cache that reduces leakage during execution and even in the idle state. Two blocks of the 4-way set-associative cache are supplied through a voltage regulator before reaching the voltage well, while the other two are connected directly to the voltage well. The idea behind this technique is to use replacement-algorithm information to increase or decrease the voltage of the regulated blocks depending on the need for the information stored in them.
An adaptive vector quantization scheme
NASA Technical Reports Server (NTRS)
Cheung, K.-M.
1990-01-01
Vector quantization is known to be an effective compression scheme to achieve a low bit rate so as to minimize communication channel bandwidth and also to reduce digital memory storage while maintaining the necessary fidelity of the data. However, the large number of computations required in vector quantizers has been a handicap in using vector quantization for low-rate source coding. An adaptive vector quantization algorithm is introduced that is inherently suitable for simple hardware implementation because it has a simple architecture. It allows fast encoding and decoding because it requires only addition and subtraction operations.
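A minimal sketch of the basic vector-quantization step referred to above: each input vector is replaced by the index of its nearest codeword, so only a few bits per block are stored or transmitted. The random codebook and brute-force nearest-neighbor search below are illustrative assumptions; the paper's adaptive, addition/subtraction-only architecture is not reproduced.

import numpy as np

def vq_encode(vectors, codebook):
    # Squared Euclidean distance from every input vector to every codeword.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)                 # index of the nearest codeword

def vq_decode(indices, codebook):
    return codebook[indices]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.standard_normal((16, 4))     # 16 codewords -> 4 bits per 4-sample block
    data = rng.standard_normal((1000, 4))
    idx = vq_encode(data, codebook)
    recon = vq_decode(idx, codebook)
    print("bits per sample:", 4 / 4, " mean squared error:", float(((recon - data) ** 2).mean()))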
Time Integrating Optical Signal Processing
1981-07-01
advantage of greatly reducing the bandwidth requirement for the memory feeding the second cell. For a system composed of a PbMoO4 and a (TeO2)s Bragg cell...bounds. (TeO2)L and (TeO2)s represent, respectively, the longitudinal and slow-shear modes of TeO2...could be implemented with a 25 mm TeO2 device operated in the longitudinal mode in a hybrid system. A purely time-integrating system would require about
Requirements and Usage of NVM in Advanced Onboard Data Processing Systems
NASA Technical Reports Server (NTRS)
Some, R.
2001-01-01
This viewgraph presentation gives an overview of the requirements and uses of non-volatile memory (NVM) in advanced onboard data processing systems. Supercomputing in space presents the only viable approach to the downlink bandwidth problem (not all data can be sent to Earth), to controlling constellations of cooperating satellites, to reducing mission operating costs, and to real-time intelligent decision making and science data gathering. Details are given on the REE vision and impact on NASA and Department of Defense missions, objectives of REE, baseline architecture, and issues. NVM uses and requirements are listed.
IEEE 802.15.4 ZigBee-Based Time-of-Arrival Estimation for Wireless Sensor Networks.
Cheon, Jeonghyeon; Hwang, Hyunsu; Kim, Dongsun; Jung, Yunho
2016-02-05
Precise time-of-arrival (TOA) estimation is one of the most important techniques in RF-based positioning systems that use wireless sensor networks (WSNs). Because the accuracy of TOA estimation is proportional to the RF signal bandwidth, using broad bandwidth is the most fundamental approach for achieving higher accuracy. Hence, ultra-wide-band (UWB) systems with a bandwidth of 500 MHz are commonly used. However, wireless systems with broad bandwidth suffer from the disadvantages of high complexity and high power consumption. Therefore, it is difficult to employ such systems in various WSN applications. In this paper, we present a precise time-of-arrival (TOA) estimation algorithm using an IEEE 802.15.4 ZigBee system with a narrow bandwidth of 2 MHz. In order to overcome the lack of bandwidth, the proposed algorithm estimates the fractional TOA within the sampling interval. Simulation results show that the proposed TOA estimation algorithm provides an accuracy of 0.5 m at a signal-to-noise ratio (SNR) of 8 dB and achieves an SNR gain of 5 dB as compared with the existing algorithm. In addition, experimental results indicate that the proposed algorithm provides accurate TOA estimation in a real indoor environment.
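One common way to estimate a fractional (sub-sample) delay with a narrowband signal, shown as a hedged sketch below, is to correlate against the known preamble and refine the integer-sample peak with a three-point parabolic fit. The toy sinc-delayed signal model is an assumption and does not reproduce the IEEE 802.15.4 PHY or the paper's algorithm.

import numpy as np

def fractional_toa(rx, preamble):
    corr = np.correlate(rx, preamble, mode="full")
    k = int(np.argmax(np.abs(corr)))
    ym1, y0, yp1 = np.abs(corr[k - 1]), np.abs(corr[k]), np.abs(corr[k + 1])
    frac = 0.5 * (ym1 - yp1) / (ym1 - 2 * y0 + yp1)     # parabolic peak interpolation
    return (k - (len(preamble) - 1)) + frac              # delay in samples

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    preamble = rng.standard_normal(64)
    true_delay = 20.3                                     # samples, deliberately non-integer
    n = np.arange(200)
    # Build a delayed copy by band-limited (sinc) interpolation, plus a little noise.
    rx = np.array([np.sum(preamble * np.sinc(t - true_delay - np.arange(64))) for t in n])
    rx += 0.01 * rng.standard_normal(n.size)
    print(round(fractional_toa(rx, preamble), 2))         # close to 20.3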
Spin-torque diode with tunable sensitivity and bandwidth by out-of-plane magnetic field
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, X.; Zheng, C.; Pong, Philip W. T.
Spin-torque diodes based on nanosized magnetic tunnel junctions are novel microwave detectors with high sensitivity and wide frequency bandwidth. While previous reports mainly focus on improving the sensitivity, the approaches to extend the bandwidth are limited. This work experimentally demonstrates that through optimizing the orientation of the external magnetic field, wide bandwidth can be achieved while maintaining high sensitivity. The mechanism of the frequency- and sensitivity-tuning is investigated through analyzing the dependence of resonant frequency and DC voltage on the magnitude and the tilt angle of hard-plane magnetic field. The frequency dependence is qualitatively explicated by Kittel's ferromagnetic resonance model. The asymmetric resonant frequency at positive and negative magnetic field is verified by the numerical simulation considering the in-plane anisotropy. The DC voltage dependence is interpreted through evaluating the misalignment angle between the magnetization of the free layer and the reference layer. The tunability of the detector performance by the magnetic field angle is evaluated through characterizing the sensitivity and bandwidth under 3D magnetic field. The frequency bandwidth up to 9.8 GHz or maximum sensitivity up to 154 mV/mW (after impedance mismatch correction) can be achieved by tuning the angle of the applied magnetic field. The results show that the bandwidth and sensitivity can be controlled and adjusted through optimizing the orientation of the magnetic field for various applications and requirements.
Multi-granularity Bandwidth Allocation for Large-Scale WDM/TDM PON
NASA Astrophysics Data System (ADS)
Gao, Ziyue; Gan, Chaoqin; Ni, Cuiping; Shi, Qiongling
2017-12-01
WDM (wavelength-division multiplexing)/TDM (time-division multiplexing) PON (passive optical network) is viewed as a promising solution for delivering multiple services and applications, such as high-definition video, video conferencing, and data traffic. Considering real-time transmission, QoS (quality of service) requirements, and a differentiated services model, a multi-granularity dynamic bandwidth allocation (DBA) scheme in both the wavelength and time domains for large-scale hybrid WDM/TDM PON is proposed in this paper. The proposed scheme achieves load balance by using bandwidth prediction. Based on the bandwidth prediction, the wavelength assignment can be realized fairly and effectively to satisfy the different demands of various classes. In particular, the allocation of residual bandwidth further augments the DBA and makes full use of bandwidth resources in the network. To further improve network performance, two schemes named extending the cycle of one free wavelength (ECoFW) and large bandwidth shrinkage (LBS) are proposed, which can prevent interruption of transmission when a user employs more than one wavelength. The simulation results show the effectiveness of the proposed scheme.
NASA Astrophysics Data System (ADS)
Hur, Min Young; Verboncoeur, John; Lee, Hae June
2014-10-01
Particle-in-cell (PIC) simulations offer higher fidelity than fluid simulations for plasma devices that require transient kinetic modeling. They use fewer approximations of the plasma kinetics but require many particles and grid cells to obtain meaningful results, which means the simulation time grows in proportion to the number of particles. Therefore, PIC simulation needs high performance computing. In this research, a graphics processing unit (GPU) is adopted for high performance computing of PIC simulation for low temperature discharge plasmas. GPUs have many-core processors and high memory bandwidth compared with a central processing unit (CPU). NVIDIA GeForce GPUs with hundreds of cores, which offer cost-effective performance, were used for the test. The PIC code is divided into two modules, a field solver and a particle mover, and the particle mover module is divided into four routines named move, boundary, Monte Carlo collision (MCC), and deposit. Overall, the GPU code solves particle motions as well as the electrostatic potential in two-dimensional geometry almost 30 times faster than a single-CPU code. This work was supported by the Korea Institute of Science Technology Information.
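As a minimal illustration of the per-particle work in the move and deposit routines mentioned above, the 1D electrostatic sketch below gathers the field at each particle, applies a leapfrog-style push, and deposits charge back onto the grid. Normalized units, nearest-grid-point weighting, and the problem sizes are assumptions, and this is not the GPU code described in the abstract.

import numpy as np

def pic_step(x, v, efield, dx, dt, qm=-1.0):
    # Gather: field at each particle (nearest-grid-point weighting for brevity).
    idx = (x / dx).astype(int) % efield.size
    v += qm * efield[idx] * dt               # accelerate
    x = (x + v * dt) % (efield.size * dx)    # move with periodic boundaries
    # Deposit: accumulate charge density back onto the grid.
    rho = np.zeros_like(efield)
    np.add.at(rho, (x / dx).astype(int) % efield.size, 1.0)
    return x, v, rho

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    ngrid, dx, dt = 64, 1.0, 0.1
    x = rng.uniform(0, ngrid * dx, 10000)
    v = rng.standard_normal(10000)
    e = np.sin(2 * np.pi * np.arange(ngrid) / ngrid)
    x, v, rho = pic_step(x, v, e, dx, dt)
    print(rho.sum())   # total deposited charge equals the particle count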
NEW EPICS/RTEMS IOC BASED ON ALTERA SOC AT JEFFERSON LAB
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yan, Jianxun; Seaton, Chad; Allison, Trent L.
A new EPICS/RTEMS IOC based on the Altera System-on-Chip (SoC) FPGA is being designed at Jefferson Lab. The Altera SoC FPGA integrates a dual ARM Cortex-A9 Hard Processor System (HPS) consisting of processor, peripherals and memory interfaces tied seamlessly with the FPGA fabric using a high-bandwidth interconnect backbone. The embedded Altera SoC IOC has features of remote network boot via U-Boot from SD card or QSPI Flash, 1Gig Ethernet, 1GB DDR3 SDRAM on HPS, UART serial ports, and an ISA bus interface. The RTEMS BSP for the ARM processor was built with the CEXP shell, which dynamically loads the EPICS applications at runtime. U-Boot is the primary bootloader to remotely load the kernel image into local memory from a DHCP/TFTP server over Ethernet, and automatically run RTEMS and EPICS. The first design of the SoC IOC will be compatible with Jefferson Lab’s current PC104 IOCs, which have been running in CEBAF for 10 years. The next design would be mounted in a chassis and connected to a daughter card via standard HSMC connectors. This standard SoC IOC will become the next generation of low-level IOC for the accelerator controls at Jefferson Lab.
Scalable Analysis Methods and In Situ Infrastructure for Extreme Scale Knowledge Discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duque, Earl P.N.; Whitlock, Brad J.
High performance computers have for many years been on a trajectory that gives them extraordinary compute power with the addition of more and more compute cores. At the same time, other system parameters such as the amount of memory per core and bandwidth to storage have remained constant or have barely increased. This creates an imbalance in the computer, giving it the ability to compute a lot of data that it cannot reasonably save out due to time and storage constraints. While technologies have been invented to mitigate this problem (burst buffers, etc.), software has been adapting to employ in situ libraries which perform data analysis and visualization on simulation data while it is still resident in memory. This avoids the need to pay the cost of writing many terabytes of data files. Instead, in situ processing enables the creation of more concentrated data products such as statistics, plots, and data extracts, which are all far smaller than the full-sized volume data. With the increasing popularity of in situ processing, multiple in situ infrastructures have been created, each with its own mechanism for integrating with a simulation. To make it easier to instrument a simulation with multiple in situ infrastructures and include custom analysis algorithms, this project created the SENSEI framework.
Lu, Xiaofeng; Song, Li; Shen, Sumin; He, Kang; Yu, Songyu; Ling, Nam
2013-01-01
Hough Transform has been widely used for straight line detection in low-definition and still images, but it suffers from execution time and resource requirements. Field Programmable Gate Arrays (FPGA) provide a competitive alternative for hardware acceleration to reap tremendous computing performance. In this paper, we propose a novel parallel Hough Transform (PHT) and FPGA architecture-associated framework for real-time straight line detection in high-definition videos. A resource-optimized Canny edge detection method with enhanced non-maximum suppression conditions is presented to suppress most possible false edges and obtain more accurate candidate edge pixels for subsequent accelerated computation. Then, a novel PHT algorithm exploiting spatial angle-level parallelism is proposed to upgrade computational accuracy by improving the minimum computational step. Moreover, the FPGA based multi-level pipelined PHT architecture optimized by spatial parallelism ensures real-time computation for 1,024 × 768 resolution videos without any off-chip memory consumption. This framework is evaluated on ALTERA DE2-115 FPGA evaluation platform at a maximum frequency of 200 MHz, and it can calculate straight line parameters in 15.59 ms on the average for one frame. Qualitative and quantitative evaluation results have validated the system performance regarding data throughput, memory bandwidth, resource, speed and robustness. PMID:23867746
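For reference, the standard Hough voting step that the paper accelerates is sketched below: every edge pixel votes for all (rho, theta) lines passing through it, and peaks in the accumulator correspond to detected lines. The coarse 1-degree angle step and NumPy implementation are illustrative; the paper's finer, angle-parallel FPGA pipeline is not reproduced.

import numpy as np

def hough_lines(edge_points, width, height, n_theta=180):
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.ceil(np.hypot(width, height)))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    for x, y in edge_points:
        # Each edge pixel votes once per angle: rho = x*cos(theta) + y*sin(theta).
        rho = np.round(x * cos_t + y * sin_t).astype(int) + diag   # shift so rho >= 0
        acc[rho, np.arange(n_theta)] += 1
    return acc, thetas, diag

if __name__ == "__main__":
    # Points on the line y = x; the dominant vote should land at theta = 135 deg, rho = 0.
    pts = [(i, i) for i in range(100)]
    acc, thetas, diag = hough_lines(pts, 128, 128)
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    print("rho =", r - diag, "theta_deg =", round(np.degrees(thetas[t]), 1))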
A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU
NASA Astrophysics Data System (ADS)
Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha
2018-03-01
Since Graphic Processing Unit (GPU) has a strong ability of floating-point computation and memory bandwidth for data parallelism, it has been widely used in the areas of common computing such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of compute unified device architecture (CUDA), which reduces the complexity of compiling program, brings the great opportunities to CFD. There are three different modes for parallel solution of NS equations: parallel solver based on CPU, parallel solver based on GPU and heterogeneous parallel solver based on collaborating CPU and GPU. As we can see, GPUs are relatively rich in compute capacity but poor in memory capacity and the CPUs do the opposite. We need to make full use of the GPUs and CPUs, so a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experiment results, which demonstrate that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow, it decreases for turbulent flow, but it still can reach more than 20. What’s more, the speedup increases as the grid size becomes larger.
NASA Astrophysics Data System (ADS)
Slatter, Rolf; Goffin, Benoit
2014-08-01
The usage of magnetoresistive (MR) current sensors is increasing steadily in the field of power electronics. Current sensors must not only be accurate and dynamic, but must also be compact and robust. The MR effect is the basis for current sensors with a unique combination of precision and bandwidth in a compact package. A space-qualifiable magnetoresistive current sensor with high accuracy and high bandwidth is being jointly developed by the sensor manufacturer Sensitec and the spacecraft power electronics supplier Thales Alenia Space (TAS) Belgium. Test results for breadboards incorporating commercial-off-the-shelf (COTS) sensors are presented as well as an application example in the electronic control and power unit for the thrust vector actuators of the Ariane 5 ME launcher.
Feasibility of optically interconnected parallel processors using wavelength division multiplexing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deri, R.J.; De Groot, A.J.; Haigh, R.E.
1996-03-01
New national security demands require enhanced computing systems for nearly ab initio simulations of extremely complex systems and for analyzing unprecedented quantities of remote sensing data. This computational performance is being sought using parallel processing systems, in which many less powerful processors are ganged together to achieve high aggregate performance. Such systems require increased capability to communicate information between individual processor and memory elements. As it is likely that the limited performance of today's electronic interconnects will prevent the system from achieving its ultimate performance, there is great interest in using fiber optic technology to improve interconnect communication. However, little information is available to quantify the requirements on fiber optical hardware technology for this application. Furthermore, we have sought to explore interconnect architectures that use the complete communication richness of the optical domain rather than using optics as a simple replacement for electronic interconnects. These considerations have led us to study the performance of a moderate size parallel processor with optical interconnects using multiple optical wavelengths. We quantify the bandwidth, latency, and concurrency requirements which allow a bus-type interconnect to achieve scalable computing performance using up to 256 nodes, each operating at GFLOP performance. Our key conclusion is that scalable performance, to ~150 GFLOPS, is achievable for several scientific codes using an optical bus with a small number of WDM channels (8 to 32), only one WDM channel received per node, and achievable optoelectronic bandwidth and latency requirements. 21 refs., 10 figs.
The DAQ needle in the big-data haystack
NASA Astrophysics Data System (ADS)
Meschi, E.
2015-12-01
In the last three decades, HEP experiments have faced the challenge of manipulating larger and larger masses of data from increasingly complex, heterogeneous detectors with millions and then tens of millions of electronic channels. LHC experiments abandoned the monolithic architectures of the nineties in favor of a distributed approach, leveraging the appearance of high speed switched networks developed for digital telecommunication and the internet, and the corresponding increase of memory bandwidth available in off-the-shelf consumer equipment. This led to a generation of experiments where custom electronics triggers, analysing coarser-granularity “fast” data, are confined to the first phase of selection, where predictable latency and real-time processing for a modest initial rate reduction are “a necessary evil”. Ever more sophisticated algorithms are projected for use in HL-LHC upgrades, using tracker data in the low-level selection in high multiplicity environments, and requiring extremely complex data interconnects. These systems become obsolete and inflexible quickly but must nonetheless survive and be maintained across the extremely long life span of current detectors. New high-bandwidth bidirectional links could make high-speed low-power full readout at the crossing rate a possibility already in the next decade. At the same time, massively parallel and distributed analysis of unstructured data produced by loosely connected, “intelligent” sources has become ubiquitous in commercial applications, while the mass of persistent data produced by e.g. the LHC experiments has made multiple-pass, systematic, end-to-end offline processing increasingly burdensome. A possible evolution of DAQ and trigger architectures could lead to detectors with extremely deep asynchronous or even virtual pipelines, where data streams from the various detector channels are analysed and indexed in situ in quasi-real time using intelligent, pattern-driven data organization, and the final selection is operated as a distributed “search for interesting event parts”. A holistic approach is required to study the potential impact of these different developments on the design of detector readout, trigger and data acquisition systems in the next decades.
Advanced density profile reflectometry; the state-of-the-art and measurement prospects for ITER
NASA Astrophysics Data System (ADS)
Doyle, E. J.
2006-10-01
Dramatic progress in millimeter-wave technology has allowed the realization of a key goal for ITER diagnostics, the routine measurement of the plasma density profile from millimeter-wave radar (reflectometry) measurements. In reflectometry, the measured round-trip group delay of a probe beam reflected from a plasma cutoff is used to infer the density distribution in the plasma. Reflectometer systems implemented by UCLA on a number of devices employ frequency-modulated continuous-wave (FM-CW), ultrawide-bandwidth, high-resolution radar systems. One such system on DIII-D has routinely demonstrated measurements of the density profile over an electron density range of 0-6.4×10^19 m^-3, with ~25 μs time and ~4 mm radial resolution, meeting key ITER requirements. This progress in performance was made possible by multiple advances in the areas of millimeter-wave technology, novel measurement techniques, and improved understanding, including: (i) fast sweep, solid-state, wide bandwidth sources and power amplifiers, (ii) dual polarization measurements to expand the density range, (iii) adaptive radar-based data analysis with parallel processing on a Unix cluster, (iv) high memory depth data acquisition, and (v) advances in full wave code modeling. The benefits of advanced system performance will be illustrated using measurements from a wide range of phenomena, including ELM and fast-ion driven mode dynamics, L-H transition studies and plasma-wall interaction. The measurement capabilities demonstrated by these systems provide a design basis for the development of the main ITER profile reflectometer system. This talk will explore the extent to which these reflectometer system designs, results and experience can be translated to ITER, and will identify what new studies and experimental tests are essential.
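In an FM-CW reflectometer the round-trip group delay appears as a beat frequency between the launched and returned sweeps, so the first processing step amounts to the conversion sketched below. The sweep parameters are assumed values, and the vacuum-propagation range is a simplification: recovering the actual density profile additionally requires inverting the delay-versus-frequency data through the plasma dispersion relation, which is not shown.

```python
# Minimal FM-CW radar relation: beat frequency -> round-trip group delay -> equivalent range.
# Sweep parameters are illustrative assumptions, not the DIII-D system's values.
C = 3.0e8  # speed of light, m/s

def group_delay_from_beat(beat_hz, sweep_bw_hz, sweep_time_s):
    """tau = f_beat / (dF/dt) for a linear frequency sweep."""
    sweep_rate = sweep_bw_hz / sweep_time_s          # Hz per second
    return beat_hz / sweep_rate                      # seconds

def vacuum_equivalent_range(tau_s):
    """Range assuming vacuum propagation (real profile inversion must include plasma dispersion)."""
    return C * tau_s / 2.0

if __name__ == "__main__":
    tau = group_delay_from_beat(beat_hz=2.0e6, sweep_bw_hz=20e9, sweep_time_s=25e-6)
    print(f"group delay ~ {tau*1e9:.2f} ns, "
          f"vacuum-equivalent range ~ {vacuum_equivalent_range(tau)*100:.1f} cm")
```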
NASA Astrophysics Data System (ADS)
Mehmood, Arshad; Zheng, Yuliang; Braun, Hubertus; Hovhannisyan, Martun; Letz, Martin; Jakoby, Rolf
2015-01-01
This paper presents the application of a new high-permittivity, low-loss glass material for antennas. This glass material is transparent. A very simple rectangular dielectric resonator antenna is designed first with a simple microstrip feeding line. In order to widen the bandwidth, the feed of the design is modified by forming a T-shaped feed. This new design enhances the bandwidth range to cover the WLAN 5 GHz band completely. The dielectric resonator antenna, cut into precise dimensions, is placed on the modified microstrip feed line. The design is simple and easy to manufacture and is also very compact, with a size of only 36 × 28 mm. A -10 dB impedance bandwidth of 18% has been achieved, which covers the frequency range from 5.15 GHz to 5.95 GHz. Simulated and measured return loss and radiation patterns are presented and discussed.
Receiver bandwidth effects on complex modulation and detection using directly modulated lasers.
Yuan, Feng; Che, Di; Shieh, William
2016-05-01
Directly modulated lasers (DMLs) have long been employed for short- and medium-reach optical communications due to their low cost. Recently, a new modulation scheme called complex modulated DMLs has been demonstrated showing a significant optical signal to noise ratio sensitivity enhancement compared with the traditional intensity-only detection scheme. However, chirp-induced optical spectrum broadening is inevitable in complex modulated systems, which may imply a need for high-bandwidth receivers. In this Letter, we study the impact of receiver bandwidth effects on the performance of complex modulation and coherent detection systems based on DMLs. We experimentally demonstrate that such systems exhibit a reasonable tolerance for the reduced receiver bandwidth. For 10 Gbaud 4-level pulse amplitude modulation signals, the required electrical bandwidth is as low as 8.5 and 7.5 GHz for 7% and 20% forward error correction, respectively. Therefore, it is feasible to realize DML-based complex modulated systems using cost-effective receivers with narrow bandwidth.
Wide field fluorescence epi-microscopy behind a scattering medium enabled by speckle correlations
NASA Astrophysics Data System (ADS)
Hofer, Matthias; Soeller, Christian; Brasselet, Sophie; Bertolotti, Jacopo
2018-04-01
Fluorescence microscopy is widely used in biological imaging; however, scattering from tissues strongly limits its applicability to shallow depths. In this work we adapt a methodology inspired by stellar speckle interferometry, and exploit the optical memory effect to enable fluorescence microscopy through a turbid layer. We demonstrate efficient reconstruction of micrometer-size fluorescent objects behind a scattering medium in epi-microscopy, and study the specificities of this imaging modality (magnification, field of view, resolution) as compared to traditional microscopy. Using a modified phase retrieval algorithm to reconstruct fluorescent objects from speckle images, we demonstrate robust reconstructions even in relatively low signal-to-noise conditions. This modality is particularly appropriate for imaging in biological media, which are known to exhibit relatively large optical memory ranges, compatible with fields of view of tens of micrometers, and large spectral bandwidths, compatible with fluorescence emission spectra tens of nanometers wide.
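The reconstruction step mentioned above amounts to phase retrieval from a Fourier magnitude (supplied in practice by the speckle autocorrelation). The snippet below is a generic error-reduction loop with a non-negativity constraint on synthetic data; it is not the modified algorithm used in the paper, and the toy object, image size, and iteration count are assumptions.

```python
import numpy as np

# Generic error-reduction phase retrieval: recover an object from its Fourier magnitude
# (which is what a speckle autocorrelation supplies) plus a non-negativity constraint.
# Synthetic example only; not the paper's modified algorithm.

rng = np.random.default_rng(0)
N = 64
obj = np.zeros((N, N))
obj[20:26, 30:34] = 1.0          # assumed toy "fluorescent object"
obj[40:44, 12:20] = 0.5

measured_mag = np.abs(np.fft.fft2(obj))   # magnitude only; the phase is lost

x = rng.random((N, N))                    # random initial guess
for _ in range(500):
    F = np.fft.fft2(x)
    F = measured_mag * np.exp(1j * np.angle(F))   # enforce the measured Fourier magnitude
    x = np.real(np.fft.ifft2(F))
    x[x < 0] = 0.0                                # enforce non-negativity in object space

err = np.linalg.norm(np.abs(np.fft.fft2(x)) - measured_mag) / np.linalg.norm(measured_mag)
print(f"relative Fourier-magnitude error after 500 iterations: {err:.3e}")
```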
NASA Astrophysics Data System (ADS)
Tu, H.-Yu.; Tasneem, Sarah
Most modern microprocessors employ on-chip cache memories to meet the memory bandwidth demand. These caches now occupy a greater share of chip area. Also, continuous downscaling of transistors increases the possibility of defects in the cache area, which already occupies more than 50% of chip area. For this reason, various techniques have been proposed to tolerate defects in cache blocks. These techniques can be classified into three different categories, namely, cache line disabling, replacement with spare blocks, and decoder reconfiguration without spare blocks. This chapter examines each of those fault-tolerant techniques with a fixed, typical size and organization of L1 cache, through extended simulation using the SPEC2000 benchmark on individual techniques. The design and characteristics of each technique are summarized with a view to evaluating the scheme. We then present our simulation results and a comparative study of the three different methods.
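Of the three categories, cache line disabling is the simplest to picture: a block known to be defective is never allocated, so every access mapping to it misses. The sketch below is a toy direct-mapped cache model of that idea only; the geometry, address trace, and fault set are assumptions, not the SPEC2000 setup used in the chapter.

```python
# Toy direct-mapped cache with "cache line disabling": faulty lines always miss.
# Geometry, trace, and fault locations are illustrative assumptions.

class DirectMappedCache:
    def __init__(self, num_lines, line_bytes, faulty_lines=()):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.faulty = set(faulty_lines)      # disabled lines are never allocated
        self.tags = [None] * num_lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = (addr // self.line_bytes) % self.num_lines
        tag = addr // (self.line_bytes * self.num_lines)
        if line in self.faulty:              # disabled line: treat as a guaranteed miss
            self.misses += 1
            return False
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag                # allocate on miss
        self.misses += 1
        return False

if __name__ == "__main__":
    trace = [i * 4 for i in range(4096)] * 4          # assumed sequential, repeated trace
    for faults in ([], [0, 1, 2, 3]):
        c = DirectMappedCache(num_lines=64, line_bytes=32, faulty_lines=faults)
        for a in trace:
            c.access(a)
        print(f"faulty lines {faults}: hit rate {c.hits / (c.hits + c.misses):.2%}")
```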
SITRUS: Semantic Infrastructure for Wireless Sensor Networks
Bispo, Kalil A.; Rosa, Nelson S.; Cunha, Paulo R. F.
2015-01-01
Wireless sensor networks (WSNs) are made up of nodes with limited resources, such as processing, bandwidth, memory and, most importantly, energy. For this reason, it is essential that WSNs always work to reduce the power consumption as much as possible in order to maximize their lifetime. In this context, this paper presents SITRUS (semantic infrastructure for wireless sensor networks), which aims to reduce the power consumption of WSN nodes using ontologies. SITRUS consists of two major parts: a message-oriented middleware responsible for both a message-oriented communication service and a reconfiguration service; and a semantic information processing module whose purpose is to generate a semantic database that provides the basis to decide whether a WSN node needs to be reconfigured or not. In order to evaluate the proposed solution, we carried out an experimental evaluation to assess the power consumption and memory usage of WSN applications built atop SITRUS. PMID:26528974
Pushing the Limits of Broadband and High-Frequency Metamaterial Silicon Antireflection Coatings
NASA Astrophysics Data System (ADS)
Coughlin, K. P.; McMahon, J. J.; Crowley, K. T.; Koopman, B. J.; Miller, K. H.; Simon, S. M.; Wollack, E. J.
2018-05-01
Broadband refractive optics realized from high-index materials provide compelling design solutions for the next generation of observatories for the cosmic microwave background and for sub-millimeter astronomy. In this paper, work is presented which extends the state of the art in silicon lenses with metamaterial antireflection coatings toward larger-bandwidth and higher-frequency operation. Examples presented include octave bandwidth coatings with less than 0.5% reflection, a prototype 4:1 bandwidth coating, and a coating optimized for 1.4 THz. For these coatings, the detailed design, fabrication and testing processes are described as well as the inherent performance trade-offs.
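The baseline these metamaterial layers improve upon is the classic single-layer quarter-wave antireflection condition on a high-index substrate. For reference, the sketch below evaluates the normal-incidence reflectance of one homogeneous quarter-wave layer on silicon using the standard thin-film characteristic matrix; the refractive indices and design frequency are assumed values, and the multi-layer metamaterial coatings in the paper reach far wider bandwidths than such a single layer can.

```python
import numpy as np

# Normal-incidence reflectance of a single dielectric layer on a substrate,
# via the standard thin-film characteristic matrix. Illustrative values only.
def reflectance(freq_hz, n_layer, d_m, n_sub, n0=1.0):
    c = 3.0e8
    delta = 2.0 * np.pi * n_layer * d_m * freq_hz / c     # phase thickness of the layer
    m11, m12 = np.cos(delta), 1j * np.sin(delta) / n_layer
    m21, m22 = 1j * n_layer * np.sin(delta), np.cos(delta)
    B = m11 + m12 * n_sub
    C = m21 + m22 * n_sub
    r = (n0 * B - C) / (n0 * B + C)
    return np.abs(r) ** 2

if __name__ == "__main__":
    n_si = 3.4                       # assumed silicon index in the sub-mm band
    f0 = 150e9                       # assumed design frequency
    n_ar = np.sqrt(n_si)             # ideal single-layer AR index
    d = 3.0e8 / (4.0 * n_ar * f0)    # quarter-wave thickness at f0
    for f in (0.75 * f0, f0, 1.5 * f0):
        print(f"{f/1e9:6.1f} GHz: R = {reflectance(f, n_ar, d, n_si):.4f}")
```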
Note: A high dynamic range, linear response transimpedance amplifier.
Eckel, S; Sushkov, A O; Lamoreaux, S K
2012-02-01
We have built a high dynamic range (nine-decade) transimpedance amplifier with a linear response. The amplifier uses junction-gate field effect transistors (JFETs) to switch between three different resistors in the feedback of a low input bias current operational amplifier. This allows for the creation of multiple outputs, each with a linear response and a different transimpedance gain. The overall bandwidth of the transimpedance amplifier is set by the bandwidth of the most sensitive range. For our application, we demonstrate a three-stage amplifier with transimpedance gains of approximately 10^9 Ω, 3 × 10^7 Ω, and 10^4 Ω with a bandwidth of 100 Hz.
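The multi-output idea can be summarized simply: each feedback resistor gives its own linear transimpedance output, and the most sensitive output that is not saturated is the one to read. The sketch below captures only that range-selection logic; the rail voltage is an assumed value, while the gains follow the figures quoted in the note.

```python
# Range selection for a multi-gain transimpedance amplifier: |V_out| = I * R_f per stage.
# Pick the most sensitive (largest-gain) output that is still within the linear range.
# The rail voltage is an assumed value; the gains follow the figures quoted in the note.

GAINS_OHM = [1e9, 3e7, 1e4]     # most to least sensitive
V_RAIL = 10.0                   # assumed linear output range, volts

def read_current(i_amps):
    for rf in GAINS_OHM:
        v_out = i_amps * rf
        if abs(v_out) < V_RAIL:           # this stage is not saturated: use it
            return rf, v_out
    return GAINS_OHM[-1], i_amps * GAINS_OHM[-1]   # fall back to the coarsest range

if __name__ == "__main__":
    for i in (3e-12, 8e-8, 5e-4):        # currents spanning many decades
        rf, v = read_current(i)
        print(f"I = {i:.1e} A -> use Rf = {rf:.0e} ohm, |V_out| = {abs(v):.3f} V")
```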
NASA Technical Reports Server (NTRS)
Laird, Jamie S.; Onoda, Shinobu; Hirao, Toshio; Becker, Heidi; Johnston, Allan; Laird, Jamie S.; Itoh, Hisayoshi
2006-01-01
Effects of displacement damage and ionization damage induced by gamma irradiation on the dark current and impulse response of a high-bandwidth, low-breakdown-voltage Si avalanche photodiode have been investigated using picosecond laser microscopy. At doses as high as 10 Mrad(Si), minimal alterations in the impulse response and bandwidth were observed. However, dark current measurements, also performed with and without biased irradiation, exhibit anomalously large damage factors for applied biases close to breakdown. The absence of any degradation in the impulse response is discussed, as are possible mechanisms for the higher dark current damage factors observed under biased irradiation.
Wavelength and bandwidth tunable photonic stopband of ferroelectric liquid crystals.
Ozaki, Ryotaro; Moritake, Hiroshi
2012-03-12
The chiral smectic C phase of ferroelectric liquid crystals (FLCs) has a self-assembling helical structure which is regarded as a one-dimensional pseudo-photonic crystal. It is well known that the stopband of an FLC can be tuned in the wavelength domain by changing the temperature or electric field. Here we have demonstrated an FLC stopband with independently tunable wavelength and bandwidth by controlling the temperature and incident angle. At highly oblique incidence, the stopband does not have polarization dependence. Furthermore, the bandwidth at highly oblique incidence is much wider than that at normal incidence. The mechanism of the tunable stopband is clarified by considering the reflection at oblique incidence.
Enabling MPEG-2 video playback in embedded systems through improved data cache efficiency
NASA Astrophysics Data System (ADS)
Soderquist, Peter; Leeser, Miriam E.
1999-01-01
Digital video decoding, enabled by the MPEG-2 Video standard, is an important future application for embedded systems, particularly PDAs and other information appliances. Many such systems require portability and wireless communication capabilities, and thus face severe limitations in size and power consumption. This places a premium on integration and efficiency, and favors software solutions for video functionality over specialized hardware. The processors in most embedded systems currently lack the computational power needed to perform video decoding, but a related and equally important problem is the required data bandwidth, and the need to cost-effectively ensure adequate data supply. MPEG data sets are very large, and generate significant amounts of excess memory traffic for standard data caches, up to 100 times the amount required for decoding. Meanwhile, cost and power limitations restrict cache sizes in embedded systems. Some systems, including many media processors, eliminate caches in favor of memories under direct, painstaking software control in the manner of digital signal processors. Yet MPEG data has locality which caches can exploit if properly optimized, providing fast, flexible, and automatic data supply. We propose a set of enhancements which target the specific needs of the heterogeneous types within the MPEG decoder working set. These optimizations significantly improve the efficiency of small caches, reducing cache-memory traffic by almost 70 percent, and can make an enhanced 4 KB cache perform better than a standard 1 MB cache. This performance improvement can enable high-resolution, full frame rate video playback in cheaper, smaller systems than would otherwise be possible.
Coarse-Grain Bandwidth Estimation Scheme for Large-Scale Network
NASA Technical Reports Server (NTRS)
Cheung, Kar-Ming; Jennings, Esther H.; Sergui, John S.
2013-01-01
A large-scale network that supports a large number of users can have an aggregate data rate of hundreds of Mbps at any time. High-fidelity simulation of a large-scale network might be too complicated and memory-intensive for typical commercial-off-the-shelf (COTS) tools. Unlike a large commercial wide-area-network (WAN) that shares diverse network resources among diverse users and has a complex topology that requires routing mechanisms and flow control, the ground communication links of a space network operate under the assumption of a guaranteed dedicated bandwidth allocation between specific sparse endpoints in a star-like topology. This work solved the network design problem of estimating the bandwidths of a ground network architecture option that offers different service classes to meet the latency requirements of different user data types. In this work, a top-down analysis and simulation approach was created to size the bandwidths of a store-and-forward network for a given network topology, a mission traffic scenario, and a set of data types with different latency requirements. These techniques were used to estimate the WAN bandwidths of the ground links for different architecture options of the proposed Integrated Space Communication and Navigation (SCaN) Network. A new analytical approach, called the "leveling scheme," was developed to model the store-and-forward mechanism of the network data flow. The term "leveling" refers to the spreading of data across a longer time horizon without violating the corresponding latency requirement of the data type. Two versions of the leveling scheme were developed: 1. A straightforward version that simply spreads the data of each data type across the time horizon and doesn't take into account the interactions among data types within a pass, or between data types across overlapping passes at a network node, and is inherently sub-optimal. 2. A two-state Markov leveling scheme that takes into account the second order behavior of the store-and-forward mechanism, and the interactions among data types within a pass. The novelty of this approach lies in the modeling of the store-and-forward mechanism of each network node. The term store-and-forward refers to the data traffic regulation technique in which data is sent to an intermediate network node where it is temporarily stored and sent at a later time to the destination node or to another intermediate node. Store-and-forward can be applied to both space-based networks that have intermittent connectivity, and ground-based networks with deterministic connectivity. For ground-based networks, the store-and-forward mechanism is used to regulate the network data flow and link resource utilization such that the user data types can be delivered to their destination nodes without violating their respective latency requirements.
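The straightforward leveling version lends itself to a short sketch: each data type's volume is spread uniformly over the window allowed by its latency requirement, and the link is sized at the peak of the summed per-type rates. The code below implements only that naive version (it ignores the cross-pass interactions modeled by the two-state Markov variant), and the pass volumes and latency figures are made-up inputs.

```python
# Naive "leveling": spread each data type's volume uniformly over its latency window,
# then size the link as the peak of the summed rates. Inputs are illustrative only.

def required_bandwidth(passes, horizon_s, dt_s=1.0):
    """passes: list of (start_s, volume_bits, latency_s), one per data type per pass."""
    steps = int(horizon_s / dt_s)
    rate = [0.0] * steps                        # aggregate rate per time step, bits/s
    for start, volume, latency in passes:
        r = volume / latency                    # level the volume across its window
        for t in range(int(start / dt_s), min(int((start + latency) / dt_s), steps)):
            rate[t] += r
    return max(rate)

if __name__ == "__main__":
    passes = [
        (0.0,   8e9,  60.0),   # assumed low-latency data type: 8 Gb due within 60 s
        (0.0,  40e9, 600.0),   # assumed bulk science data: 40 Gb due within 600 s
        (300.0, 8e9,  60.0),   # another low-latency burst later in the horizon
    ]
    print(f"required link bandwidth ~ {required_bandwidth(passes, horizon_s=900)/1e6:.1f} Mb/s")
```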
NASA Astrophysics Data System (ADS)
Zheng, Jun; Ansari, Nirwan
2005-03-01
Call for Papers: Optical Access Networks With the wide deployment of fiber-optic technology over the past two decades, we have witnessed a tremendous growth of bandwidth capacity in the backbone networks of today's telecommunications infrastructure. However, access networks, which cover the "last-mile" areas and serve numerous residential and small business users, have not been scaled up commensurately. The local subscriber lines for telephone and cable television are still using twisted pairs and coaxial cables. Most residential connections to the Internet are still through dial-up modems operating at a low speed on twisted pairs. As the demand for access bandwidth increases with emerging high-bandwidth applications, such as distance learning, high-definition television (HDTV), and video on demand (VoD), the last-mile access networks have become a bandwidth bottleneck in today's telecommunications infrastructure. To ease this bottleneck, it is imperative to provide sufficient bandwidth capacity in the access networks to open the bottleneck and thus present more opportunities for the provisioning of multiservices. Optical access solutions promise huge bandwidth to service providers and low-cost high-bandwidth services to end users and are therefore widely considered the technology of choice for next-generation access networks. To realize the vision of optical access networks, however, many key issues still need to be addressed, such as network architectures, signaling protocols, and implementation standards. The major challenges lie in the fact that an optical solution must be not only robust, scalable, and flexible, but also implemented at a low cost comparable to that of existing access solutions in order to increase the economic viability of many potential high-bandwidth applications. In recent years, optical access networks have been receiving tremendous attention from both academia and industry. A large number of research activities have been carried out or are now underway in this hot area. The purpose of this feature issue is to expose the networking community to the latest research breakthroughs and progress in the area of optical access networks. This feature issue aims to present a collection of papers that focus on the state-of-the-art research in various networking aspects of optical access networks. Original papers are solicited from all researchers involved in the area of optical access networks. Topics of interest include, but are not limited to:
Video Compression Study: h.265 vs h.264
NASA Technical Reports Server (NTRS)
Pryor, Jonathan
2016-01-01
H.265 video compression (also known as High Efficiency Video Coding (HEVC)) promises to provide the same video quality at half the bandwidth, or double the quality at the same bandwidth, of h.264 video compression [1]. This study uses a Tektronix PQA500 to determine the video quality gains by using h.265 encoding. This study also compares two video encoders to see how different implementations of h.264 and h.265 impact video quality at various bandwidths.
NASA Astrophysics Data System (ADS)
Hill, C.
2008-12-01
Low-cost graphics cards today use many, relatively simple, compute cores to deliver support for memory bandwidth of more than 100 GB/s and theoretical floating point performance of more than 500 GFlop/s. Right now this performance is, however, only accessible to highly parallel algorithm implementations that (i) can use a hundred or more, 32-bit floating point, concurrently executing cores, (ii) can work with graphics memory that resides on the graphics card side of the graphics bus and (iii) can be partially expressed in a language that can be compiled by a graphics programming tool. In this talk we describe our experiences implementing a complete, but relatively simple, time-dependent shallow-water equations simulation targeting a cluster of 30 computers, each hosting one graphics card. The implementation takes into account the considerations (i), (ii) and (iii) listed previously. We code our algorithm as a series of numerical kernels. Each kernel is designed to be executed by multiple threads of a single process. Kernels are passed memory blocks to compute over, which can be persistent blocks of memory on a graphics card. Each kernel is individually implemented using the NVidia CUDA language but driven from a higher-level supervisory code that is almost identical to a standard model driver. The supervisory code controls the overall simulation timestepping, but is written to minimize data transfer between main memory and graphics memory (a massive performance bottleneck on current systems). Using the recipe outlined we can boost the performance of our cluster by nearly an order of magnitude, relative to the same algorithm executing only on the cluster CPUs. Achieving this performance boost requires that many threads are available to each graphics processor for execution within each numerical kernel and that the simulation's working set of data can fit into the graphics card memory. As we describe, this puts interesting upper and lower bounds on the problem sizes for which this technology is currently most useful. However, many interesting problems fit within this envelope. Looking forward, we extrapolate our experience to estimate full-scale ocean model performance and applicability. Finally we describe preliminary hybrid mixed 32-bit and 64-bit experiments with graphics cards that support 64-bit arithmetic, albeit at a lower performance.
Time-Series Forecast Modeling on High-Bandwidth Network Measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Wucherl; Sim, Alex
With the increasing number of geographically distributed scientific collaborations and the growing sizes of scientific data, it has become challenging for users to achieve the best possible network performance on a shared network. In this paper, we have developed a model to forecast expected bandwidth utilization on high-bandwidth wide area networks. The forecast model can improve the efficiency of the resource utilization and scheduling of data movements on high-bandwidth networks to accommodate ever-increasing data volume for large-scale scientific data applications. A univariate time-series forecast model is developed with the Seasonal decomposition of Time series by Loess (STL) and the AutoRegressive Integrated Moving Average (ARIMA) on Simple Network Management Protocol (SNMP) path utilization measurement data. Compared with traditional approaches such as the Box-Jenkins methodology to train the ARIMA model, our forecast model reduces computation time by up to 92.6%. It also shows resilience against abrupt network usage changes. Finally, our forecast model conducts a large number of multi-step forecasts, and the forecast errors are within the mean absolute deviation (MAD) of the monitored measurements.
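The pipeline described above (STL decomposition, ARIMA on the deseasonalized series, then adding the seasonal component back to the forecast) can be sketched with standard statsmodels calls. The snippet below runs on synthetic path-utilization data; the seasonal period, ARIMA order, and forecast horizon are assumptions rather than the paper's tuned settings.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL
from statsmodels.tsa.arima.model import ARIMA

# STL + ARIMA forecast sketch on synthetic "SNMP path utilization" data.
# Period, ARIMA order, and horizon are assumptions, not the paper's tuned values.
rng = np.random.default_rng(1)
n, period = 24 * 14, 24                                   # two weeks of hourly samples
t = np.arange(n)
util = 40 + 10 * np.sin(2 * np.pi * t / period) + 0.02 * t + rng.normal(0, 2, n)
series = pd.Series(util, index=pd.date_range("2016-01-01", periods=n, freq="h"))

stl = STL(series, period=period).fit()                    # trend + seasonal + remainder
deseasonalized = series - stl.seasonal

model = ARIMA(deseasonalized, order=(1, 1, 1)).fit()      # assumed order
horizon = 24
forecast = model.forecast(steps=horizon)
forecast += stl.seasonal.iloc[-period:].values[:horizon]  # add the seasonal pattern back

print(forecast.head())
```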
Influence of Reduced Graphene Oxide on Effective Absorption Bandwidth Shift of Hybrid Absorbers.
Ameer, Shahid; Gul, Iftikhar Hussain
2016-01-01
The magnetic nanoparticle composite NiFe2O4 has traditionally been studied for high-frequency microwave absorption with marginal performance towards low-frequency radar bands (particularly L and S bands). Here, NiFe2O4 nanoparticles and nanohybrids using large-diameter graphene oxide (GO) sheets are prepared via solvothermal synthesis for low-frequency wide bandwidth shielding (L and S radar bands). The synthesized materials were characterized using XRD, SEM, FTIR and microwave magneto-dielectric spectroscopy. The dimensions of these solvothermally synthesized pristine particles and hybrids lie within 30-58 nm. Microwave magneto-dielectric spectroscopy was performed in the low-frequency region in the 1 MHz-3 GHz spectrum. The as-synthesized pristine nanoparticles and hybrids were found to be highly absorbing for microwaves throughout the L and S radar bands (< -10 dB from 1 MHz to 3 GHz). This excellent microwave-absorbing property, induced by graphene sheet coupling, shows that the absorption bandwidth of these materials can be tailored for low-frequency use; previously, such materials were used for high-frequency absorption (typically > 4 GHz) with limited, selective bandwidth.
Electrically-driven GHz range ultrafast graphene light emitter (Conference Presentation)
NASA Astrophysics Data System (ADS)
Kim, Youngduck; Gao, Yuanda; Shiue, Ren-Jye; Wang, Lei; Aslan, Ozgur Burak; Kim, Hyungsik; Nemilentsau, Andrei M.; Low, Tony; Taniguchi, Takashi; Watanabe, Kenji; Bae, Myung-Ho; Heinz, Tony F.; Englund, Dirk R.; Hone, James
2017-02-01
An ultrafast electrically driven light emitter is a critical component in the development of high-bandwidth free-space and on-chip optical communications. Traditional semiconductor-based light sources for integration into photonic platforms have therefore been heavily studied over the past decades. However, there are still challenges such as the absence of monolithic on-chip light sources with high bandwidth density, large-scale integration, low cost, small footprint, and complementary metal-oxide-semiconductor (CMOS) technology compatibility. Here, we demonstrate the first electrically driven ultrafast graphene light emitter, which operates at up to 10 GHz bandwidth over a broadband range (400-1600 nm); this is possible because the strong coupling of charge carriers in graphene to surface optical phonons in hBN allows ultrafast energy and heat transfer. In addition, incorporation of atomically thin hexagonal boron nitride (hBN) encapsulation layers enables stable and practical high performance even under ambient conditions. Therefore, electrically driven ultrafast graphene light emitters pave the way towards the realization of ultrahigh-bandwidth-density photonic integrated circuits and efficient optical communication networks.
Time-Series Forecast Modeling on High-Bandwidth Network Measurements
Yoo, Wucherl; Sim, Alex
2016-06-24
With the increasing number of geographically distributed scientific collaborations and the growing sizes of scientific data, it has become challenging for users to achieve the best possible network performance on a shared network. In this paper, we have developed a model to forecast expected bandwidth utilization on high-bandwidth wide area networks. The forecast model can improve the efficiency of the resource utilization and scheduling of data movements on high-bandwidth networks to accommodate ever-increasing data volume for large-scale scientific data applications. A univariate time-series forecast model is developed with the Seasonal decomposition of Time series by Loess (STL) and the AutoRegressive Integrated Moving Average (ARIMA) on Simple Network Management Protocol (SNMP) path utilization measurement data. Compared with traditional approaches such as the Box-Jenkins methodology to train the ARIMA model, our forecast model reduces computation time by up to 92.6%. It also shows resilience against abrupt network usage changes. Finally, our forecast model conducts a large number of multi-step forecasts, and the forecast errors are within the mean absolute deviation (MAD) of the monitored measurements.
Single-channel recordings of RyR1 at microsecond resolution in CMOS-suspended membranes.
Hartel, Andreas J W; Ong, Peijie; Schroeder, Indra; Giese, M Hunter; Shekar, Siddharth; Clarke, Oliver B; Zalk, Ran; Marks, Andrew R; Hendrickson, Wayne A; Shepard, Kenneth L
2018-02-20
Single-channel recordings are widely used to explore functional properties of ion channels. Typically, such recordings are performed at bandwidths of less than 10 kHz because of signal-to-noise considerations, limiting the temporal resolution available for studying fast gating dynamics to greater than 100 µs. Here we present experimental methods that directly integrate suspended lipid bilayers with high-bandwidth, low-noise transimpedance amplifiers based on complementary metal-oxide-semiconductor (CMOS) integrated circuit (IC) technology to achieve bandwidths in excess of 500 kHz and microsecond temporal resolution. We use this CMOS-integrated bilayer system to study the type 1 ryanodine receptor (RyR1), a Ca2+-activated intracellular Ca2+-release channel located on the sarcoplasmic reticulum. We are able to distinguish multiple closed states not evident with lower bandwidth recordings, suggesting the presence of an additional Ca2+ binding site, distinct from the site responsible for activation. An extended beta distribution analysis of our high-bandwidth data can be used to infer closed-state flicker events as fast as 35 ns. These events are in the range of single-file ion translocations.
Gaussian entanglement distribution with gigahertz bandwidth.
Ast, Stefan; Ast, Melanie; Mehmet, Moritz; Schnabel, Roman
2016-11-01
The distribution of entanglement with Gaussian statistics can be used to generate a mathematically proven secure key for quantum cryptography. The distributed secret key rate is limited by the entanglement strength, the entanglement bandwidth, and the bandwidth of the photoelectric detectors. The development of a source for strongly bipartite entangled light with high bandwidth promises an increased measurement speed and a linear boost in the secure data rate. Here, we present the experimental realization of a Gaussian entanglement source with a bandwidth of more than 1.25 GHz. The entanglement spectrum was measured with balanced homodyne detectors and was quantified via the inseparability criterion introduced by Duan and coworkers, with a critical value of 4 below which entanglement is certified. Our measurements yielded an inseparability value of about 1.8 at a frequency of 300 MHz to about 2.8 at 1.2 GHz, extending further to about 3.1 at 1.48 GHz. In the experiment we used two 2.6 mm long monolithic periodically poled potassium titanyl phosphate (KTP) resonators to generate two squeezed fields at the telecommunication wavelength of 1550 nm. Our result proves the possibility of generating and detecting strong continuous-variable entanglement with high speed.
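For reference, the inseparability measure quoted above is of the Duan form sketched below; the operator normalization is an assumption on my part, chosen so that the separability bound equals the critical value of 4 quoted in the abstract (other conventions place the bound at 2).

```latex
% Duan-type inseparability criterion (normalization assumed so that the
% separability bound is 4, matching the value quoted in the abstract).
\[
  \mathcal{I} \;=\; \bigl\langle \Delta(\hat{X}_1 + \hat{X}_2)^2 \bigr\rangle
            \;+\; \bigl\langle \Delta(\hat{P}_1 - \hat{P}_2)^2 \bigr\rangle
  \;<\; 4
  \quad\Longrightarrow\quad \text{the two modes are entangled.}
\]
```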
A GPU-Based Wide-Band Radio Spectrometer
NASA Astrophysics Data System (ADS)
Chennamangalam, Jayanth; Scott, Simon; Jones, Glenn; Chen, Hong; Ford, John; Kepley, Amanda; Lorimer, D. R.; Nie, Jun; Prestage, Richard; Roshi, D. Anish; Wagner, Mark; Werthimer, Dan
2014-12-01
The graphics processing unit has become an integral part of astronomical instrumentation, enabling high-performance online data reduction and accelerated online signal processing. In this paper, we describe a wide-band reconfigurable spectrometer built using an off-the-shelf graphics processing unit card. This spectrometer, when configured as a polyphase filter bank, supports a dual-polarisation bandwidth of up to 1.1 GHz (or a single-polarisation bandwidth of up to 2.2 GHz) on the latest generation of graphics processing units. On the other hand, when configured as a direct fast Fourier transform, the spectrometer supports a dual-polarisation bandwidth of up to 1.4 GHz (or a single-polarisation bandwidth of up to 2.8 GHz).
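The polyphase filter bank configuration mentioned above is essentially a windowed, tap-summed FFT. The NumPy sketch below shows the core per-spectrum operation for one polarisation; the channel count, tap count, and sinc-Hann prototype window are typical assumptions, not the specific design deployed on the GPU.

```python
import numpy as np

# Core of a polyphase filter bank (PFB) spectrometer: window the last (ntaps * nchan)
# samples with a sinc-Hann prototype filter, sum across taps, then FFT.
# Channel and tap counts are typical assumptions, not the GPU spectrometer's design.
def pfb_spectrum(samples, nchan=1024, ntaps=4):
    win = (np.sinc(np.arange(ntaps * nchan) / nchan - ntaps / 2.0)
           * np.hanning(ntaps * nchan))
    x = samples[: ntaps * nchan].reshape(ntaps, nchan) * win.reshape(ntaps, nchan)
    summed = x.sum(axis=0)                    # polyphase summation across taps
    return np.abs(np.fft.rfft(summed)) ** 2   # power spectrum (real-valued input)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, nchan, ntaps = 2.2e9, 1024, 4         # assumed sample rate and geometry
    t = np.arange(ntaps * nchan) / fs
    tone = np.cos(2 * np.pi * 300e6 * t) + 0.1 * rng.standard_normal(t.size)
    spec = pfb_spectrum(tone, nchan, ntaps)
    peak_bin = int(np.argmax(spec[1:])) + 1
    print(f"peak near {peak_bin * fs / nchan / 1e6:.1f} MHz")
```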
The retention and disruption of color information in human short-term visual memory.
Nemes, Vanda A; Parry, Neil R A; Whitaker, David; McKeefry, Declan J
2012-01-27
Previous studies have demonstrated that the retention of information in short-term visual perceptual memory can be disrupted by the presentation of masking stimuli during interstimulus intervals (ISIs) in delayed discrimination tasks (S. Magnussen & W. W. Greenlee, 1999). We have exploited this effect in order to determine to what extent short-term perceptual memory is selective for stimulus color. We employed a delayed hue discrimination paradigm to measure the fidelity with which color information was retained in short-term memory. The task required 5 color normal observers to discriminate between spatially non-overlapping colored reference and test stimuli that were temporally separated by an ISI of 5 s. The points of subjective equality (PSEs) on the resultant psychometric matching functions provided an index of performance. Measurements were made in the presence and absence of mask stimuli presented during the ISI, which varied in hue around the equiluminant plane in DKL color space. For all reference stimuli, we found a consistent mask-induced, hue-dependent shift in PSE compared to the "no mask" conditions. These shifts were found to be tuned in color space, only occurring for a range of mask hues that fell within bandwidths of 29-37 deg. Outside this range, masking stimuli had little or no effect on measured PSEs. The results demonstrate that memory masking for color exhibits selectivity similar to that which has already been demonstrated for other visual attributes. The relatively narrow tuning of these interference effects suggests that short-term perceptual memory for color is based on higher order, non-linear color coding. © ARVO
Mechanism of bandwidth improvement in passively cooled SMA position actuators
NASA Astrophysics Data System (ADS)
Gorbet, R. B.; Morris, K. A.; Chau, R. C. C.
2009-09-01
The heating of shape memory alloy (SMA) materials leads to a thermally driven phase change which can be used to do work. An SMA wire can be thermally cycled by controlling electric current through the wire, creating an electro-mechanical actuator. Such actuators are typically heated electrically and cooled through convection. The thermal time constants and lack of active cooling limit the operating frequencies. In this work, the bandwidth of a still-air-cooled SMA wire controlled with a PID controller is improved through optimization of the controller gains. Results confirm that optimization can improve the ability of the actuator to operate at a given frequency. Overshoot is observed in the optimal controllers at low frequencies. This is a result of hysteresis in the wire's contraction-temperature characteristic, since different input temperatures can achieve the same output value. The optimal controllers generate overshoot during heating, in order to cause the system to operate at a point on the hysteresis curve where faster cooling can be achieved. The optimization results in a controller which effectively takes advantage of the multi-valued nature of the hysteresis to improve performance.
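A minimal picture of the loop discussed above is a discrete PID controller driving a first-order thermal surrogate of the wire, with heating as a one-sided input and contraction as the output. The sketch below shows only that generic loop; the gains, plant time constant, and setpoint are arbitrary assumptions, and the hysteresis that the optimized controllers exploit is not modeled.

```python
# Generic discrete PID loop driving a first-order surrogate of the SMA wire.
# Gains, plant time constant, and setpoint are arbitrary assumptions; the wire's
# hysteresis (which the optimized controllers exploit) is not modeled here.

def simulate(kp, ki, kd, setpoint=0.5, tau=2.0, dt=0.01, t_end=10.0):
    y, integral, prev_err = 0.0, 0.0, 0.0
    history = []
    for _ in range(int(t_end / dt)):
        err = setpoint - y
        integral += err * dt
        derivative = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * derivative
        u = max(0.0, min(1.0, u))            # heating input is one-sided and saturates
        y += dt * (u - y) / tau              # first-order thermal response
        prev_err = err
        history.append(y)
    return history

if __name__ == "__main__":
    out = simulate(kp=4.0, ki=1.5, kd=0.1)
    print(f"response after 10 s: {out[-1]:.3f} (setpoint 0.5)")
```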
Spectral Analysis Tool 6.2 for Windows
NASA Technical Reports Server (NTRS)
Morgan, Feiming; Sue, Miles; Peng, Ted; Tan, Harry; Liang, Robert; Kinman, Peter
2006-01-01
Spectral Analysis Tool 6.2 is the latest version of a computer program that assists in analysis of interference between radio signals of the types most commonly used in Earth/spacecraft radio communications. [An earlier version was reported in Software for Analyzing Earth/Spacecraft Radio Interference (NPO-20422), NASA Tech Briefs, Vol. 25, No. 4 (April 2001), page 52.] SAT 6.2 calculates signal spectra, bandwidths, and interference effects for several families of modulation schemes. Several types of filters can be modeled, and the program calculates and displays signal spectra after filtering by any of the modeled filters. The program accommodates two simultaneous signals: a desired signal and an interferer. The interference-to-signal power ratio can be calculated for the filtered desired and interfering signals. Bandwidth-occupancy and link-budget calculators are included for the user's convenience. SAT 6.2 has a new software structure and provides a new user interface that is both intuitive and convenient. SAT 6.2 incorporates multi-tasking, multi-threaded execution, virtual memory management, and a dynamic link library. SAT 6.2 is designed for use on 32-bit computers employing Microsoft Windows operating systems.
Highly efficient frequency conversion with bandwidth compression of quantum light
Allgaier, Markus; Ansari, Vahid; Sansoni, Linda; Eigner, Christof; Quiring, Viktor; Ricken, Raimund; Harder, Georg; Brecht, Benjamin; Silberhorn, Christine
2017-01-01
Hybrid quantum networks rely on efficient interfacing of dissimilar quantum nodes, as elements based on parametric downconversion sources, quantum dots, colour centres or atoms are fundamentally different in their frequencies and bandwidths. Although pulse manipulation has been demonstrated in very different systems, to date no interface exists that provides both an efficient bandwidth compression and a substantial frequency translation at the same time. Here we demonstrate an engineered sum-frequency-conversion process in lithium niobate that achieves both goals. We convert pure photons at telecom wavelengths to the visible range while compressing the bandwidth by a factor of 7.47 under preservation of non-classical photon-number statistics. We achieve internal conversion efficiencies of 61.5%, significantly outperforming spectral filtering for bandwidth compression. Our system thus makes the connection between previously incompatible quantum systems as a step towards usable quantum networks. PMID:28134242
Optoelectronics research for communication programs at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Krainak, Michael A.
1991-01-01
Current optoelectronics research and development of high-power, high-bandwidth laser transmitters, high-bandwidth, high-sensitivity optical receivers, pointing, acquisition and tracking components, and experimental and theoretical system modeling at the NASA Goddard Space Flight Center is reviewed. Program hardware and space flight milestones are presented. It is believed that these experiments will pave the way for intersatellite optical communications links for both the NASA Advanced Tracking and Data Relay Satellite System and commercial users in the 21st century.
NASA Technical Reports Server (NTRS)
2008-01-01
Topics covered include: Gas Sensors Based on Coated and Doped Carbon Nanotubes; Tactile Robotic Topographical Mapping Without Force or Contact Sensors; Thin-Film Magnetic-Field-Response Fluid-Level Sensor for Non-Viscous Fluids; Progress in Development of Improved Ion-Channel Biosensors; Simulating Operation of a Complex Sensor Network; Using Transponders on the Moon to Increase Accuracy of GPS; Controller for Driving a Piezoelectric Actuator at Resonance; Coaxial Electric Heaters; Dual-Input AND Gate From Single-Channel Thin-Film FET; High-Density, High-Bandwidth, Multilevel Holographic Memory; Fabrication of Gate-Electrode Integrated Carbon-Nanotube Bundle Field Emitters; Hydroxide-Assisted Bonding of Ultra-Low-Expansion Glass; Photochemically Synthesized Polyimides; Optimized Carbonate and Ester-Based Li-Ion Electrolytes; Compact 6-DOF Stage for Optical Adjustments; Ultrasonic/Sonic Impacting Penetrators; Miniature, Lightweight, One-Time-Opening Valve; Supplier Management System; Improved CLARAty Functional-Layer/Decision-Layer Interface; JAVA Stereo Display Toolkit; Remote-Sensing Time Series Analysis, a Vegetation Monitoring Tool; PyPele Rewritten To Use MPI; Data Assimilation Cycling for Weather Analysis; Hydrocyclone/Filter for Concentrating Biomarkers from Soil; Activating STAT3 Alpha for Promoting Healing of Neurons; and Probing a Spray Using Frequency-Analyzed Light Scattering.
New laser glass for short pulsed laser applications: the BLG80 (Conference Presentation)
NASA Astrophysics Data System (ADS)
George, Simi A.
2017-03-01
For achieving the highest peak powers in a solid state laser (SSL) system, significant energy output and short pulses are necessary. For mode-locked lasers, it is well known from the Fourier theorem that the largest gain bandwidths produce the narrowest pulse widths and are thus transform-limited. For an inhomogeneously broadened line width of a laser medium, if the intensity of the pulses follows a Gaussian function, then the resulting mode-locked pulse will have a Gaussian shape with the emission bandwidth/pulse duration relationship τ_pulse ≥ 0.44 λ0²/(c Δλ). Thus, for high peak power SSL systems, laser designers incorporate gain materials capable of broad emission bandwidths. Available energy outputs from a phosphate glass host doped with rare-earth ions are unparalleled. Unfortunately, the emission bandwidths achievable from glass-based gain materials are typically many factors smaller when compared to the Ti:Sapphire crystal. In order to overcome this limitation, a hybrid "mixed" laser glass amplifier - OPCPA approach was developed. The Texas petawatt laser that is currently in operation at the University of Texas-Austin and producing high peak powers uses this hybrid architecture. In this mixed-glass laser design, a phosphate and a silicate glass are used in series to achieve the broader bandwidth required before compression. Though proven, this technology is still insufficient for the future compact petawatt and exawatt systems capable of producing high energies and shorter pulse durations. New glasses with bandwidths that are two to three times larger than what is now available from glass hosts are needed if there is to be an alternative to Ti:Sapphire for laser designers. In this paper, we present new materials that may meet the necessary characteristics and demonstrate their lasing and emission characteristics through internal and external studies.
Toward a Mobility-Driven Architecture for Multimodal Underwater Networking
2017-02-01
applications. By equipping AUVs with short-range, high-bandwidth underwater wireless communications, which feature lower energy-per-bit cost than acoustic...protocols. They suffer from significant transmission path losses at high frequencies, long propagation delays, low and distance-dependent bandwidth, time...of data preprocessing, data compression, and either tethering to a surface buoy able to use radio frequency (RF) communications or using undersea
Developing Reliable Telemedicine Platforms with Unreliable and Limited Communication Bandwidth
2017-10-01
hospital health care, the benefit of high-resolution medical data is greatly limited in battlefield or natural disaster areas, where communication to...sampling rate. For high-frequency data like waveforms, the downsampling approach could directly reduce the amount of data. Therefore, it could be used...AFRL-SA-WP-TR-2017-0019 Developing Reliable Telemedicine Platforms with Unreliable and Limited Communication Bandwidth Peter F
High Bandwidth Rotary Fast Tool Servos and a Hybrid Rotary/Linear Electromagnetic Actuator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Montesanti, Richard Clement
2005-09-01
This thesis describes the development of two high-bandwidth, short-stroke rotary fast tool servos and the hybrid rotary/linear electromagnetic actuator developed for one of them. Design insights, trade-off methodologies, and analytical tools are developed for precision mechanical systems, power and signal electronic systems, control systems, normal-stress electromagnetic actuators, and the dynamics of the combined systems.
TrustGuard: A Containment Architecture with Verified Output
2017-01-01
that the TrustGuard system has minimal performance decline, despite restrictions such as high communication latency and limited available bandwidth...design are the availability of high bandwidth and low delays between the host and the monitoring chip. 3-D integration provides an alternate way of...TRUSTGUARD: A CONTAINMENT ARCHITECTURE WITH VERIFIED OUTPUT SOUMYADEEP GHOSH A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN
Strategic Implications of Cloud Computing for Modeling and Simulation (Briefing)
2016-04-01
of Promises with Cloud • Cost efficiency • Unlimited storage • Backup and recovery • Automatic software integration • Easy access to information...activities that wrap the actual exercise itself (e.g., travel for exercise support, data collection, integration, etc.). Cloud-based simulation would...requiring quick delivery rather than fewer large messages requiring high bandwidth. Cloud environments tend to be better at providing high-bandwidth
NASA Astrophysics Data System (ADS)
Geng, Yong; Huang, Xiatao; Cui, Wenwen; Ling, Yun; Xu, Bo; Zhang, Jin; Yi, Xingwen; Wu, Baojian; Huang, Shu-Wei; Qiu, Kun; Wong, Chee Wei; Zhou, Heng
2018-05-01
We demonstrate seamless channel multiplexing and high bitrate superchannel transmission of coherent optical orthogonal-frequency-division-multiplexing (CO-OFDM) data signals utilizing a dissipative Kerr soliton (DKS) frequency comb generated in an on-chip microcavity. Aided by comb line multiplication through Nyquist pulse modulation, the high stability and mutual coherence among mode-locked Kerr comb lines are exploited for the first time to eliminate the guard intervals between communication channels and achieve full spectral density bandwidth utilization. Spectral efficiency as high as 2.625 bit/s/Hz is obtained for 180 CO-OFDM bands encoded with 12.75 Gbaud 8-QAM data, adding up to a total bitrate of 6.885 Tb/s within a 2.295 THz frequency comb bandwidth. Our study confirms that high coherence is the key superiority of Kerr soliton frequency combs over independent laser diodes, as a multi-spectral coherent laser source for high-bandwidth high-spectral-density transmission networks.
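The headline throughput quoted above follows directly from the band count, symbol rate, and constellation size; a quick consistency check is sketched below (the bits-per-symbol value is implied by the 8-QAM format, and no overheads are modeled).

```python
import math

# Quick consistency check of the quoted aggregate bitrate:
# 180 CO-OFDM bands x 12.75 Gbaud x log2(8) bits per symbol, ignoring all overheads.
bands, baud, bits_per_symbol = 180, 12.75e9, math.log2(8)
aggregate_bps = bands * baud * bits_per_symbol
print(f"aggregate bitrate = {aggregate_bps / 1e12:.3f} Tb/s")   # 6.885 Tb/s, as quoted
```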
Fault-tolerant bandwidth reservation strategies for data transfers in high-performance networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Liudong; Zhu, Michelle M.; Wu, Chase Q.
2016-11-22
Many next-generation e-science applications need fast and reliable transfer of large volumes of data with guaranteed performance, which is typically enabled by the bandwidth reservation service in high-performance networks. One prominent issue in such network environments with large footprints is that node and link failures are inevitable, hence potentially degrading the quality of data transfer. We consider two generic types of bandwidth reservation requests (BRRs) concerning data transfer reliability: (i) to achieve the highest data transfer reliability under a given data transfer deadline, and (ii) to achieve the earliest data transfer completion time while satisfying a given data transfer reliability requirement. We propose two periodic bandwidth reservation algorithms with rigorous optimality proofs to optimize the scheduling of individual BRRs within BRR batches. The efficacy of the proposed algorithms is illustrated through extensive simulations in comparison with scheduling algorithms widely adopted in production networks in terms of various performance metrics.
Bandwidth management for mobile mode of mobile monitoring system for Indonesian Volcano
NASA Astrophysics Data System (ADS)
Evita, Maria; Djamal, Mitra; Zimanowski, Bernd; Schilling, Klaus
2017-01-01
Volcano monitoring requires a system with high-fidelity operation and real-time acquisition. MONICA (Mobile Monitoring System for Indonesian Volcano), a system based on wireless sensor network, mobile robot and satellite technology, has been proposed to fulfill this requirement for a volcano monitoring system in Indonesia. This system consists of a fixed mode for normal conditions and a mobile mode for emergency situations. The first and second modes have been simulated in slow-motion earthquake cases of Merapi Volcano, Indonesia. In this research, we have investigated the application of our bandwidth management for high-fidelity operation and real-time acquisition in the mobile mode for a strong-motion earthquake from this volcano. The simulation result showed that our system could still manage the bandwidth even when two fixed nodes died after being struck by lightning. This result (64% to 83% throughput on average) was still better than the bandwidth utilized by the existing equipment (0% throughput because of the broken seismometer).
Intelligent bandwidth compression
NASA Astrophysics Data System (ADS)
Tseng, D. Y.; Bullock, B. L.; Olin, K. E.; Kandt, R. K.; Olsen, J. D.
1980-02-01
The feasibility of a 1000:1 bandwidth compression ratio for image transmission has been demonstrated using image-analysis algorithms and a rule-based controller. Such a high compression ratio was achieved by first analyzing scene content using auto-cueing and feature-extraction algorithms, and then transmitting only the pertinent information consistent with mission requirements. A rule-based controller directs the flow of analysis and performs priority allocations on the extracted scene content. The reconstructed bandwidth-compressed image consists of an edge map of the scene background, with primary and secondary target windows embedded in the edge map. The bandwidth-compressed images are updated at a basic rate of 1 frame per second, with the high-priority target window updated at 7.5 frames per second. The scene-analysis algorithms used in this system together with the adaptive priority controller are described. Results of simulated 1000:1 bandwidth-compressed images are presented. A videotape simulation of the Intelligent Bandwidth Compression system has been produced using a sequence of video input from the database.
Yi, Tianzhu; He, Zhihua; He, Feng; Dong, Zhen; Wu, Manqing
2017-01-01
This paper presents an efficient and precise imaging algorithm for the large bandwidth sliding spotlight synthetic aperture radar (SAR). The existing sub-aperture processing method based on the baseband azimuth scaling (BAS) algorithm cannot cope with the high order phase coupling along the range and azimuth dimensions. This coupling problem causes defocusing along the range and azimuth dimensions. This paper proposes a generalized chirp scaling (GCS)-BAS processing algorithm, which is based on the GCS algorithm. It successfully mitigates the deep focus along the range dimension of a sub-aperture of the large bandwidth sliding spotlight SAR, as well as high order phase coupling along the range and azimuth dimensions. Additionally, the azimuth focusing can be achieved by this azimuth scaling method. Simulation results demonstrate the ability of the GCS-BAS algorithm to process the large bandwidth sliding spotlight SAR data. It is proven that great improvements of the focus depth and imaging accuracy are obtained via the GCS-BAS algorithm. PMID:28555057
Tele-Assessment of the Berg Balance Scale: Effects of Transmission Characteristics.
Venkataraman, Kavita; Morgan, Michelle; Amis, Kristopher A; Landerman, Lawrence R; Koh, Gerald C; Caves, Kevin; Hoenig, Helen
2017-04-01
To compare Berg Balance Scale (BBS) rating using videos with differing transmission characteristics with direct in-person rating. Repeated-measures study for the assessment of the BBS in 8 configurations: in person, high-definition video with slow motion review, standard-definition videos with varying bandwidths and frame rates (768 kilobits per second [kbps] videos at 8, 15, and 30 frames per second [fps], 30 fps videos at 128, 384, and 768 kbps). Medical center. Patients with limitations (N=45) in ≥1 of 3 specific aspects of motor function: fine motor coordination, gross motor coordination, and gait and balance. Not applicable. Ability to rate the BBS in person and using videos with differing bandwidths and frame rates in frontal and lateral views. Compared with in-person rating (7%), 18% (P=.29) of high-definition videos and 37% (P=.03) of standard-definition videos could not be rated. Interrater reliability for the high-definition videos was .96 (95% confidence interval, .94-.97). Rating failure proportions increased from 20% in videos with the highest bandwidth to 60% (P<.001) in videos with the lowest bandwidth, with no significant differences in proportions across frame rate categories. Both frontal and lateral views were critical for successful rating using videos, with 60% to 70% (P<.001) of videos unable to be rated on a single view. Although there is some loss of information when using videos to rate the BBS compared to in-person ratings, it is feasible to reliably rate the BBS remotely in standard clinical spaces. However, optimal video rating requires frontal and lateral views for each assessment, high-definition video with high bandwidth, and the ability to carry out slow motion review. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Spectral structure of laser light scattering revisited: bandwidths of nonresonant scattering lidars.
She, C Y
2001-09-20
It is well known that scattering lidars, i.e., Mie, aerosol-wind, Rayleigh, high-spectral-resolution, molecular-wind, rotational Raman, and vibrational Raman lidars, are workhorses for probing atmospheric properties, including the backscatter ratio, aerosol extinction coefficient, temperature, pressure, density, and winds. The spectral structure of molecular scattering (strength and bandwidth) and its constituent spectra associated with Rayleigh and vibrational Raman scattering are reviewed. Revisiting the correct name by distinguishing Cabannes scattering from Rayleigh scattering, and sharpening the definition of each scattering component in the Rayleigh scattering spectrum, the review allows a systematic, logical, and useful comparison in strength and bandwidth between each scattering component and in receiver bandwidths (for both nighttime and daytime operation) between the various scattering lidars for atmospheric sensing.
Guldiken, Rasim O.; Zahorian, Jaime; Yamaner, F. Y.; Degertekin, F. L.
2010-01-01
In this paper, we report measurement results on a dual-electrode CMUT demonstrating an electromechanical coupling coefficient (k²) of 0.82 at 90% of the collapse voltage, as well as a 136% 3 dB one-way fractional bandwidth at the transducer surface around the design frequency of 8 MHz. These results are within 5% of the predictions of the finite element simulations. The large bandwidth is achieved mainly by utilizing a non-uniform membrane, introducing a center mass to the design, whereas the dual-electrode structure provides a high coupling coefficient in a large dc bias range without collapsing the membrane. In addition, the non-uniform membrane structure improves the transmit sensitivity of the dual-electrode CMUT by about 2 dB as compared with a dual-electrode CMUT with a uniform membrane. PMID:19574135
NASA Astrophysics Data System (ADS)
Zheng, Jun; Ansari, Nirwan
2005-06-01
Call for Papers: Optical Access Networks With the wide deployment of fiber-optic technology over the past two decades, we have witnessed a tremendous growth of bandwidth capacity in the backbone networks of today's telecommunications infrastructure. However, access networks, which cover the "last-mile" areas and serve numerous residential and small business users, have not been scaled up commensurately. The local subscriber lines for telephone and cable television are still using twisted pairs and coaxial cables. Most residential connections to the Internet are still through dial-up modems operating at a low speed on twisted pairs. As the demand for access bandwidth increases with emerging high-bandwidth applications, such as distance learning, high-definition television (HDTV), and video on demand (VoD), the last-mile access networks have become a bandwidth bottleneck in today's telecommunications infrastructure. To ease this bottleneck, it is imperative to provide sufficient bandwidth capacity in the access networks to open the bottleneck and thus present more opportunities for the provisioning of multiservices. Optical access solutions promise huge bandwidth to service providers and low-cost high-bandwidth services to end users and are therefore widely considered the technology of choice for next-generation access networks. To realize the vision of optical access networks, however, many key issues still need to be addressed, such as network architectures, signaling protocols, and implementation standards. The major challenges lie in the fact that an optical solution must be not only robust, scalable, and flexible, but also implemented at a low cost comparable to that of existing access solutions in order to increase the economic viability of many potential high-bandwidth applications. In recent years, optical access networks have been receiving tremendous attention from both academia and industry. A large number of research activities have been carried out or are now underway in this hot area. The purpose of this feature issue is to expose the networking community to the latest research breakthroughs and progress in the area of optical access networks. This feature issue aims to present a collection of papers that focus on the state-of-the-art research in various networking aspects of optical access networks. Original papers are solicited from all researchers involved in the area of optical access networks. Topics of interest include, but are not limited to: Optical access network architectures and protocols; Passive optical networks (BPON, EPON, GPON, etc.); Active optical networks; Multiple access control; Multiservices and QoS provisioning; Network survivability; Field trials and standards; Performance modeling and analysis
Efficient development of memory bounded geo-applications to scale on modern supercomputers
NASA Astrophysics Data System (ADS)
Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric
2016-04-01
Numerical modeling is a key tool in the geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extent of the investigated domain may vary strongly in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and therefore permit very high-resolution simulations. We propose an efficient approach to solve memory-bounded real-world applications on modern supercomputer architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problems: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy on the relevant benchmarks, and the coupled poromechanical solver makes it possible to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases, near-peak memory bandwidth transfer is achieved. Our approach allows us to get the best out of the current hardware.
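The class of solver sketched in this abstract, an iterative, residual-damped ("pseudo-transient") finite-difference update on a regular Cartesian grid, can be illustrated with a minimal single-node sketch. This is only a schematic of the general approach described, not the authors' GPU/MPI code; the grid size, damping factor and pseudo-time step are hypothetical.

```python
import numpy as np

# Minimal pseudo-transient relaxation of a 2-D Poisson-type problem on a regular
# Cartesian grid (illustrative only; in the paper's setting each MPI rank/GPU would
# own a subdomain and exchange halo rows with point-to-point messages).
nx, ny, dx = 128, 128, 1.0
rhs = np.ones((nx, ny))              # source term
u = np.zeros((nx, ny))               # unknown field, zero Dirichlet boundaries
dudtau = np.zeros((nx, ny))
dtau, damp = 0.2 * dx**2, 0.9        # pseudo-time step and residual damping

for it in range(20000):
    # residual of the discrete Laplacian at interior points
    res = ((u[:-2, 1:-1] - 2 * u[1:-1, 1:-1] + u[2:, 1:-1]) / dx**2 +
           (u[1:-1, :-2] - 2 * u[1:-1, 1:-1] + u[1:-1, 2:]) / dx**2 - rhs[1:-1, 1:-1])
    # damped update of the pseudo-velocity, then of the field itself
    dudtau[1:-1, 1:-1] = res + damp * dudtau[1:-1, 1:-1]
    u[1:-1, 1:-1] += dtau * dudtau[1:-1, 1:-1]
    if np.max(np.abs(res)) < 1e-6:
        break

print(f"stopped after {it + 1} iterations, max residual {np.max(np.abs(res)):.2e}")
```

Each iteration streams a few whole arrays through memory while doing little arithmetic per byte, which is why such stencil solvers are memory-bandwidth bound and why near-peak memory bandwidth is the relevant performance target.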
NASA Technical Reports Server (NTRS)
Wyss, R. A.; Karasik, B. S.; McGrath, W. R.; Bumble, B.; LeDuc, H.
1999-01-01
Diffusion-cooled Nb hot-electron bolometer (HEB) mixers have the potential to simultaneously achieve high intermediate frequency (IF) bandwidths and low mixer noise temperatures for operation at THz frequencies (above the superconductive gap energy). We have measured the IF signal bandwidth at 630 GHz of Nb devices with lengths L = 0.3, 0.2, and 0.1 micrometer in a quasioptical mixer configuration employing twin-slot antennas. The 3-dB IF bandwidth increased from 1.2 GHz for the 0.3 micrometer long device to 9.2 GHz for the 0.1 micrometer long device. These results demonstrate the expected 1/L squared dependence of the IF bandwidth at submillimeter wave frequencies for the first time, as well as the largest IF bandwidth obtained to date. For the 0.1 micrometer device, which had the largest bandwidth, the double sideband (DSB) noise temperature of the receiver was 320-470 K at 630 GHz with an absorbed LO power of 35 nW, estimated using the isothermal method. A version of this mixer with the antenna length scaled for operation at 2.5 THz has also been tested. A DSB receiver noise temperature of 1800 plus or minus 100 K was achieved, which is about 1,000 K lower than our previously reported results. These results demonstrate that large IF bandwidth and low-noise operation of a diffusion-cooled HEB mixer are possible at THz frequencies with the same device geometry.
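The 1/L² scaling reported above can be checked with a one-line calculation: extrapolating from the 1.2 GHz bandwidth of the 0.3 micrometer device predicts roughly 10.8 GHz at 0.1 micrometer, consistent with the measured 9.2 GHz. A minimal sketch with the values taken from the abstract:

```python
# IF bandwidth scaling B ∝ 1/L² for diffusion-cooled HEB mixers
# (reference values from the abstract; purely an arithmetic check).
L_ref_um, B_ref_GHz = 0.3, 1.2
for L_um in (0.3, 0.2, 0.1):
    print(f"L = {L_um} um -> predicted B = {B_ref_GHz * (L_ref_um / L_um)**2:.1f} GHz")
# -> 1.2, 2.7 and 10.8 GHz; the measured bandwidth of the 0.1 um device was 9.2 GHz
```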
High-Bandwidth Dynamic Full-Field Profilometry for Nano-Scale Characterization of MEMS
NASA Astrophysics Data System (ADS)
Chen, Liang-Chia; Huang, Yao-Ting; Chang, Pi-Bai
2006-10-01
The article describes an innovative optical interferometric methodology to deliver dynamic surface profilometry with a measurement bandwidth up to 10 MHz or higher and a vertical resolution down to 1 nm. Previous work using stroboscopic microscopic interferometry for dynamic characterization of micro (opto)electromechanical systems (M(O)EMS) has been limited in measurement bandwidth, mainly to within a couple of MHz. For high resonant mode analysis, the stroboscopic light pulse is insufficiently short to capture the moving fringes from the dynamic motion of the detected structure. In view of this need, a microscopic prototype based on white-light stroboscopic interferometry with an innovative light superposition strategy was developed to achieve dynamic full-field profilometry with a high measurement bandwidth up to 10 MHz or higher. The system primarily consists of an optical microscope on which a Mirau interferometric objective embedded with a piezoelectric vertical translator, a high-power LED light module with dual operation modes, and a light-synchronizing electronics unit are integrated. A micro cantilever beam used in AFM was measured to verify the system's capability in accurate characterisation of the dynamic behaviours of the device. The full-field seventh-mode vibration at a vibratory frequency of 3.7 MHz can be fully characterized, and nano-scale vertical measurement resolution as well as tens of micrometers of vertical measurement range can be achieved.
A high-speed network for cardiac image review.
Elion, J. L.; Petrocelli, R. R.
1994-01-01
A high-speed fiber-based network for the transmission and display of digitized full-motion cardiac images has been developed. Based on Asynchronous Transfer Mode (ATM), the network is scaleable, meaning that the same software and hardware is used for a small local area network or for a large multi-institutional network. The system can handle uncompressed digital angiographic images, considered to be at the "high-end" of the bandwidth requirements. Along with the networking, a general-purpose multi-modality review station has been implemented without specialized hardware. This station can store a full injection sequence in "loop RAM" in a 512 x 512 format, then interpolate to 1024 x 1024 while displaying at 30 frames per second. The network and review stations connect to a central file server that uses a virtual file system to make a large high-speed RAID storage disk and associated off-line storage tapes and cartridges all appear as a single large file system to the software. In addition to supporting archival storage and review, the system can also digitize live video using high-speed Direct Memory Access (DMA) from the frame grabber to present uncompressed data to the network. Fully functional prototypes have provided the proof of concept, with full deployment in the institution planned as the next stage. PMID:7949964
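A rough data-rate estimate shows why uncompressed angiographic loops sit at the "high end" of the bandwidth requirements mentioned above. The sketch assumes 8-bit grayscale pixels, which is an assumption, as the bit depth is not stated in the abstract:

```python
# Back-of-the-envelope data rates for uncompressed cine loops
# (frame sizes and 30 frames/s are from the abstract; 8-bit pixels are assumed).
def rate_mb_per_s(width, height, fps, bytes_per_pixel=1):
    return width * height * bytes_per_pixel * fps / 1e6

print(rate_mb_per_s(512, 512, 30))    # ~7.9 MB/s per stream as acquired
print(rate_mb_per_s(1024, 1024, 30))  # ~31.5 MB/s after interpolation for display
```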
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable a fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, cluster, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10-fold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
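The block-volume idea can be sketched in a few lines: the 3-D volume is split into blocks that are processed independently and in parallel, and the per-block results are written back into the output volume. This is a conceptual illustration only, not the platform's API; the function names and the per-block filter are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def iter_blocks(shape, block):
    """Yield index slices that tile a 3-D volume block by block."""
    for z, y, x in product(*[range(0, n, b) for n, b in zip(shape, block)]):
        yield (slice(z, z + block[0]), slice(y, y + block[1]), slice(x, x + block[2]))

def process_block(block):
    # stand-in per-block operation (e.g. one thresholding step of a cleansing pipeline)
    return (block > block.mean()).astype(np.uint8)

volume = np.random.rand(128, 128, 128).astype(np.float32)
result = np.empty(volume.shape, dtype=np.uint8)
slices = list(iter_blocks(volume.shape, (64, 64, 64)))
with ThreadPoolExecutor() as pool:   # a process pool or cluster scheduler in practice
    for sl, out in zip(slices, pool.map(lambda s: process_block(volume[s]), slices)):
        result[sl] = out
```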
NASA Astrophysics Data System (ADS)
Wan, Hongdan; Liu, Linqian; Ding, Zuoqin; Wang, Jie; Xiao, Yu; Zhang, Zuxing
2018-06-01
This paper proposes and demonstrates a single-longitudinal-mode, narrow-bandwidth fiber laser, using an ultra-high-roundness microsphere resonator (MSR) with a stabilized package as the single-longitudinal-mode selector inside a double-ring fiber cavity. By improving the heating technology and the surface cleaning process, MSRs with high Q factors are obtained. With the optimized coupling condition, light polarization state and fiber taper diameter, we achieve whispering gallery mode (WGM) spectra with a high extinction ratio of 23 dB, a coupling efficiency of 99.5%, a 3 dB bandwidth of 1 pm and a side-mode-suppression ratio of 14.5 dB. The proposed fiber laser produces single-longitudinal-mode laser output with a 20-dB frequency linewidth of about 340 kHz, a signal-to-background ratio of 54 dB and high long-term stability without mode-hopping, which makes it promising for optical communication and sensing applications.
Mapping of H.264 decoding on a multiprocessor architecture
NASA Astrophysics Data System (ADS)
van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.
2003-05-01
Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices, as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of, e.g., a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of the traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD-resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that the data communication is reduced considerably, by up to 65%, directly improving the overall performance. Apart from the considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
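The data-partitioning idea, giving each processor a fixed spatial region of every frame rather than a pipeline stage, can be illustrated schematically. The stripe split below is only a conceptual sketch of such a partitioning (the two-processor SD example follows the abstract; the row counts assume 16-pixel macroblocks), not the paper's actual mapping:

```python
def stripe_partition(mb_rows, num_procs):
    """Assign contiguous macroblock-row stripes of a frame to processors.

    Each processor always owns the same rows, so the reference-frame data needed
    for motion compensation stays mostly local to that processor's memory.
    """
    base, extra = divmod(mb_rows, num_procs)
    stripes, start = [], 0
    for p in range(num_procs):
        rows = base + (1 if p < extra else 0)
        stripes.append((p, range(start, start + rows)))
        start += rows
    return stripes

# SD (720x576) video has 36 macroblock rows of 16 pixels each.
for proc, rows in stripe_partition(36, 2):
    print(f"processor {proc}: macroblock rows {rows.start}-{rows.stop - 1}")
```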
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers
Wang, Bei; Ethier, Stephane; Tang, William; ...
2017-06-29
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
A comparison of high-speed links, their commercial support and ongoing R&D activities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzalez, H.L.; Barsotti, E.; Zimmermann, S.
Technological advances and a demanding market have forced the development of higher bandwidth communication standards for networks, data links and busses. Most of these emerging standards are gathering enough momentum that their widespread availability and lower prices are anticipated. The hardware and software that support the physical media for most of these links is currently available, allowing the user community to implement fairly high-bandwidth data links and networks with commercial components. Also, switches needed to support these networks are available or being developed. The commercial support of high-bandwidth data links, networks and switching fabrics provides a powerful base for the implementation of high-bandwidth data acquisition systems. A large data acquisition system like the one for the Solenoidal Detector Collaboration (SDC) at the SSC can benefit from links and networks that support an integrated systems engineering approach, for initialization, downloading, diagnostics, monitoring, hardware integration and event data readout. The issue that our current work addresses is the possibility of having a channel/network that satisfies the requirements of an integrated data acquisition system. In this paper we present a brief description of high-speed communication links and protocols that we consider of interest for high energy physics: High Performance Parallel Interface (HIPPI), Serial HIPPI, Fibre Channel (FC) and Scalable Coherent Interface (SCI). In addition, the initial work required to implement an SDC-like data acquisition system is described.
Amaya, N; Yan, S; Channegowda, M; Rofoee, B R; Shu, Y; Rashidi, M; Ou, Y; Hugues-Salas, E; Zervas, G; Nejabati, R; Simeonidou, D; Puttnam, B J; Klaus, W; Sakaguchi, J; Miyazawa, T; Awaji, Y; Harai, H; Wada, N
2014-02-10
We present results from the first demonstration of a fully integrated SDN-controlled bandwidth-flexible and programmable SDM optical network utilizing sliceable self-homodyne spatial superchannels to support dynamic bandwidth and QoT provisioning, infrastructure slicing and isolation. Results show that SDN is a suitable control plane solution for the high-capacity flexible SDM network. It is able to provision end-to-end bandwidth and QoT requests according to user requirements, considering the unique characteristics of the underlying SDM infrastructure.
NASA Astrophysics Data System (ADS)
Bock, Carlos; Prat, Josep; Walker, Stuart D.
2005-12-01
A novel time/space/wavelength division multiplexing (TDM/WDM) architecture using the free spectral range (FSR) periodicity of the arrayed waveguide grating (AWG) is presented. A shared tunable laser and a photoreceiver stack featuring dynamic bandwidth allocation (DBA) and remote modulation are used for transmission and reception. Transmission tests show correct operation at 2.5 Gb/s to a 30-km reach, and network performance calculations using queue modeling demonstrate that a high-bandwidth-demanding application could be deployed on this network.
WMSA for wireless communication applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vats, Monika; Agarwal, Alok, E-mail: alokagarwal26@yahoo.com; Kumar, Ravindra
2016-03-09
A modified compact rectangular microstrip patch antenna with a finite ground plane is proposed in this paper. The wideband microstrip antenna (WMSA) is achieved by cutting the corners and inserting air gaps along the edges of the radiating patch over the finite ground plane. The obtained impedance bandwidth for 10 dB return loss at the operating frequency f0 = 2.09 GHz is 28.7% (600 MHz), which is very high compared to the bandwidth obtained for a conventional microstrip antenna. The compactness and wide bandwidth of this antenna make it practically useful for wireless communication systems.
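A quick arithmetic check of the quoted figures, 600 MHz of 10 dB return-loss bandwidth around f0 = 2.09 GHz, reproduces the stated fractional bandwidth:

```python
# Fractional impedance bandwidth from the values quoted in the abstract
f0_hz, bw_hz = 2.09e9, 600e6
print(f"{100 * bw_hz / f0_hz:.1f} %")   # -> 28.7 %
```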
Cross-phase modulation bandwidth in ultrafast fiber wavelength converters
NASA Astrophysics Data System (ADS)
Luís, Ruben S.; Monteiro, Paulo; Teixeira, António
2006-12-01
We propose a novel analytical model for the characterization of fiber cross-phase modulation (XPM) in ultrafast all-optical fiber wavelength converters operating at modulation frequencies higher than 1 THz. The model is used to compare the XPM frequency limitations of a conventional and a highly nonlinear dispersion-shifted fiber (HN-DSF) and a bismuth oxide-based fiber, introducing the XPM bandwidth as a design parameter. It is shown that the HN-DSF presents the highest XPM bandwidth, above 1 THz, making it the most appropriate for ultrafast wavelength conversion.
Faraday anomalous dispersion optical filters
NASA Technical Reports Server (NTRS)
Shay, T. M.; Yin, B.
1992-01-01
The present calculations of the performance of Faraday anomalous dispersion optical filters (FADOF) on IR transitions indicate that such filters may furnish high transmission, narrow-pass bandwidth, and low equivalent noise bandwidth under optimum operating conditions. A FADOF consists of an atomic vapor cell between crossed polarizers that are subject to a dc magnetic field along the optical path; when linearly polarized light travels along the direction of the magnetic field through the dispersive atomic vapor, a polarization rotation occurs. If FADOF conditions are suitably adjusted, a maximum transmission with very narrow bandwidth is obtained.
Gain-Compensating Circuit For NDE and Ultrasonics
NASA Technical Reports Server (NTRS)
Kushnick, Peter W.
1987-01-01
High-frequency gain-compensating circuit designed for general use in nondestructive evaluation and ultrasonic measurements. Controls gain of ultrasonic receiver as function of time to aid in measuring attenuation of samples with high losses; for example, human skin and graphite/epoxy composites. Features high signal-to-noise ratio, large signal bandwidth and large dynamic range. Control bandwidth of 5 MHz ensures accuracy of control signal. Currently being used for retrieval of more information from ultrasonic signals sent through composite materials that have high losses, and to measure skin-burn depth in humans.
Building a Champagne Network on a Beer Budget
ERIC Educational Resources Information Center
Dolan, Jon; Pederson, Curt
2004-01-01
Oregon State University's demand for bandwidth to support scientific collaboration and research continues to grow exponentially, while state funding declines due to hard economic times. The challenge faced by these authors was to find creative yet fiscally responsible ways to meet OSU's bandwidth demands. Looking at their options for high-capacity…
Fast Faraday Cup With High Bandwidth
Deibele, Craig E [Knoxville, TN
2006-03-14
A circuit card stripline Fast Faraday cup quantitatively measures the picosecond time structure of a charged particle beam. The stripline configuration maintains signal integrity, and stitching of the stripline increases the bandwidth. A calibration procedure ensures the measurement of the absolute charge and time structure of the charged particle beam.
A comparison of methods for DPLL loop filter design
NASA Technical Reports Server (NTRS)
Aguirre, S.; Hurd, W. J.; Kumar, R.; Statman, J.
1986-01-01
Four design methodologies for loop filters for a class of digital phase-locked loops (DPLLs) are presented. The first design maps an optimum analog filter into the digital domain; the second approach designs a filter that minimizes, in discrete time, a weighted combination of the variance of the phase error due to noise and the sum square of the deterministic phase error component; the third method uses Kalman filter estimation theory to design a filter composed of a least squares fading memory estimator and a predictor. The last design relies on classical theory, including rules for the design of compensators. Linear analysis is used throughout the article to compare the different designs, and includes stability, steady state performance and transient behavior of the loops. Design methodology is not critical when the loop update rate can be made high relative to the loop bandwidth, as the performance approaches that of continuous time. For low update rates, however, the minimization method is significantly superior to the other methods.
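The second (minimization-based) design can be summarized by a cost of the general form below, trading the noise-induced phase-error variance against the energy of the deterministic (transient) phase-error component; the specific weighting and notation are illustrative, not taken from the article:

$$ J = \lambda\,\sigma_{\phi,n}^{2} + (1-\lambda)\sum_{k=0}^{\infty} e_{d}^{2}[k], \qquad 0 \le \lambda \le 1, $$

where \(\sigma_{\phi,n}^{2}\) is the steady-state variance of the phase error due to noise and \(e_{d}[k]\) is the deterministic phase-error component at update k.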
NASA Astrophysics Data System (ADS)
Cacouris, Theodore; Rao, Rajasekhar; Rokitski, Rostislav; Jiang, Rui; Melchior, John; Burfeindt, Bernd; O'Brien, Kevin
2012-03-01
Deep UV (DUV) lithography is being applied to pattern increasingly finer geometries, leading to solutions like double- and multiple-patterning. Such process complexities lead to higher costs due to the increasing number of steps required to produce the desired results. One of the consequences is that the lithography equipment needs to provide higher operating efficiencies to minimize the cost increases, especially for producers of memory devices that experience a rapid decline in sales prices of these products over time. In addition to having introduced higher power 193nm light sources to enable higher throughput, we previously described technologies that also enable: higher tool availability via advanced discharge chamber gas management algorithms; improved process monitoring via enhanced on-board beam metrology; and increased depth of focus (DOF) via light source bandwidth modulation. In this paper we will report on the field performance of these technologies with data that supports the desired improvements in on-wafer performance and operational efficiencies.
Assessing the Effects of Data Compression in Simulations Using Physically Motivated Metrics
Laney, Daniel; Langer, Steven; Weber, Christopher; ...
2014-01-01
This paper examines whether lossy compression can be used effectively in physics simulations as a possible strategy to combat the expected data-movement bottleneck in future high performance computing architectures. We show that, for the codes and simulations we tested, compression levels of 3–5X can be applied without causing significant changes to important physical quantities. Rather than applying signal processing error metrics, we utilize physics-based metrics appropriate for each code to assess the impact of compression. We evaluate three different simulation codes: a Lagrangian shock-hydrodynamics code, an Eulerian higher-order hydrodynamics turbulence modeling code, and an Eulerian coupled laser-plasma interaction code. We compress relevant quantities after each time-step to approximate the effects of tightly coupled compression and study the compression rates to estimate memory and disk-bandwidth reduction. We find that the error characteristics of compression algorithms must be carefully considered in the context of the underlying physics being modeled.
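The evaluation strategy described, compressing state after every time step and judging the damage with a physics-based metric rather than a signal-processing error norm, can be illustrated with a toy loop. Here the "compressor" is simple uniform quantization and the "physics metric" is the relative change of a total-energy-like quantity; both are stand-ins, since the actual codes and compressors are not specified here.

```python
import numpy as np

def lossy_compress(field, bits=12):
    """Toy lossy compressor: uniform quantization of the field's dynamic range."""
    lo, hi = field.min(), field.max()
    levels = 2 ** bits - 1
    q = np.round((field - lo) / (hi - lo) * levels)
    return lo + q / levels * (hi - lo)

rng = np.random.default_rng(0)
state = rng.normal(size=(256, 256))                  # stand-in for a simulation field
for step in range(10):
    state = 0.99 * state + 0.01 * rng.normal(size=state.shape)   # toy time step
    compressed = lossy_compress(state)
    # physics-style metric: relative change in "total energy", not an L2 error norm
    e_ref, e_cmp = np.sum(state ** 2), np.sum(compressed ** 2)
    print(step, abs(e_cmp - e_ref) / e_ref)
    state = compressed                               # carry the compression error forward
```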
Star sensor/mapper with a self deployable, high-attenuation light shade for SAS-B
NASA Technical Reports Server (NTRS)
Schenkel, F. W.; Finkel, A.
1972-01-01
A star sensor/mapper to determine positional data for the small astronomy satellites was tested to detect stars of plus 4 visual magnitude. It utilizes two information channels with memory so that it can be used with a low-data-rate telemetry system. One channel yields star amplitude information; the other yields the time of star occurrence as the star passes across an N-slit reticle/photomultiplier detector system. Some of the features of the star sensor/mapper are its low weight of 6.5 pounds, low power consumption of 0.4 watt, bandwidth switching to match the satellite spin rate, optical equalization of sensitivity over the 5-by-10 deg field of view, and self-deployable sunshade. The attitude determination accuracy is 3 arc minutes. This is determined by such parameters as the reticle configuration, optical train, and telemetry readout. The optical and electronic design of the star sensor/mapper, its expansion capabilities, and its features are discussed.
Bandwidth Enabled Flight Operations: Examining the Possibilities
NASA Technical Reports Server (NTRS)
Pisanich, Greg; Renema, Fritz; Clancy, Dan (Technical Monitor)
2002-01-01
The Bandwidth Enabled Flight Operations project is a research effort at the NASA Ames Research Center to investigate the use of satellite communications to improve aviation safety and capacity. This project is a follow on to the AeroSAPIENT Project, which demonstrated methods for transmitting high bandwidth data in various configurations. For this research, we set a goal to nominally use only 10 percent of the available bandwidth demonstrated by AeroSAPIENT or projected by near-term technology advances. This paper describes the results of our research, including available satellite bandwidth, commercial and research efforts to provide these services, and some of the limiting factors inherent with this communications medium. It also describes our investigation into the needs of the stakeholders (Airlines, Pilots, Cabin Crews, ATC, Maintenance, etc). The paper also describes our development of low-cost networked flight deck and airline operations center simulations that were used to demonstrate two application areas: Providing real time weather information to the commercial flight deck, and enhanced crew monitoring and control for airline operations centers.
NASA Astrophysics Data System (ADS)
Korobko, M.; Kleybolte, L.; Ast, S.; Miao, H.; Chen, Y.; Schnabel, R.
2017-04-01
The shot-noise limited peak sensitivity of cavity-enhanced interferometric measurement devices, such as gravitational-wave detectors, can be improved by increasing the cavity finesse, even when comparing fixed intracavity light powers. For a fixed light power inside the detector, this comes at the price of a proportional reduction in the detection bandwidth. High sensitivity over a large span of signal frequencies, however, is essential for astronomical observations. It is possible to overcome this standard sensitivity-bandwidth limit using nonclassical correlations in the light field. Here, we investigate the internal squeezing approach, where the parametric amplification process creates a nonclassical correlation directly inside the interferometer cavity. We theoretically analyze the limits of the approach and measure a 36% increase in the sensitivity-bandwidth product compared to the classical case. To our knowledge, this is the first experimental demonstration of an improvement in the sensitivity-bandwidth product using internal squeezing, opening the way for a new class of optomechanical force sensing devices.
Sensitivity-Bandwidth Limit in a Multimode Optoelectromechanical Transducer
NASA Astrophysics Data System (ADS)
Moaddel Haghighi, I.; Malossi, N.; Natali, R.; Di Giuseppe, G.; Vitali, D.
2018-03-01
An optoelectromechanical system formed by a nanomembrane capacitively coupled to an LC resonator and to an optical interferometer has recently been employed for the highly sensitive optical readout of rf signals [T. Bagci et al., Nature (London) 507, 81 (2013), 10.1038/nature13029]. We propose and experimentally demonstrate how the bandwidth of such a transducer can be increased by controlling the interference between two electromechanical interaction pathways of a two-mode mechanical system. With a proof-of-principle device operating at room temperature, we achieve a sensitivity of 300 nV/√Hz over a bandwidth of 15 kHz in the presence of radio-frequency noise, and an optimal shot-noise-limited sensitivity of 10 nV/√Hz over a bandwidth of 5 kHz. We discuss strategies for improving the performance of the device, showing that, for the same given sensitivity, a mechanical multimode transducer can achieve a bandwidth significantly larger than that for a single-mode one.
Constrained ℋ∞ control for low bandwidth active suspensions
NASA Astrophysics Data System (ADS)
Wasiwitono, Unggul; Sutantra, I. Nyoman
2017-08-01
Low Bandwidth Active Suspension (LBAS) is shown to be more competitive than High Bandwidth Active Suspension (HBAS) when energy and cost aspects are taken into account. In this paper, a constrained ℋ∞ control scheme is applied to the LBAS system. The ℋ∞ performance is used to measure ride comfort, while the concept of a reachable set in a state-space ellipsoid defined by a quadratic storage function is used to capture the time-domain constraints representing the requirements for road holding, suspension deflection limitation and actuator saturation. The control problem is then derived in the framework of Linear Matrix Inequality (LMI) optimization. The simulation is conducted considering the road disturbance as a stationary random process. The achievable performance of LBAS is analyzed for different values of bandwidth and damping ratio.
Nonlinear Detection, Estimation, and Control for Free-Space Optical Communication
2008-08-17
In free-space optical communication, the intensity of a laser beam is modulated by a message, the beam ... original message. The promising features of this communication scheme, such as high bandwidth, power efficiency, and security, render it a viable means for high data rate point-to-point communication. In this dissertation, we adopt a ...
Bondu, Magalie; Brooks, Christopher; Jakobsen, Christian; Oakes, Keith; Moselund, Peter Morten; Leick, Lasse; Bang, Ole; Podoleanu, Adrian
2016-06-01
We demonstrate a record-bandwidth, high-energy supercontinuum source suitable for multispectral photoacoustic microscopy. The source has more than 150 nJ per 10 nm bandwidth over a spectral range of 500 to 1600 nm. This performance is achieved using a carefully designed fiber taper with a large-core input for improved power handling and a small-core output that provides the desired spectral range of the supercontinuum source.
Enabling technology for future gigabit-symmetric FTTH: coherent OCDMA over WDM-PON
NASA Astrophysics Data System (ADS)
Kitayama, Ken-ichi; Wang, Xu; Wada, Naoya
2006-09-01
For future broadband Fiber-To-The-Home (FTTH) services, the assumption that a low bit-rate uplink is sufficient, with only the downlink needing to be high bit-rate, will be revealed to be a myth. Current FTTH systems force customers into stressful uplink access due to a MAC based on TDMA under always-on service provisioning. Without an abundant uplink bandwidth available, peer-to-peer applications such as exchanging gigabyte files of uncompressed 1.2 Gbps high-definition (HD) TV class or even 6 Gbps super-high-definition (SHD) class digital movies, as well as teleconferencing and bi-directional medical applications such as tele-diagnosis and tele-surgery, won't become widespread. With a narrowband uplink, even non-peer-to-peer customers will be put in a disadvantageous position by being forced to share the limited bandwidth with a limited number of bandwidth-hungry users.
Cavity resonance absorption in ultra-high bandwidth CRT deflection structure by a resistive load
Dunham, M.E.; Hudson, C.L.
1993-05-11
An improved ultra-high bandwidth helical coil deflection structure for a cathode ray tube is described comprising a first metal member having a bore therein, the metal walls of which form a first ground plane; a second metal member coaxially mounted in the bore of the first metal member and forming a second ground plane; a helical deflection coil coaxially mounted within the bore between the two ground planes; and a resistive load disposed in one end of the bore and electrically connected to the first and second ground planes, the resistive load having an impedance substantially equal to the characteristic impedance of the coaxial line formed by the two coaxial ground planes to inhibit cavity resonance in the structure within the ultra-high bandwidth of operation. Preferably, the resistive load comprises a carbon film on a surface of an end plug in one end of the bore.
The Army's Use of the Advanced Communications Technology Satellite
NASA Technical Reports Server (NTRS)
Ilse, Kenneth
1996-01-01
Tactical operations require military commanders to be mobile and have a high level of independence in their actions. Communications capabilities providing intelligence and command orders in these tactical situations have been limited to simple voice communications or low-rate narrow bandwidth communications because of the need for immediate reliable connectivity. The Advanced Communications Technology Satellite (ACTS) has brought an improved communications tool to the tactical commander giving the ability to gain access to a global communications system using high data rates and wide bandwidths. The Army has successfully tested this new capability of bandwidth-on-demand and high data rates for commanders in real-world conditions during Operation UPHOLD DEMOCRACY in Haiti during the fall and winter of 1994. This paper examines ACTS use by field commanders and details the success of the ACTS system in support of a wide variety of field condition command functions.
Optical interconnect technologies for high-bandwidth ICT systems
NASA Astrophysics Data System (ADS)
Chujo, Norio; Takai, Toshiaki; Mizushima, Akiko; Arimoto, Hideo; Matsuoka, Yasunobu; Yamashita, Hiroki; Matsushima, Naoki
2016-03-01
The bandwidth of information and communication technology (ICT) systems is increasing and is predicted to reach more than 10 Tb/s. However, an electrical interconnect cannot achieve such bandwidth because of its density limits. To solve this problem, we propose two types of high-density optical fiber wiring for backplanes and circuit boards such as interface boards and switch boards. One type uses routed ribbon fiber in a circuit board because it has the ability to be formed into complex shapes to avoid interfering with the LSI and electrical components on the board. The backplane is required to exhibit high density and flexibility, so the second type uses loose fiber. We developed a 9.6-Tb/s optical interconnect demonstration system using embedded optical modules, optical backplane, and optical connector in a network apparatus chassis. We achieved 25-Gb/s transmission between FPGAs via the optical backplane.
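As a sanity check on the demonstrated figures: if the 9.6 Tb/s aggregate is carried entirely on 25 Gb/s lanes (an assumption; the abstract does not state the lane count), the demonstration corresponds to roughly 384 optical lanes.

```python
# Lane-count estimate from the figures in the abstract (uniform 25 Gb/s lanes assumed)
aggregate_bps = 9.6e12
lane_bps = 25e9
print(aggregate_bps / lane_bps)   # -> 384.0
```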
Fiber-optic three axis magnetometer prototype development
NASA Technical Reports Server (NTRS)
Wang, Thomas D.; Mccomb, David G.; Kingston, Bradley R.; Dube, C. Michael; Poehls, Kenneth A.; Wanser, Keith
1989-01-01
The goal of this research program was to develop a high sensitivity, fiber optic, interferometric, three-axis magnetometer for interplanetary spacecraft applications. Dynamics Technology, Inc. (DTI) has successfully integrated a low noise, high bandwidth interferometer with high sensitivity metallic glass transducers. Also, DTI has developed sophisticated signal processing electronics and complete data acquisition, filtering, and display software. The sensor was packaged in a compact, low power and weight unit which facilitates deployment. The magnetic field sensor had subgamma sensitivity and a dynamic range of 10^5 gamma in a 10 Hz bandwidth. Furthermore, the vector instrument exhibited the lowest noise level when only one axis was in operation. A system noise level of 1 gamma rms was observed in a 1 Hz bandwidth. However, with the other two channels operating, the noise level increased by about one order of magnitude. The higher system noise was attributed to cross-channel interference among the dither fields.
Baranwal, Mayank; Gorugantu, Ram S; Salapaka, Srinivasa M
2015-08-01
This paper aims at control design and its implementation for robust high-bandwidth precision (nanoscale) positioning systems. Even though modern model-based control-theoretic designs for robust broadband high-resolution positioning have enabled orders-of-magnitude improvements in performance over existing model-independent designs, their scope is severely limited by the inefficacies of digital implementation of the control designs. High-order control laws that result from model-based designs typically have to be approximated with reduced-order systems to facilitate digital implementation. Digital systems, even those that have very high sampling frequencies, provide low effective control bandwidth when implementing high-order systems. In this context, field programmable analog arrays (FPAAs) provide a good alternative to the use of digital-logic based processors since they enable very high implementation speeds, and with cheaper resources. The superior flexibility of digital systems in terms of the implementable mathematical and logical functions does not give them a significant edge over FPAAs when implementing linear dynamic control laws. In this paper, we pose the control design objectives for positioning systems in different configurations as optimal control problems and demonstrate significant improvements in performance when the resulting control laws are applied using FPAAs as opposed to their digital counterparts. An improvement of over 200% in positioning bandwidth is achieved over an earlier digital signal processor (DSP) based implementation for the same system and the same control design, even though, for the DSP-based system, the sampling frequency is about 100 times the desired positioning bandwidth.
NASA Technical Reports Server (NTRS)
Kory, Carol L.; Wilson, Jeffrey D.
1994-01-01
The V-band frequency range of 59-64 GHz is a region of the millimeter-wave spectrum that has been designated for inter-satellite communications. As a first effort to develop a high-efficiency V-band Traveling-Wave Tube (TWT), variations on a ring-plane slow-wave circuit were computationally investigated to develop an alternative to the more conventional ferruled coupled-cavity circuit. The ring-plane circuit was chosen because of its high interaction impedance, large beam aperture, and excellent thermal dissipation properties. Despite these advantages, however, low bandwidth and high voltage requirements have, until now, prevented its acceptance outside the laboratory. In this paper, the three-dimensional electrodynamic simulation code MAFIA (solution of MAxwell's Equation by the Finite-Integration-Algorithm) is used to investigate methods of increasing the bandwidth and lowering the operating voltage of the ring-plane circuit. Calculations of frequency-phase dispersion, beam on-axis interaction impedance, attenuation and small-signal gain per wavelength were performed for various geometric variations and loading distributions of the ring-plane TWT slow-wave circuit. Based on the results of the variations, a circuit termed the finned-ladder TWT slow-wave circuit was designed and is compared here to the scaled prototype ring-plane and a conventional ferruled coupled-cavity TWT circuit over the V-band frequency range. The simulation results indicate that this circuit has a much higher gain, significantly wider bandwidth, and a much lower voltage requirement than the scaled ring-plane prototype circuit, while retaining its excellent thermal dissipation properties. The finned-ladder circuit has a much larger small-signal gain per wavelength than the ferruled coupled-cavity circuit, but with a moderate sacrifice in bandwidth.
NASA Astrophysics Data System (ADS)
Cone, R. L.; Thiel, C. W.; Sun, Y.; Böttger, Thomas; Macfarlane, R. M.
2012-02-01
Unique spectroscopic properties of isolated rare earth ions in solids offer optical linewidths rivaling those of trapped single atoms and enable a variety of recent applications. We design rare-earth-doped crystals, ceramics, and fibers with persistent or transient "spectral hole" recording properties for applications including high-bandwidth optical signal processing where light and our solids replace the high-bandwidth portion of the electronics; quantum cryptography and information science including the goal of storage and recall of single photons; and medical imaging technology for the 700-900 nm therapeutic window. Ease of optically manipulating rare-earth ions in solids enables capturing complex spectral information in 10^5 to 10^8 frequency bins. Combining spatial holography and spectral hole burning provides a capability for processing high-bandwidth RF and optical signals with sub-MHz spectral resolution and bandwidths of tens to hundreds of GHz for applications including range-Doppler radar and high bandwidth RF spectral analysis. Simply stated, one can think of these crystals as holographic recording media capable of distinguishing up to 10^8 different colors. Ultra-narrow spectral holes also serve as a vibration-insensitive sub-kHz frequency reference for laser frequency stabilization to a part in 10^13 over tens of milliseconds. The unusual properties and applications of spectral hole burning of rare earth ions in optical materials are reviewed. Experimental results on the promising Tm3+:LiNbO3 material system are presented and discussed for medical imaging applications. Finally, a new application of these materials as dynamic optical filters for laser noise suppression is discussed along with experimental demonstrations and theoretical modeling of the process.
NASA Astrophysics Data System (ADS)
Bamiedakis, N.; Chen, J.; Penty, R. V.; White, I. H.
2016-03-01
Multimode polymer waveguides are being increasingly considered for use in short-reach board-level optical interconnects as they exhibit favourable optical properties and allow direct integration onto standard PCBs with conventional methods of the electronics industry. Siloxane-based multimode waveguides have been demonstrated with excellent optical transmission performance, while a wide range of passive waveguide components that offer routing flexibility and enable the implementation of complex on-board interconnection architectures has been reported. In recent work, we have demonstrated that these polymer waveguides can exhibit very high bandwidth-length products in excess of 30 GHz×m despite their highly-multimoded nature, while it has been shown that even larger values of > 60 GHz×m can be achieved by adjusting their refractive index profile. Furthermore, the combination of refractive index engineering and launch conditioning schemes can ensure high bandwidth (> 100 GHz×m) and high coupling efficiency (<1 dB) with standard multimode fibre inputs with relatively large alignment tolerances (~17×15 μm²). In the work presented here, we investigate the effects of refractive index engineering on the performance of passive waveguide components (crossings, bends) and provide suitable design rules for their on-board use. It is shown that, depending on the interconnection layout and link requirements, appropriate choice of refractive index profile can provide enhanced component performance, ensuring low loss interconnection and adequate link bandwidth. The results highlight the strong potential of this versatile optical technology for the formation of high-performance board-level optical interconnects with high routing flexibility.
Hardware-software face detection system based on multi-block local binary patterns
NASA Astrophysics Data System (ADS)
Acasandrei, Laurentiu; Barriga, Angel
2015-03-01
Face detection is an important aspect of biometrics, video surveillance and human-computer interaction. Due to the complexity of the detection algorithms, any face detection system requires a huge amount of computational and memory resources. In this communication, an accelerated implementation of the MB-LBP face detection algorithm targeting low-frequency, low-memory and low-power embedded systems is presented. The resulting implementation is time-deterministic and uses a customizable AMBA IP hardware accelerator. The IP implements the kernel operations of the MB-LBP algorithm and can be used as a universal accelerator for MB-LBP based applications. The IP employs 8 parallel MB-LBP feature evaluator cores, uses a deterministic bandwidth, has a low area profile, and its power consumption is ~95 mW on a Virtex5 XC5VLX50T. The resulting implementation's acceleration gain is between 5 and 8 times, while the hardware MB-LBP feature evaluation gain is between 69 and 139 times.
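For reference, the MB-LBP kernel that such an accelerator evaluates can be sketched in software: the mean intensities of the 8 blocks surrounding a central block are compared against the central block's mean to form an 8-bit code, with each block sum typically obtained from an integral image in four lookups. The sketch below is a generic software illustration, not the paper's hardware implementation:

```python
import numpy as np

def integral_image(img):
    return img.cumsum(0).cumsum(1)

def block_sum(ii, y, x, h, w):
    # sum of img[y:y+h, x:x+w] recovered from the integral image with four lookups
    s = ii[y + h - 1, x + w - 1]
    if y > 0: s -= ii[y - 1, x + w - 1]
    if x > 0: s -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0: s += ii[y - 1, x - 1]
    return s

def mb_lbp(ii, y, x, h, w):
    """8-bit MB-LBP code for the 3x3 grid of h-by-w blocks whose top-left corner is (y, x)."""
    sums = [[block_sum(ii, y + r * h, x + c * w, h, w) for c in range(3)] for r in range(3)]
    center = sums[1][1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]  # clockwise
    code = 0
    for bit, (r, c) in enumerate(order):
        if sums[r][c] >= center:
            code |= 1 << bit
    return code

img = np.random.randint(0, 256, (24, 24)).astype(np.int64)
print(mb_lbp(integral_image(img), 0, 0, 3, 3))
```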
NASA Astrophysics Data System (ADS)
Fuhrer, Oliver; Chadha, Tarun; Hoefler, Torsten; Kwasniewski, Grzegorz; Lapillonne, Xavier; Leutwyler, David; Lüthi, Daniel; Osuna, Carlos; Schär, Christoph; Schulthess, Thomas C.; Vogt, Hannes
2018-05-01
The best hope for reducing long-standing global climate model biases is by increasing resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non-hydrostatic climate model for a near-global setup running on the full Piz Daint supercomputer on 4888 GPUs (graphics processing units). The dynamical core of the model has been completely rewritten using a domain-specific language (DSL) for performance portability across different hardware architectures. Physical parameterizations and diagnostics have been ported using compiler directives. To our knowledge this represents the first complete atmospheric model being run entirely on accelerators on this scale. At a grid spacing of 930 m (1.9 km), we achieve a simulation throughput of 0.043 (0.23) simulated years per day and an energy consumption of 596 MWh per simulated year. Furthermore, we propose a new memory usage efficiency (MUE) metric that considers how efficiently the memory bandwidth - the dominant bottleneck of climate codes - is being used.
Temporal shaping of quantum states released from a superconducting cavity memory
NASA Astrophysics Data System (ADS)
Burkhart, L.; Axline, C.; Pfaff, W.; Zou, C.; Zhang, M.; Narla, A.; Frunzio, L.; Devoret, M. H.; Jiang, L.; Schoelkopf, R. J.
State transfer and entanglement distribution are essential primitives in network-based quantum information processing. We have previously demonstrated an interface between a quantum memory and propagating light fields in the microwave domain: by parametric conversion in a single Josephson junction, we have coherently released quantum states from a superconducting cavity resonator into a transmission line. Protocols for state transfer mediated by propagating fields typically rely on temporal mode-matching of couplings at both sender and receiver. However, parametric driving on a single junction results in dynamic frequency shifts, raising the question of whether the pumps alone provide enough control for achieving this mode-matching. We show, in theory and experiment, that phase and amplitude shaping of the parametric drives allows arbitrary control over the propagating field, limited only by the drives' bandwidth and amplitude constraints. This temporal mode shaping technique allows for release and capture of quantum states, providing a credible route towards state transfer and entanglement generation in quantum networks in which quantum states are stored and processed in cavities.
Memory in random bouncing ball dynamics
NASA Astrophysics Data System (ADS)
Zouabi, C.; Scheibert, J.; Perret-Liaudet, J.
2016-09-01
The bouncing of an inelastic ball on a vibrating plate is a popular model used in various fields, from granular gases to nanometer-sized mechanical contacts. For random plate motion, so far, the model has been studied using Poincaré maps in which the excitation by the plate at successive bounces is assumed to be a discrete Markovian (memoryless) process. Here, we investigate numerically the behaviour of the model for continuous random excitations with tunable correlation time. We show that the system dynamics are controlled by the ratio of the Markovian mean flight time of the ball to the mean time between successive peaks in the motion of the exciting plate. When this ratio, which depends on the bandwidth of the excitation signal, exceeds a certain value, the Markovian approach is appropriate; below it, memory of preceding excitations arises, leading to a significant decrease of the jump duration; at the smallest values of the ratio, chattering occurs. Overall, our results open the way for uses of the model in the low-excitation regime, which is still poorly understood.
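The minimal Python sketch below illustrates the kind of model described above: an inelastic ball bouncing on a plate whose motion is a band-limited random signal, with the moving-average window length playing the role of the excitation correlation time. All parameters (gravity, restitution, amplitude, window length) are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
g, e = 9.81, 0.8             # gravity and restitution coefficient (assumed values)
dt, n = 1e-4, 500_000        # time step and number of steps

# Band-limited random plate motion: white noise smoothed by a moving average whose
# window length sets the excitation bandwidth (i.e. the correlation time).
window = 200
noise = rng.standard_normal(n + window)
plate = 1e-3 * np.convolve(noise, np.ones(window) / window, mode="valid")[:n]
plate_vel = np.gradient(plate, dt)

z, v = plate[0], 0.0         # ball starts at rest on the plate
flights, t_last = [], 0.0
for i in range(1, n):
    v -= g * dt
    z += v * dt
    if z <= plate[i] and v < plate_vel[i]:          # impact: ball meets plate from above
        v = plate_vel[i] - e * (v - plate_vel[i])   # inelastic bounce of relative velocity
        z = plate[i]
        flights.append(i * dt - t_last)
        t_last = i * dt

print("impacts:", len(flights), " mean flight time:", np.mean(flights))
```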
An area model for on-chip memories and its application
NASA Technical Reports Server (NTRS)
Mulder, Johannes M.; Quach, Nhon T.; Flynn, Michael J.
1991-01-01
An area model suitable for comparing data buffers of different organizations and arbitrary sizes is described. The area model considers the supplied bandwidth of a memory cell and includes such buffer overhead as control logic, driver logic, and tag storage. The model gave less than 10 percent error when verified against real caches and register files. Comparing caches and register files of the same storage capacity in terms of area, it is shown that small caches generally occupy more area per bit than register files because the overhead dominates the cache area at these sizes. For larger caches, the smaller storage cells in the cache provide a smaller total cache area per bit than the register set. Studying cache performance (traffic ratio) as a function of area, it is shown that, for small caches, direct-mapped caches perform significantly better than four-way set-associative caches and, for caches of medium areas, both direct-mapped and set-associative caches perform better than fully associative caches.
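The sketch below illustrates the kind of comparison such an area model supports: area per bit as the sum of data cells, tag storage, and control/driver overhead, evaluated for caches and register files of equal capacity. All constants are hypothetical placeholders, not the values of the actual model; the point is only that fixed overhead dominates at small capacities, matching the qualitative behaviour reported above.

```python
# Toy area-per-bit comparison: total buffer area = data cells + tag storage + overhead.
def cache_area_per_bit(capacity_bits, line_bits=256, assoc=1,
                       cell=0.6, tag_bits=20, overhead=5000.0):
    lines = capacity_bits // line_bits
    sets = lines // assoc
    data = capacity_bits * cell                 # data array
    tags = lines * tag_bits * cell              # tag array
    fixed = overhead + 50.0 * sets ** 0.5       # decoders, drivers, comparators
    return (data + tags + fixed) / capacity_bits

def regfile_area_per_bit(capacity_bits, cell=1.0, overhead=2000.0):
    return (capacity_bits * cell + overhead) / capacity_bits

for kib in (0.25, 1, 4, 16, 64):
    bits = int(kib * 1024 * 8)
    print(f"{kib:6.2f} KiB  cache {cache_area_per_bit(bits):5.2f}  "
          f"regfile {regfile_area_per_bit(bits):5.2f}  (arbitrary area units/bit)")
```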
NASA Technical Reports Server (NTRS)
Sims, William H.
2015-01-01
This paper will discuss a proposed CubeSat-size (3 Units / 6 Units) telemetry system concept being developed at Marshall Space Flight Center (MSFC) in cooperation with Auburn University. The telemetry system incorporates efficient, high-bandwidth communications by developing a flight-ready, low-cost, PROTOFLIGHT software-defined radio (SDR) payload for use on CubeSats. The current telemetry system footprint is slightly larger than required to fit within a 0.75-Unit CubeSat volume. Extensible and modular communications for CubeSat technologies will provide high data rates for science experiments performed by two CubeSats flying in formation in Low Earth Orbit. The project is a collaboration between the University of Alabama in Huntsville and Auburn University to study high energy phenomena in the upper atmosphere. Higher bandwidth capacity will enable high-volume, low error-rate data transfer to and from the CubeSats, while also providing additional bandwidth and error correction margin to accommodate more complex encryption algorithms and higher user volume.
Adaptive Video Streaming Using Bandwidth Estimation for 3.5G Mobile Network
NASA Astrophysics Data System (ADS)
Nam, Hyeong-Min; Park, Chun-Su; Jung, Seung-Won; Ko, Sung-Jea
Currently deployed mobile networks including High Speed Downlink Packet Access (HSDPA) offer only best-effort Quality of Service (QoS). In wireless best effort networks, the bandwidth variation is a critical problem, especially, for mobile devices with small buffers. This is because the bandwidth variation leads to packet losses caused by buffer overflow as well as picture freezing due to high transmission delay or buffer underflow. In this paper, in order to provide seamless video streaming over HSDPA, we propose an efficient real-time video streaming method that consists of the available bandwidth (AB) estimation for the HSDPA network and the transmission rate control to prevent buffer overflows/underflows. In the proposed method, the client estimates the AB and the estimated AB is fed back to the server through real-time transport control protocol (RTCP) packets. Then, the server adaptively adjusts the transmission rate according to the estimated AB and the buffer state obtained from the RTCP feedback information. Experimental results show that the proposed method achieves seamless video streaming over the HSDPA network providing higher video quality and lower transmission delay.
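A minimal sketch of the server-side rate control loop described above, assuming the client has already estimated the available bandwidth (AB) and reports it, together with its buffer level, in RTCP feedback. The thresholds, headroom factor, and adjustment steps are illustrative assumptions, not the paper's algorithm.

```python
def next_tx_rate(ab_kbps, buffer_ms, target_ms=2000, headroom=0.9):
    """Server transmission rate (kbps) for the next sending window."""
    rate = headroom * ab_kbps                   # never exceed the estimated AB
    if buffer_ms < 0.5 * target_ms:             # risk of underflow: fill faster
        rate = min(ab_kbps, rate * 1.2)
    elif buffer_ms > 1.5 * target_ms:           # risk of overflow: back off
        rate *= 0.8
    return max(rate, 64.0)                      # keep a minimal keep-alive rate

# Example RTCP feedback sequence: (estimated AB in kbps, client buffer level in ms)
for ab, buf in [(1200, 2100), (900, 1700), (400, 900), (700, 3400)]:
    print(next_tx_rate(ab, buf))
```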
Sun, Xiaodong; Lv, Xuliang; Sui, Mingxu; Weng, Xiaodi; Li, Xiaopeng; Wang, Jijun
2018-01-01
To clear away the harmful effects of the increment of electromagnetic pollution, high performance absorbers with appropriate impedance matching and strong attenuation capacity are strongly desired. In this study, a chain-like PPy aerogel decorated with MOF-derived nanoporous Co/C (Co/C@PPy) has been successfully prepared by a self-assembled polymerization method. With a filler loading ratio of 10 wt %, the composite of Co/C@PPy could achieve a promising electromagnetic absorption performance both in intensity and bandwidth. An optimal reflection loss value of −44.76 dB is achieved, and the effective bandwidth (reflection loss lower than −10 dB) is as large as 6.56 GHz. Furthermore, a composite only loaded with 5 wt % Co/C@PPy also achieves an effective bandwidth of 5.20 GHz, which is even better than numerous reported electromagnetic absorption (EA) materials. The result reveals that the as-fabricated Co/C@PPy—with high absorption intensity, broad bandwidth, and light weight properties—can be utilized as a competitive absorber. PMID:29751650
Towards high-capacity fibre-optic communications at the speed of light in vacuum
NASA Astrophysics Data System (ADS)
Poletti, F.; Wheeler, N. V.; Petrovich, M. N.; Baddela, N.; Numkam Fokoua, E.; Hayes, J. R.; Gray, D. R.; Li, Z.; Slavík, R.; Richardson, D. J.
2013-04-01
Wide-bandwidth signal transmission with low latency is emerging as a key requirement in a number of applications, including the development of future exaflop-scale supercomputers, financial algorithmic trading and cloud computing. Optical fibres provide unsurpassed transmission bandwidth, but light propagates 31% slower in a silica glass fibre than in vacuum, thus compromising latency. Air guidance in hollow-core fibres can reduce fibre latency very significantly. However, state-of-the-art technology cannot achieve the combined values of loss, bandwidth and mode-coupling characteristics required for high-capacity data transmission. Here, we report a fundamentally improved hollow-core photonic-bandgap fibre that provides a record combination of low loss (3.5 dB km⁻¹) and wide bandwidth (160 nm), and use it to transmit 37 × 40 Gbit s⁻¹ channels at a 1.54 µs km⁻¹ faster speed than in a conventional fibre. This represents the first experimental demonstration of fibre-based wavelength division multiplexed data transmission at close to (99.7%) the speed of light in vacuum.
Methods and Devices for Modifying Active Paths in a K-Delta-1-Sigma Modulator
NASA Technical Reports Server (NTRS)
Ardalan, Sasan (Inventor)
2017-01-01
The invention relates to improved K-Delta-1-Sigma Modulators (KD1Ss) that achieve multi-GHz sampling rates in 90 nm and 45 nm CMOS processes, and that provide the capability to balance performance with power in many applications. The improved KD1Ss activate all paths when high performance is needed (e.g. high bandwidth), and reduce the effective bandwidth by shutting down multiple paths when low performance is required. The improved KD1Ss can adjust the baseband filtering for lower bandwidth, and can provide large savings in power consumption while maintaining the communication link, which is a great advantage in space communications. The improved KD1Ss herein provide a receiver that adjusts to accommodate a higher rate when a packet is received at a low bandwidth; at the initial lower rate, power is saved by turning off paths in the KD1S analog-to-digital converter, and when a higher rate is required, multiple paths are enabled in the KD1S to accommodate the higher bandwidths.
Ultra-high bandwidth quantum secured data transmission
NASA Astrophysics Data System (ADS)
Dynes, James F.; Tam, Winci W.-S.; Plews, Alan; Fröhlich, Bernd; Sharpe, Andrew W.; Lucamarini, Marco; Yuan, Zhiliang; Radig, Christian; Straw, Andrew; Edwards, Tim; Shields, Andrew J.
2016-10-01
Quantum key distribution (QKD) provides an attractive means for securing communications in optical fibre networks. However, deployment of the technology has been hampered by the frequent need for dedicated dark fibres to segregate the very weak quantum signals from conventional traffic. Up until now the coexistence of QKD with data has been limited to bandwidths that are orders of magnitude below those commonly employed in fibre optic communication networks. Using an optimised wavelength division multiplexing scheme, we transport QKD and the prevalent 100 Gb/s data format in the forward direction over the same fibre for the first time. We show a full quantum encryption system operating with a bandwidth of 200 Gb/s over a 100 km fibre. Exploring the ultimate limits of the technology by experimental measurements of the Raman noise, we demonstrate it is feasible to combine QKD with 10 Tb/s of data over a 50 km link. These results suggest it will be possible to integrate QKD and other quantum photonic technologies into high bandwidth data communication infrastructures, thereby allowing their widespread deployment.
Characteristic analysis of diaphragm-type transducer that is thick relative to its size
NASA Astrophysics Data System (ADS)
Ishiguro, Yuya; Zhu, Jing; Tagawa, Norio; Okubo, Tsuyoshi; Okubo, Kan
2017-07-01
In recent years, high-performance piezoelectric micromachined ultrasonic transducers (PMUTs) have been fabricated by micro electro mechanical systems (MEMS) technology. For high-resolution imaging, it is important to broaden the frequency bandwidth. By reducing the diaphragm size to increase the resonance frequency, the film thickness becomes relatively larger and hence the transmitting and receiving characteristics may differ from those of a usual thin diaphragm. In this study, we examine the performance of a square-diaphragm-type lead zirconate titanate (PZT) transducer through simulations. To realize the desired resonance frequency of 20 MHz, firstly, the diaphragm size and the thickness of the layers of PZT and Si constituting a PMUT are examined, and then, three PZT/Si models with different thicknesses are selected. Subsequently, using the models, we analyze the transmitting efficiency, transmitting bandwidth, receiving sensitivity (piezoelectric voltage/electric charge), and receiving bandwidth using an FEM simulator. It is found that the proposed models can transmit ultrasound independently of the diaphragm vibration and have a wide receiving bandwidth compared with that of a typical PMUT.
Wang, Zhaoyong; Pan, Zhengqing; Fang, Zujie; Ye, Qing; Lu, Bin; Cai, Haiwen; Qu, Ronghui
2015-11-15
A phase-sensitive optical time-domain reflectometry (Φ-OTDR) with a temporally sequenced multi-frequency (TSMF) source is proposed. This technique can improve the system detection bandwidth without decreasing the sensing range. Up to 0.5 MHz detection bandwidth over 9.6 km is experimentally demonstrated as an example. To the best of our knowledge, this is the first time that such a high detection bandwidth over such a long sensing range is reported in Φ-OTDR-based distributed vibration sensing. The technical issues of TSMF Φ-OTDR are discussed in this Letter. This technique will open up important new prospects for Φ-OTDR in long-haul distributed broadband-detection applications, such as structural-health monitoring and partial-discharge online monitoring of high voltage power cables.
Progress and issues for high-speed vertical cavity surface emitting lasers
NASA Astrophysics Data System (ADS)
Lear, Kevin L.; Al-Omari, Ahmad N.
2007-02-01
Extrinsic electrical, thermal, and optical issues rather than intrinsic factors currently constrain the maximum bandwidth of directly modulated vertical cavity surface emitting lasers (VCSELs). Intrinsic limits based on resonance frequency, damping, and K-factor analysis are summarized. Previous reports are used to compare parasitic circuit values and electrical 3dB bandwidths and thermal resistances. A correlation between multimode operation and junction heating with bandwidth saturation is presented. The extrinsic factors motivate modified bottom-emitting structures with no electrical pads, small mesas, copper plated heatsinks, and uniform current injection. Selected results on high speed quantum well and quantum dot VCSELs at 850 nm, 980 nm, and 1070 nm are reviewed including small-signal 3dB frequencies up to 21.5 GHz and bit rates up to 30 Gb/s.
Optimal design of similariton fiber lasers without gain-bandwidth limitation.
Li, Xingliang; Zhang, Shumin; Yang, Zhenjun
2017-07-24
We have numerically investigated broadband high-energy similariton fiber lasers and demonstrated that the self-similar evolution of pulses can take place in a segment of photonic crystal fiber without gain-bandwidth limitation. The effects of various parameters, including the cavity length, the spectral filter bandwidth, the pump power, the length of the photonic crystal fiber and the output coupling ratio, have also been studied in detail. Using the optimal parameters, a single pulse with a spectral width of 186.6 nm, pulse energy of 23.8 nJ, dechirped pulse duration of 22.5 fs and dechirped pulse peak power of 1.26 MW was obtained. We believe that this detailed analysis of the behaviour of pulses in the similariton regime may have major implications in the development of broadband high-energy fiber lasers.
Plastic straw: future of high-speed signaling
NASA Astrophysics Data System (ADS)
Song, Ha Il; Jin, Huxian; Bae, Hyeon-Min
2015-11-01
The ever-increasing demand for bandwidth triggered by mobile and video Internet traffic requires advanced interconnect solutions satisfying functional and economic constraints. A new interconnect called E-TUBE is proposed as a cost- and power-effective all-electrical-domain wideband waveguide solution for high-speed high-volume short-reach communication links. The E-TUBE achieves an unprecedented level of performance in terms of bandwidth-per-carrier frequency, power, and density without requiring a precision manufacturing process, unlike conventional optical/waveguide solutions. The E-TUBE exhibits a frequency-independent loss profile of 4 dB/m and nearly 20-GHz bandwidth over the V band. Single-sideband signal transmission enabled by the inherent frequency response of the E-TUBE delivers twice the data throughput without any physical overhead compared to conventional radio-frequency communication technologies. This new interconnect scheme would be attractive to parties interested in high-throughput links, including, but not limited to, 100/400 Gbps chip-to-chip communications.
OM300 Direction Drilling Module
MacGugan, Doug
2013-08-22
OM300 – Geothermal Directional Drilling Navigation Tool: design and produce a prototype directional drilling navigation tool capable of high-temperature operation in geothermal drilling.
- Accuracies of 0.1° inclination and tool face, 0.5° azimuth
- Environmental ruggedness typical of existing oil/gas drilling
- Multiple selectable sensor ranges: high accuracy for navigation (low bandwidth); high G-range and bandwidth for stick-slip and chirp detection
- Selectable serial data communications
- Goal: reduce the cost of drilling in high-temperature geothermal reservoirs
Innovative aspects of the project: Honeywell MEMS Vibrating Beam Accelerometers (VBA), APS flux-gate magnetometers, Honeywell Silicon-On-Insulator (SOI) high-temperature electronics, and a rugged, high-temperature-capable package and assembly process.
VizieR Online Data Catalog: AR Sco VLA radio observations (Stanway+, 2018)
NASA Astrophysics Data System (ADS)
Stanway, E. R.; Marsh, T. R.; Chote, P.; Gaensicke, B. T.; Steeghs, D.; Wheatley, P. J.
2018-02-01
Time series VLA radio observations were undertaken of the highly variable white dwarf binary AR Scorpii. These were analysed for periodicity, spectral behaviour and other characteristics. Here we present time series data in the Stokes I parameter at three frequencies. These were centred at 1.5GHz (1GHz bandwidth), 5GHz (2GHz bandwidth) and 9GHz (2GHz bandwidth). The AR Sco binary is unresolved at these frequencies. In the case of the 1.5GHz data, fluxes have been deconvolved with those of a neighbouring object. (3 data files).
NASA Astrophysics Data System (ADS)
Liu, Yang; Li, Shu-qing; Feng, Zhong-ying; Liu, Xiao-fei; Gao, Jin-yue
2016-12-01
To detect weak signal light against high background noise, we present a theoretical study of an ultra-narrow-bandwidth tunable atomic filter based on electromagnetically induced transparency (EIT). In a three-level Λ-type atomic system on the rubidium D1 line, the bandwidth of the EIT atomic filter is narrowed to ~6.5 MHz, and the single-peak transmission of the filter can reach 86%. Moreover, the transmission wavelength can be tuned by changing the coupling light frequency. This theoretical scheme can also be applied to other alkali atomic systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, James; Grosklos, Guen
We examine the properties and performance of kernelized anomaly detectors, with an emphasis on the Mahalanobis-distance-based kernel RX (KRX) algorithm. Although the detector generally performs well for high-bandwidth Gaussian kernels, it exhibits problematic (in some cases, catastrophic) performance for distances that are large compared to the bandwidth. By comparing KRX to two other anomaly detectors, we can trace the problem to a projection in feature space, which arises when a pseudoinverse is used on the covariance matrix in that feature space. Here, we show that a regularized variant of KRX overcomes this difficulty and achieves superior performance over a wide range of bandwidths.
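To illustrate the role regularization plays, the sketch below implements the plain (linear) RX detector with a ridge-regularized covariance rather than a pseudoinverse; the paper's KRX operates analogously in a kernel-induced feature space. The data and regularization constant are arbitrary assumptions.

```python
import numpy as np

def rx_scores(background, pixels, eps=1e-3):
    """Mahalanobis-distance anomaly score of each pixel w.r.t. the background."""
    mu = background.mean(axis=0)
    cov = np.cov(background, rowvar=False)
    ridge = eps * np.trace(cov) / cov.shape[0] * np.eye(cov.shape[0])
    prec = np.linalg.inv(cov + ridge)     # well-conditioned thanks to the ridge term
    d = pixels - mu
    return np.einsum("ij,jk,ik->i", d, prec, d)

rng = np.random.default_rng(1)
bg = rng.normal(size=(500, 10))                        # background spectra: 500 pixels, 10 bands
test = np.vstack([rng.normal(size=(5, 10)),            # nominal pixels
                  rng.normal(loc=4.0, size=(2, 10))])  # anomalous pixels
print(rx_scores(bg, test))
```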
Ethernet-Enabled Power and Communication Module for Embedded Processors
NASA Technical Reports Server (NTRS)
Perotti, Jose; Oostdyk, Rebecca
2010-01-01
The power and communications module is a printed circuit board (PCB) that has the capability of providing power to an embedded processor and converting Ethernet packets into serial data to transfer to the processor. The purpose of the new design is to address the shortcomings of previous designs, including limited bandwidth and program memory, lack of control over packet processing, and lack of support for timing synchronization. The new design of the module creates a robust serial-to-Ethernet conversion that is powered using the existing Ethernet cable. This innovation has a small form factor that allows it to power processors and transducers with minimal space requirements.
2015-06-01
5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Thermal gradient crystals as tuneable monochromator for high energy X-rays
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruett, U.; Schulte-Schrepping, H.; Heuer, J.
2010-06-23
At the high energy synchrotron radiation beamline BW5 at DORIS III at DESY a new monochromator providing broad energy bandwidth and high reflectivity is in use. On a small 10×10×5 mm³ silicon crystal scattering at the (311) reflection a thermal gradient is applied, which tunes the scattered energy bandwidth. The (311) reflection strongly suppresses the higher harmonics allowing the use of an image plate detector for crystallography. The monochromator can be used at photon energies above 60 keV.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hammond, William T.; Mudrick, John P.; Xue, Jiangeng, E-mail: jxue@mse.ufl.edu
2014-12-07
We present detailed studies of the high photocurrent gain behavior in multilayer organic photodiodes containing tailored carrier blocking layers we reported earlier in a Letter [W. T. Hammond and J. Xue, Appl. Phys. Lett. 97, 073302 (2010)], in which a high photocurrent gain of up to 500 was attributed to the accumulation of photogenerated holes at the anode/organic active layer interface and the subsequent drastic increase in secondary electron injection from the anode. Here, we show that both the hole-blocking layer structure and layer thickness strongly influence the magnitude of the photocurrent gain. Temporal studies revealed that the frequency response of such devices is limited by three different processes with lifetimes of 10 μs, 202 μs, and 2.72 ms for the removal of confined holes, which limit the 3 dB bandwidth of these devices to 1.4 kHz. Furthermore, the composition in the mixed organic donor-acceptor photoactive layer affects both gain and bandwidth, which is attributed to the varying charge transport characteristics, and the optimal gain-bandwidth product is achieved with approximately 30% donor content. Finally, these devices show a high dynamic range of more than seven orders of magnitude, although the photocurrent shows a sublinear dependence on the incident optical power.
Management of time-dependent multimedia data
NASA Astrophysics Data System (ADS)
Little, Thomas D.; Gibbon, John F.
1993-01-01
A number of approaches have been proposed for supporting high-bandwidth time-dependent multimedia data in a general purpose computing environment. Much of this work assumes the availability of ample resources such as CPU performance, bus, I/O, and communication bandwidth. However, many multimedia applications have large variations in instantaneous data presentation requirements (e.g., a dynamic range of order 100,000). By using a statistical scheduling approach these variations are effectively smoothed and, therefore, more applications are made viable. The result is a more efficient use of available bandwidth and the enabling of applications that have large short-term bandwidth requirements such as simultaneous video and still image retrieval. Statistical scheduling of multimedia traffic relies on accurate characterization or guarantee of channel bandwidth and delay. If guaranteed channel characteristics are not upheld due to spurious channel overload, buffer overflow and underflow can occur at the destination. The result is the loss of established source-destination synchronization and the introduction of intermedia skew. In this paper we present an overview of a proposed synchronization mechanism to limit the effects of such anomalous behavior. The proposed mechanism monitors buffer levels to detect impending low and high levels on a frame basis and regulates the destination playout rate. Intermedia skew is controlled by a similar control algorithm. This mechanism is used in conjunction with a statistical source scheduling approach to provide an overall multimedia transmission and resynchronization system supporting graceful service degradation.
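A toy sketch of the destination-side regulation described above: the playout rate is nudged when the buffer level crosses low or high watermarks, limiting underflow/overflow and hence the intermedia skew that can accumulate. The watermark values and adjustment step are hypothetical, not those of the proposed mechanism.

```python
def playout_rate(buffer_frames, nominal_fps=30.0, low=5, high=45, step=0.05):
    """Adjust the destination playout rate based on the current buffer fill."""
    if buffer_frames <= low:                 # impending underflow: slow playout down
        return nominal_fps * (1.0 - step)
    if buffer_frames >= high:                # impending overflow: speed playout up
        return nominal_fps * (1.0 + step)
    return nominal_fps

for level in (3, 12, 50):
    print(level, "frames buffered ->", playout_rate(level), "fps")
```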
Amplifying modeling for broad bandwidth pulse in Nd:glass based on hybrid-broaden mechanism
NASA Astrophysics Data System (ADS)
Su, J.; Liu, L.; Luo, B.; Wang, W.; Jing, F.; Wei, X.; Zhang, X.
2008-05-01
In this paper, the cross-relaxation time is proposed to combine the homogeneous and inhomogeneous broadening mechanisms in a model of broad-bandwidth pulse amplification. The corresponding rate equation, which can describe the response of the inverse population on the upper and lower energy levels of the gain medium to different frequency components of the pulse, is also put forward. The gain saturation and energy relaxation effects are also included in the rate equation. A code named CPAP has been developed to simulate the amplification of broad-bandwidth pulses in a multi-pass laser system. The amplifying capability of the multi-pass laser system is evaluated, and gain narrowing and temporal shape distortion are investigated for different pulse bandwidths and gain-medium cross-relaxation times. The results can benefit the design of high-energy PW laser systems at LFRC, CAEP.
Xie, Weilin; Xia, Zongyang; Zhou, Qian; Shi, Hongxiao; Dong, Yi; Hu, Weisheng
2015-07-13
We present a photonic approach for generating low phase noise, arbitrary chirped microwave waveforms based on heterodyne beating between high order correlated comb lines extracted from frequency-agile optical frequency comb. Using the dual heterodyne phase transfer scheme, extrinsic phase noises induced by the separate optical paths are efficiently suppressed by 42-dB at 1-Hz offset frequency. Linearly chirped microwave waveforms are achieved within 30-ms temporal duration, contributing to a large time-bandwidth product. The linearity measurement leads to less than 90 kHz RMS frequency error during the entire chirp duration, exhibiting excellent linearity for the microwave and sub-THz waveforms. The capability of generating arbitrary waveforms up to sub-THz band with flexible temporal duration, long repetition period, broad bandwidth, and large time-bandwidth product is investigated and discussed.
Yong, Y K; Moheimani, S O R; Kenton, B J; Leang, K K
2012-12-01
Recent interest in high-speed scanning probe microscopy for high-throughput applications including video-rate atomic force microscopy and probe-based nanofabrication has sparked attention on the development of high-bandwidth flexure-guided nanopositioning systems (nanopositioners). Such nanopositioners are designed to move samples with sub-nanometer resolution with positioning bandwidth in the kilohertz range. State-of-the-art designs incorporate uniquely designed flexure mechanisms driven by compact and stiff piezoelectric actuators. This paper surveys key advances in mechanical design and control of dynamic effects and nonlinearities, in the context of high-speed nanopositioning. Future challenges and research topics are also discussed.
High-speed 850 nm VCSELs with 28 GHz modulation bandwidth for short reach communication
NASA Astrophysics Data System (ADS)
Westbergh, Petter; Safaisini, Rashid; Haglund, Erik; Gustavsson, Johan S.; Larsson, Anders; Joel, Andrew
2013-03-01
We present results from our new generation of high performance 850 nm oxide confined vertical cavity surface-emitting lasers (VCSELs). With devices optimized for high-speed operation under direct modulation, we achieve record high 3dB modulation bandwidths of 28 GHz for ~4 μm oxide aperture diameter VCSELs, and 27 GHz for devices with a ~7 μm oxide aperture diameter. Combined with a high-speed photoreceiver, the ~7 μm VCSEL enables error-free transmission at data rates up to 47 Gbit/s at room temperature, and up to 40 Gbit/s at 85°C.
Studies of bandwidth dependence of laser plasma instabilities driven by the Nike laser
NASA Astrophysics Data System (ADS)
Weaver, J.; Kehne, D.; Obenschain, S.; Serlin, V.; Schmitt, A. J.; Oh, J.; Lehmberg, R. H.; Brown, C. M.; Seely, J.; Feldman, U.
2012-10-01
Experiments at the Nike laser facility of the Naval Research Laboratory are exploring the influence of laser bandwidth on laser plasma instabilities (LPI) driven by a deep ultraviolet pump (248 nm) that incorporates beam smoothing by induced spatial incoherence (ISI). In early ISI studies with longer wavelength Nd:glass lasers (1054 nm and 527 nm) [Obenschain, PRL 62 (1989); Mostovych, PRL 62 (1987); Peyser, Phys. Fluids B 3 (1991)], stimulated Raman scattering, stimulated Brillouin scattering, and the two plasmon decay instability were reduced when wide bandwidth ISI (δν/ν ~ 0.03-0.19%) pulses irradiated targets at moderate to high intensities (10^14-10^15 W/cm^2). The current studies will compare the emission signatures of LPI from planar CH targets during Nike operation at large bandwidth (δν ~ 1 THz) to observations for narrower bandwidth operation (δν ~ 0.1-0.3 THz). These studies will help clarify the relative importance of the short wavelength and wide bandwidth to the increased LPI intensity thresholds observed at Nike. New pulse shapes are being used to generate plasmas with larger electron density scale-lengths that are closer to conditions during pellet implosions for direct drive inertial confinement fusion.
Ricketts, Todd A; Dittberner, Andrew B; Johnson, Earl E
2008-02-01
One factor that has been shown to greatly affect sound quality is audible bandwidth. Provision of gain for frequencies above 4-6 kHz has not generally been supported for groups of hearing aid wearers. The purpose of this study was to determine if preference for bandwidth extension in hearing aid processed sounds was related to the magnitude of hearing loss in individual listeners. Ten participants with normal hearing and 20 participants with mild-to-moderate hearing loss completed the study. Signals were processed using hearing aid-style compression algorithms and filtered using two cutoff frequencies, 5.5 and 9 kHz, which were selected to represent bandwidths that are achievable in modern hearing aids. Round-robin paired comparisons based on the criteria of preferred sound quality were made for 2 different monaurally presented brief sound segments, including music and a movie. Results revealed that preference for either the wider or narrower bandwidth (9- or 5.5-kHz cutoff frequency, respectively) was correlated with the slope of hearing loss from 4 to 12 kHz, with steep threshold slopes associated with preference for narrower bandwidths. Consistent preference for wider bandwidth is present in some listeners with mild-to-moderate hearing loss.
FPGA-Based Filterbank Implementation for Parallel Digital Signal Processing
NASA Technical Reports Server (NTRS)
Berner, Stephan; DeLeon, Phillip
1999-01-01
One approach to parallel digital signal processing decomposes a high bandwidth signal into multiple lower bandwidth (rate) signals by an analysis bank. After processing, the subband signals are recombined into a fullband output signal by a synthesis bank. This paper describes an implementation of the analysis and synthesis banks using Field Programmable Gate Arrays (FPGAs).
Note: a transimpedance amplifier for remotely located quartz tuning forks.
Kleinbaum, Ethan; Csáthy, Gábor A
2012-12-01
The cable capacitance in cryogenic and high vacuum applications of quartz tuning forks imposes severe constraints on the bandwidth and noise performance of the measurement. We present a single stage low noise transimpedance amplifier with a bandwidth exceeding 1 MHz and provide an in-depth analysis of the dependence of the amplifier parameters on the cable capacitance.
VENI, video, VICI: The merging of computer and video technologies
NASA Technical Reports Server (NTRS)
Horowitz, Jay G.
1993-01-01
The topics covered include the following: High Definition Television (HDTV) milestones; visual information bandwidth; television frequency allocation and bandwidth; horizontal scanning; workstation RGB color domain; NTSC color domain; American HDTV time-table; HDTV image size; digital HDTV hierarchy; task force on digital image architecture; open architecture model; future displays; and the ULTIMATE imaging system.
Meese, Tim S; Holmes, David J
2010-10-01
Most contemporary models of spatial vision include a cross-oriented route to suppression (masking from a broadly tuned inhibitory pool), which is most potent at low spatial and high temporal frequencies (T. S. Meese & D. J. Holmes, 2007). The influence of this pathway can elevate orientation-masking functions without exciting the target mechanism, and because early psychophysical estimates of filter bandwidth did not accommodate this, it is likely that they have been overestimated for this corner of stimulus space. Here we show that a transient 40% contrast mask causes substantial binocular threshold elevation for a transient vertical target, and this declines from a mask orientation of 0° to about 40° (indicating tuning), and then more gently to 90°, where it remains at a factor of ∼4. We also confirm that cross-orientation masking is diminished or abolished at high spatial frequencies and for sustained temporal modulation. We fitted a simple model of pedestal masking and cross-orientation suppression (XOS) to our data and those of G. C. Phillips and H. R. Wilson (1984) and found the dependency of orientation bandwidth on spatial frequency to be much less than previously supposed. An extension of our linear spatial pooling model of contrast gain control and dilution masking (T. S. Meese & R. J. Summers, 2007) is also shown to be consistent with our results using filter bandwidths of ±20°. Both models include tightly and broadly tuned components of divisive suppression. More generally, because XOS and/or dilution masking can affect the shape of orientation-masking curves, we caution that variations in bandwidth estimates might reflect variations in processes that have nothing to do with filter bandwidth.
Bandwidth tunable THz wave generation in large-area periodically poled lithium niobate.
Zhang, Caihong; Avetisyan, Yuri; Glosser, Andreas; Kawayama, Iwao; Murakami, Hironaru; Tonouchi, Masayoshi
2012-04-09
A new scheme of optical rectification (OR) of femtosecond laser pulses in a periodically poled lithium niobate (PPLN) crystal, which generates high-energy and bandwidth-tunable multicycle THz pulses, is proposed and demonstrated. We show that the number of oscillation cycles of the THz electric field, and therefore the bandwidth of the generated THz spectrum, can easily and smoothly be tuned from a few tens of GHz to a few THz by changing the pump optical spot size on the PPLN crystal. The minimal bandwidth is 17 GHz, which is the smallest ever reported for THz generation by OR at room temperature. Similar to the case of Cherenkov-type OR in single-domain LiNbO₃, the spectrum of the THz generation extends from 0.1 THz to 3 THz when the laser beam is focused to a size close to the half-period of the PPLN structure. The energy spectral density of the narrowband THz generation is almost independent of the bandwidth and is typically 220 nJ/THz for ~1 W pump power at a 1 kHz repetition rate.
A micromachined efficient parametric array loudspeaker with a wide radiation frequency band.
Je, Yub; Lee, Haksue; Been, Kyounghun; Moon, Wonkyu
2015-04-01
Parametric array (PA) loudspeakers generate directional audible sound via the PA effect, which can make private listening possible. The practical applications of PA loudspeakers include information technology devices that require large power efficiency transducers with a wide frequency bandwidth. Piezoelectric micromachined ultrasonic transducers (PMUTs) are compact and efficient units for PA sources [Je, Lee, and Moon, Ultrasonics 53, 1124-1134 (2013)]. This study investigated the use of an array of PMUTs to make a PA loudspeaker with high power efficiency and wide bandwidth. The achievable maximum radiation bandwidth of the driver was calculated, and an array of PMUTs with two distinct resonance frequencies (f1 = 100 kHz, f2 = 110 kHz) was designed. Out-of-phase driving was used with the dual-resonance transducer array to increase the bandwidth. The fabricated PMUT array exhibited an efficiency of up to 71%, together with a ±3-dB bandwidth of 17 kHz for directly radiated primary waves, and 19.5 kHz (500 Hz to 20 kHz) for the difference frequency waves (with equalization).
NASA Astrophysics Data System (ADS)
Kaba, M.; Zhou, F. C.; Lim, A.; Decoster, D.; Huignard, J.-P.; Tonda, S.; Dolfi, D.; Chazelas, J.
2007-11-01
The applications of microwave optoelectronics are extremely broad, extending from Radio-over-Fibre to homeland security and defence systems. The improved maturity of optoelectronic components operating up to 40 GHz makes it possible to consider new optical processing functions (filtering, beamforming, ...) that can operate on very wideband microwave analogue signals. Specific performances are required, which imply optical delay lines able to exhibit large time-bandwidth product values. It is proposed to evaluate a slow-light approach based on highly dispersive structures using either uniform or chirped Bragg gratings. We therefore highlight the impact of the major parameters of such structures: index modulation depth, grating length, grating period and chirp coefficient, and demonstrate the high potential of Bragg gratings for processing large-bandwidth RF signals under slow-light propagation.
A minimal SATA III Host Controller based on FPGA
NASA Astrophysics Data System (ADS)
Liu, Hailiang
2018-03-01
SATA (Serial Advanced Technology Attachment) is an advanced serial bus with outstanding performance in transmitting high-speed real-time data, applied in personal computers, the financial industry, astronautics and aeronautics, etc. In this work, a minimal SATA III host controller based on a Xilinx Kintex-7 series FPGA is designed and implemented. Compared to the state of the art, register utilization is reduced by 25.3% and LUT utilization by 65.9%. According to the experimental results, the controller works precisely and steadily with a reading bandwidth of up to 536 MB per second and a writing bandwidth of up to 512 MB per second, both of which are close to the maximum bandwidth of the SSD (Solid State Disk) device. The host controller is very suitable for high-speed data transmission and mass data storage.
NASA Astrophysics Data System (ADS)
Muhamad, Wan Asilah Wan; Ngah, Razali; Jamlos, Mohd Faizal; Soh, Ping Jack; Ali, Mohd Tarmizi
2017-01-01
A new dipole antenna designed using a polydimethylsiloxane-glass microsphere (PDMS-GM) substrate is presented. The PDMS-GM substrate offers a lower permittivity of 1.85 compared to 2.7 for pure PDMS. This results in a wide operating frequency range from 19 GHz up to more than 45 GHz, indicating a bandwidth of more than 28 GHz. The proposed PDMS-GM antenna features a gain of up to 13.3 dB, whereas pure PDMS only produced 13 GHz of bandwidth and 5.5 dB gain. Besides wide bandwidth and high gain, the proposed antenna is capable of becoming water resistant by covering its radiator and SMA connector. Such capabilities of the new PDMS-GM antenna indicate suitability for fifth-generation (5G) wireless communication systems.
High-frequency chaotic dynamics enabled by optical phase-conjugation
Mercier, Émeric; Wolfersberger, Delphine; Sciamanna, Marc
2016-01-01
Wideband chaos is of interest for applications such as random number generation or encrypted communications, which typically use optical feedback in a semiconductor laser. Here, we show that replacing conventional optical feedback with phase-conjugate feedback improves the chaos bandwidth. In the range of achievable phase-conjugate mirror reflectivities, the bandwidth increase reaches 27% when compared with feedback from a conventional mirror. Experimental measurements of the time-resolved frequency dynamics on nanosecond time-scales show that the bandwidth enhancement is related to the onset of self-pulsing solutions at harmonics of the external-cavity frequency. In the observed regime, the system follows a chaotic itinerancy among these destabilized high-frequency external-cavity modes. The recorded features are unique to phase-conjugate feedback and distinguish it from the long-standing problem of time-delayed feedback dynamics. PMID:26739806
NASA Astrophysics Data System (ADS)
Cook, G. G.; Khamas, S. K.; Kingsley, S. P.; Woods, R. C.
1992-01-01
The radar cross section and Q factors of electrically small dipole and loop antennas made with a YBCO high Tc superconductor are predicted using a two-fluid-moment method model, in order to determine the effects of finite conductivity on the performances of such antennas. The results compare the useful operating bandwidths of YBCO antennas exhibiting varying degrees of impurity with their copper counterparts at 77 K, showing a linear relationship between bandwidth and impurity level.
Easwar, Vijayalakshmi; Purcell, David W; Aiken, Steven J; Parsa, Vijay; Scollie, Susan D
2015-01-01
The use of auditory evoked potentials as an objective outcome measure in infants fitted with hearing aids has gained interest in recent years. This article proposes a test paradigm using speech-evoked envelope following responses (EFRs) for use as an objective-aided outcome measure. The method uses a running speech-like, naturally spoken stimulus token /susa∫i/ (fundamental frequency [f0] = 98 Hz; duration 2.05 sec), to elicit EFRs by eight carriers representing low, mid, and high frequencies. Each vowel elicited two EFRs simultaneously, one from the region of formant one (F1) and one from the higher formants region (F2+). The simultaneous recording of two EFRs was enabled by lowering f0 in the region of F1 alone. Fricatives were amplitude modulated to enable recording of EFRs from high-frequency spectral regions. The present study aimed to evaluate the effect of level and bandwidth on speech-evoked EFRs in adults with normal hearing. As well, the study aimed to test convergent validity of the EFR paradigm by comparing it with changes in behavioral tasks due to bandwidth. Single-channel electroencephalogram was recorded from the vertex to the nape of the neck over 300 sweeps in two polarities from 20 young adults with normal hearing. To evaluate the effects of level in experiment I, EFRs were recorded at test levels of 50 and 65 dB SPL. To evaluate the effects of bandwidth in experiment II, EFRs were elicited by /susa∫i/ low-pass filtered at 1, 2, and 4 kHz, presented at 65 dB SPL. The 65 dB SPL condition from experiment I represented the full bandwidth condition. EFRs were averaged across the two polarities and estimated using a Fourier analyzer. An F test was used to determine whether an EFR was detected. Speech discrimination using the University of Western Ontario Distinctive Feature Differences test and sound quality rating using the Multiple Stimulus Hidden Reference and Anchors paradigm were measured in identical bandwidth conditions. In experiment I, the increase in level resulted in a significant increase in response amplitudes for all eight carriers (mean increase of 14 to 50 nV) and the number of detections (mean increase of 1.4 detections). In experiment II, an increase in bandwidth resulted in a significant increase in the number of EFRs detected until the low-pass filtered 4 kHz condition and carrier-specific changes in response amplitude until the full bandwidth condition. Scores in both behavioral tasks increased with bandwidth up to the full bandwidth condition. The number of detections and composite amplitude (sum of all eight EFR amplitudes) significantly correlated with changes in behavioral test scores. Results suggest that the EFR paradigm is sensitive to changes in level and audible bandwidth. This may be a useful tool as an objective-aided outcome measure considering its running speech-like stimulus, representation of spectral regions important for speech understanding, level and bandwidth sensitivity, and clinically feasible test times. This paradigm requires further validation in individuals with hearing loss, with and without hearing aids.
Elliptic Curve Cryptography with Security System in Wireless Sensor Networks
NASA Astrophysics Data System (ADS)
Huang, Xu; Sharma, Dharmendra
2010-10-01
The rapid progress of wireless communications and embedded micro-electro-mechanical systems technologies has made wireless sensor networks (WSN) very popular and even part of our daily life. WSN designs are generally application driven: a particular application's requirements determine how the network behaves. The nature of WSNs has attracted increasing attention in recent years due to their linear scalability, small software footprint, low hardware implementation cost, low bandwidth requirement, and high device performance. Today's software applications are mainly characterized by component-based structures, which are usually heterogeneous and distributed, and WSNs are no exception; WSNs typically need to configure themselves automatically and support ad hoc routing. Agent technology provides a method for handling increasing software complexity and supporting rapid and accurate decision making. Building on our previous works [1, 2], this paper makes three contributions: (a) a fuzzy controller for a dynamic sliding window size to improve the performance of running ECC; (b) a hidden generator point, presented for the first time, for protection from man-in-the-middle attacks; and (c) a first investigation of applying multi-agent systems to key exchange. Security systems have been drawing great attention as cryptographic algorithms have gained popularity due to properties that make them suitable for use in constrained environments such as mobile sensor information applications, where computing resources and power availability are limited. Elliptic curve cryptography (ECC) is a high-potential candidate for WSNs, requiring less computational power, communication bandwidth, and memory than other cryptosystems. To save pre-computation storage, there is a recent trend in sensor networks for the sensor group leaders, rather than the sensors themselves, to communicate with the end database, which highlights the need to protect against man-in-the-middle attacks. A hidden generator point that offers good protection from the man-in-the-middle (MitM) attack, which has become one of the major worries for sensor networks, is also discussed together with the multi-agent system.
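For readers unfamiliar with ECC, the sketch below shows plain double-and-add scalar multiplication on a toy short-Weierstrass curve over a small prime field. It is only a baseline illustration: the paper's contributions (fuzzy-controlled sliding window, hidden generator point, multi-agent key exchange) are not implemented here, and the curve parameters are deliberately tiny, insecure examples.

```python
# Toy curve y^2 = x^3 + a*x + b over GF(p); not a curve suitable for real deployments.
p, a, b = 97, 2, 3
G = (3, 6)                                   # a point on y^2 = x^3 + 2x + 3 (mod 97)

def point_add(P, Q):
    """Affine point addition; None represents the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def scalar_mul(k, P):
    """Left-to-right double-and-add; real designs use windowed variants."""
    R = None
    for bit in bin(k)[2:]:
        R = point_add(R, R)
        if bit == "1":
            R = point_add(R, P)
    return R

print(scalar_mul(20, G))                     # e.g. a Diffie-Hellman-style public value
```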
High-resolution detection of Brownian motion for quantitative optical tweezers experiments.
Grimm, Matthias; Franosch, Thomas; Jeney, Sylvia
2012-08-01
We have developed an in situ method to calibrate optical tweezers experiments and simultaneously measure the size of the trapped particle or the viscosity of the surrounding fluid. The positional fluctuations of the trapped particle are recorded with a high-bandwidth photodetector. We compute the mean-square displacement, as well as the velocity autocorrelation function of the sphere, and compare it to the theory of Brownian motion including hydrodynamic memory effects. A careful measurement and analysis of the time scales characterizing the dynamics of the harmonically bound sphere fluctuating in a viscous medium directly yields all relevant parameters. Finally, we test the method for different optical trap strengths, with different bead sizes and in different fluids, and we find excellent agreement with the values provided by the manufacturers. The proposed approach overcomes the most commonly encountered limitations in precision when analyzing the power spectrum of position fluctuations in the region around the corner frequency. These low frequencies are usually prone to errors due to drift, limitations in the detection, and trap linearity as well as short acquisition times resulting in poor statistics. Furthermore, the strategy can be generalized to Brownian motion in more complex environments, provided the adequate theories are available.
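The sketch below shows the first analysis step mentioned above, computing the mean-square displacement of a recorded position trace. Here the trace is synthetic (a plain Ornstein-Uhlenbeck process with assumed trap stiffness, drag, and temperature), so it omits the hydrodynamic memory effects that the actual calibration accounts for.

```python
import numpy as np

def msd(x, lags):
    """Mean-square displacement <(x(t+lag) - x(t))^2> for the given lags (in samples)."""
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])

# Synthetic bead in a harmonic trap: overdamped Ornstein-Uhlenbeck dynamics (SI units assumed).
rng = np.random.default_rng(0)
dt, n = 1e-6, 100_000                       # 1 MHz sampling, 0.1 s of data
kappa, gamma, kBT = 1e-5, 1e-8, 4.11e-21    # trap stiffness, Stokes drag, thermal energy
x = np.empty(n)
x[0] = 0.0
for i in range(1, n):
    x[i] = x[i - 1] - (kappa / gamma) * x[i - 1] * dt \
           + np.sqrt(2.0 * kBT / gamma * dt) * rng.standard_normal()

lags = np.array([10, 100, 1000, 5000, 20000])
for lag, m in zip(lags, msd(x, lags)):
    print(f"lag {lag * dt:.0e} s   MSD {m:.3e} m^2")
# At lags much longer than gamma/kappa the MSD approaches the plateau 2*kBT/kappa of the trap.
```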
On some Aitken-like acceleration of the Schwarz method
NASA Astrophysics Data System (ADS)
Garbey, M.; Tromeur-Dervout, D.
2002-12-01
In this paper we present a family of domain decomposition methods based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with a five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant, which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or to multigrid for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that the algorithm has a high tolerance to slow networks in the context of distributed parallel computing and is attractive, generally speaking, for computer architectures whose performance is limited by memory bandwidth rather than by the flop performance of the CPU. This is nowadays the case for most parallel computers using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
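To make the acceleration idea concrete, here is Aitken's delta-squared formula applied to a scalar fixed-point iteration with linear convergence. The Schwarz solvers above apply the same kind of extrapolation to interface traces of the iterates; the toy map below (x -> cos x) is only an assumed stand-in, and restarting from the accelerated value corresponds to the Steffensen flavour.

```python
import math

def aitken(x0, x1, x2):
    """Aitken delta-squared extrapolation of three successive iterates."""
    denom = (x2 - x1) - (x1 - x0)
    return x2 if denom == 0 else x2 - (x2 - x1) ** 2 / denom

g = math.cos            # linearly converging fixed-point iteration, limit ~0.739085

x = 1.0
for _ in range(4):
    x0, x1, x2 = x, g(x), g(g(x))
    x = aitken(x0, x1, x2)          # restart the iteration from the accelerated value
    print(x)
```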
Transfer of learning on a spatial memory task between the blind and sighted people.
Akpinar, Selcuk; Popović, Stevo; Kirazci, Sadettin
2012-12-01
The purpose of this study was to analyze the effect of two different types of feedback on a spatial memory task between blind and blindfolded-sighted participants. Participants tried to estimate a predetermined distance using their dominant hands. Both the blind and blindfolded-sighted groups were randomly divided into two feedback subgroups, "100% frequency" and "10% bandwidth". The score was given verbally to the participants as knowledge of results (KR). The target distance was set at 60 cm. Sixty acquisition trials were performed in 4 sets of 15 repetitions each; afterwards, immediate and delayed retention tests were undertaken. Moreover, 24 hours after the delayed retention test, the participants completed 15 no-KR trials as a transfer test (target distance was 30 cm). The results of the statistical analyses revealed no significant differences for either the acquisition or retention tests. However, a significant difference was found in the transfer test: the 100% frequency blind group performed significantly less accurately than all other groups. As a result, it can be concluded that different types of feedback have a similar effect on the spatial memory task used in this study; however, the type of feedback can change the accuracy of transferring this skill among the blind.
Optimal Bandwidth for High Efficiency Thermoelectrics
NASA Astrophysics Data System (ADS)
Zhou, Jun; Yang, Ronggui; Chen, Gang; Dresselhaus, Mildred S.
2011-11-01
The thermoelectric figure of merit (ZT) in narrow conduction bands of different material dimensionalities is investigated for different carrier scattering models. When the bandwidth is zero, the transport distribution function (TDF) is finite, not infinite as previously speculated by Mahan and Sofo [Proc. Natl. Acad. Sci. U.S.A. 93, 7436 (1996)], even though the carrier density of states goes to infinity. Such a finite TDF results in a zero electrical conductivity and thus a zero ZT. We point out that the optimal ZT cannot be found in an extremely narrow conduction band. The existence of an optimal bandwidth for a maximal ZT depends strongly on the scattering models and the dimensionality of the material. A nonzero optimal bandwidth for maximizing ZT also depends on the lattice thermal conductivity. A larger maximum ZT can be obtained for materials with a smaller lattice thermal conductivity.
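For reference, the quantities discussed above are related by the standard definitions below (Mahan-Sofo formulation); the notation is conventional rather than copied from the paper. With a finite transport distribution function confined to a band of vanishing width, the energy integral for the electrical conductivity vanishes, which is why zero bandwidth gives zero conductivity and hence ZT = 0.

```latex
ZT = \frac{S^{2}\,\sigma\,T}{\kappa_{e} + \kappa_{\ell}},
\qquad
\sigma = e^{2}\int \Sigma(E)\left(-\frac{\partial f}{\partial E}\right)\,\mathrm{d}E
```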
Ultra-flat wideband single-pump Raman-enhanced parametric amplification.
Gordienko, V; Stephens, M F C; El-Taher, A E; Doran, N J
2017-03-06
We experimentally optimize a single-pump fiber optical parametric amplifier in terms of gain spectral bandwidth and gain variation (GV). We find that optimal performance is achieved with the pump tuned to the zero-dispersion wavelength of dispersion-stable highly nonlinear fiber (HNLF). We demonstrate further improvement of the parametric gain bandwidth and GV by decreasing the HNLF length. We discover that the Raman and parametric gain spectra produced by the same pump may be merged to enhance the overall gain bandwidth while keeping GV low. Consequently, we report an ultra-flat gain of 9.6 ± 0.5 dB over a range of 111 nm (12.8 THz) on one side of the pump. Additionally, we demonstrate amplification of a 60 Gbit/s QPSK signal tuned over a portion of the available bandwidth with an OSNR penalty of less than 1 dB for Q2 below 14 dB.
pathChirp: Efficient Available Bandwidth Estimation for Network Paths
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cottrell, Les
2003-04-30
This paper presents pathChirp, a new active probing tool for estimating the available bandwidth on a communication network path. Based on the concept of ''self-induced congestion,'' pathChirp features an exponential flight pattern of probes we call a chirp. Packet chirps offer several significant advantages over current probing schemes based on packet pairs or packet trains. By rapidly increasing the probing rate within each chirp, pathChirp obtains a rich set of information from which to dynamically estimate the available bandwidth. Since it uses only packet interarrival times for estimation, pathChirp requires neither synchronized nor highly stable clocks at the sender and receiver. We test pathChirp with simulations and Internet experiments and find that it provides good estimates of the available bandwidth while using only a fraction of the number of probe bytes that current state-of-the-art techniques use.
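The chirp structure described above can be sketched as follows: probe packets are sent with geometrically shrinking gaps, so the instantaneous probing rate sweeps exponentially from a low to a high rate within a single train. All parameter names and default values here are illustrative, not pathChirp's actual ones.

```python
def chirp_send_times(packet_size_bits=8000, low_rate_bps=1e6,
                     high_rate_bps=100e6, spread=1.2):
    """Generate send times for one pathChirp-style probe train.

    Inter-packet gaps shrink by the factor `spread`, so the instantaneous
    probing rate grows exponentially from low_rate_bps to high_rate_bps.
    """
    times, t = [0.0], 0.0
    gap = packet_size_bits / low_rate_bps          # widest gap first
    min_gap = packet_size_bits / high_rate_bps     # narrowest gap last
    while gap > min_gap:
        t += gap
        times.append(t)
        gap /= spread
    return times

# Queuing delay measured at the receiver starts to build up once the
# instantaneous rate packet_size_bits / gap exceeds the available bandwidth.
print(len(chirp_send_times()), "packets in one chirp")
```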
Simple piezoelectric-actuated mirror with 180 kHz servo bandwidth.
Briles, Travis C; Yost, Dylan C; Cingöz, Arman; Ye, Jun; Schibli, Thomas R
2010-05-10
We present a high bandwidth piezoelectric-actuated mirror for length stabilization of an optical cavity. The actuator displays a transfer function with a flat amplitude response and greater than 135° phase margin up to 200 kHz, allowing a 180 kHz unity gain frequency to be achieved in a closed servo loop. To the best of our knowledge, this actuator has achieved the largest servo bandwidth for a piezoelectric transducer (PZT). The actuator should be very useful in a wide variety of applications requiring precision control of optical lengths, including laser frequency stabilization, optical interferometers, and optical communications. (c) 2010 Optical Society of America.
Wide bandwidth transimpedance amplifier for extremely high sensitivity continuous measurements.
Ferrari, Giorgio; Sampietro, Marco
2007-09-01
This article presents a wide bandwidth transimpedance amplifier based on the series connection of an integrator and a differentiator stage, with an additional feedback loop that discharges the standing current from the device under test (DUT). Compared to switched-discharge configurations, this ensures an unlimited measuring time while maintaining a large signal amplification over the full bandwidth. The amplifier shows a flat response from 0.6 Hz to 1.4 MHz, the capability to operate with leakage currents from the DUT as high as tens of nanoamperes, and a rail-to-rail dynamic range for sinusoidal current signals independent of the DUT leakage current. A monitor output of the stationary current is also available to track slow experimental drifts. The circuit is ideal for noise spectral and impedance measurements of nanodevices and biomolecules in the presence of a physiological medium, and in all cases where high-sensitivity current measurements are required, such as scanning probe microscopy systems.
Low-power, transparent optical network interface for high bandwidth off-chip interconnects.
Liboiron-Ladouceur, Odile; Wang, Howard; Garg, Ajay S; Bergman, Keren
2009-04-13
The recent emergence of multicore architectures and chip multiprocessors (CMPs) has accelerated the bandwidth requirements in high-performance processors for both on-chip and off-chip interconnects. For next generation computing clusters, the delivery of scalable power efficient off-chip communications to each compute node has emerged as a key bottleneck to realizing the full computational performance of these systems. The power dissipation is dominated by the off-chip interface and the necessity to drive high-speed signals over long distances. We present a scalable photonic network interface approach that fully exploits the bandwidth capacity offered by optical interconnects while offering significant power savings over traditional E/O and O/E approaches. The power-efficient interface optically aggregates electronic serial data streams into a multiple WDM channel packet structure at time-of-flight latencies. We demonstrate a scalable optical network interface with 70% improvement in power efficiency for a complete end-to-end PCI Express data transfer.
NASA Astrophysics Data System (ADS)
Xie, Yiwei; Geng, Zihan; Zhuang, Leimeng; Burla, Maurizio; Taddei, Caterina; Hoekman, Marcel; Leinse, Arne; Roeloffzen, Chris G. H.; Boller, Klaus-J.; Lowery, Arthur J.
2017-12-01
Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enable photonic implementations of reconfigurable radio frequency (RF) filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide, which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP)-level flexibility, hundreds-of-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Closed-loop control of gimbal-less MEMS mirrors for increased bandwidth in LiDAR applications
NASA Astrophysics Data System (ADS)
Milanović, Veljko; Kasturi, Abhishek; Yang, James; Hu, Frank
2017-05-01
In 2016, we presented a low-SWaP, wirelessly controlled MEMS mirror-based LiDAR prototype which utilized an OEM laser rangefinder for distance measurement [1]. The MEMS mirror was run in open loop based on its exceptionally fast design and high repeatability. However, to further extend the bandwidth and incorporate necessary eye-safety features, we recently focused on providing mirror position feedback and running the system in closed-loop control. Multiple configurations of optical position sensors, mounted on both the front and the back side of the MEMS mirror, have been developed and will be presented. In all cases, they include a light source (LED or laser) and a 2D photosensor. The most compact version is mounted on the backside of the MEMS mirror ceramic package and can "view" the mirror's backside through openings in the mirror's PCB and its ceramic carrier. This version increases the overall size of the MEMS mirror submodule from 12 mm x 12 mm x 4 mm to 15 mm x 15 mm x 7 mm. The sensors also include optical and electronic filtering to reduce the effects of any interference from the application laser illumination. With relatively simple FPGA-based PID control running at a sample rate of 100 kHz, we could configure the overall response of the system to fully utilize the MEMS mirror's native bandwidth, which extends well beyond its first resonance. Compared to the simple open-loop method of suppressing overshoot and ringing, which significantly limits bandwidth utilization, running the mirrors in closed-loop control increased the bandwidth by nearly a factor of 3.7. A 2.0 mm diameter integrated MEMS mirror with a resonant frequency of 1300 Hz was limited to 500 Hz bandwidth in open-loop driving, but this was increased to 3 kHz bandwidth with the closed-loop controller. With that bandwidth it is capable of very sharply defined uniform-velocity scans (sawtooth or triangle waveforms), which are highly desired in scanned-mirror LiDAR systems. A 2.4 mm diameter mirror with +/-12° of scan angle achieves over 1.3 kHz of flat response, allowing sharp triangle waveforms even at 300 Hz (600 uniform-velocity lines per second). The same methodology is demonstrated with larger, bonded mirrors. Here closed-loop control is more challenging due to the additional resonance and a more complex system dynamic. Nevertheless, the results are similar: a 5 mm diameter mirror's bandwidth was increased from 150 Hz to 500 Hz.
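As a rough illustration of the control loop described above, the following is a minimal discrete PID update of the kind an FPGA might run at a 100 kHz sample rate. The gains are placeholders; in practice they would be tuned against the mirror's measured frequency response obtained from the optical position sensor.

```python
fs = 100_000                 # controller sample rate (Hz), as quoted above
dt = 1.0 / fs

class PID:
    """Minimal discrete PID controller (parallel form)."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integ = 0.0
        self.prev_err = 0.0

    def update(self, setpoint, measurement):
        err = setpoint - measurement
        self.integ += err * dt                      # integral term
        deriv = (err - self.prev_err) / dt          # derivative term
        self.prev_err = err
        return self.kp * err + self.ki * self.integ + self.kd * deriv

# Illustrative gains only; not the authors' values.
controller = PID(kp=0.8, ki=400.0, kd=1e-4)
drive = controller.update(setpoint=1.0, measurement=0.0)
print(drive)
```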
Developments of capacitance stabilised etalon technology
NASA Astrophysics Data System (ADS)
Bond, R. A.; Foster, M.; Thwaite, C.; Thompson, C. K.; Rees, D.; Bakalski, I. V.; Pereira do Carmo, J.
2017-11-01
This paper describes a high-resolution optical filter (HRF) suitable for narrow bandwidth filtering in LIDAR applications. The filter is composed of a broadband interference filter and a narrowband Fabry-Perot etalon based on the capacitance stabilised concept. The key requirements for the HRF were a bandwidth of less than 40 pm, a tuneable range of over 6 nm and a transmission greater than 50%. These requirements combined with the need for very high out-of-band rejection (greater than 50 dB in the range 300 nm to 1200 nm) drive the design of the filter towards a combination of high transmission broadband filter and high performance tuneable, narrowband filter.
Controlling the spectral shape of nonlinear Thomson scattering with proper laser chirping
Rykovanov, S. G.; Geddes, C. G. R.; Schroeder, C. B.; ...
2016-03-18
Effects of nonlinearity in Thomson scattering of a high intensity laser pulse from electrons are analyzed. Analytic expressions for laser pulse shaping in frequency (chirping) are obtained which control spectrum broadening for high laser pulse intensities. These analytic solutions allow prediction of the spectral form and required laser parameters to avoid broadening. Results of analytical and numerical calculations agree well. The control over the scattered radiation bandwidth allows narrow bandwidth sources to be produced using high scattering intensities, which in turn greatly improves scattering yield for future x- and gamma-ray sources.
High-speed electronic beam steering using injection locking of a laser-diode array
NASA Astrophysics Data System (ADS)
Swanson, E. A.; Abbas, G. L.; Yang, S.; Chan, V. W. S.; Fujimoto, J. G.
1987-01-01
High-speed electronic steering of the output beam of a 10-stripe laser-diode array is reported. The array was injection locked to a single-frequency laser diode. High-speed steering of the locked 0.5-deg-wide far-field lobe is demonstrated either by modulating the injection current of the array or by modulating the frequency of the master laser. Closed-loop tracking bandwidths of 70 kHz and 3 MHz, respectively, were obtained. The beam-steering bandwidths are limited by the FM responses of the modulated devices for both techniques.
Breaking Lorentz reciprocity to overcome the time-bandwidth limit in physics and engineering
NASA Astrophysics Data System (ADS)
Tsakmakidis, K. L.; Shen, L.; Schulz, S. A.; Zheng, X.; Upham, J.; Deng, X.; Altug, H.; Vakakis, A. F.; Boyd, R. W.
2017-06-01
A century-old tenet in physics and engineering asserts that any type of system, having bandwidth Δω, can interact with a wave over only a constrained time period Δt inversely proportional to the bandwidth (Δt·Δω ~ 2π). This law severely limits the generic capabilities of all types of resonant and wave-guiding systems in photonics, cavity quantum electrodynamics and optomechanics, acoustics, continuum mechanics, and atomic and optical physics but is thought to be completely fundamental, arising from basic Fourier reciprocity. We propose that this “fundamental” limit can be overcome in systems where Lorentz reciprocity is broken. As a system becomes more asymmetric in its transport properties, the degree to which the limit can be surpassed becomes greater. By way of example, we theoretically demonstrate how, in an astutely designed magnetized semiconductor heterostructure, the above limit can be exceeded by orders of magnitude by using realistic material parameters. Our findings revise prevailing paradigms for linear, time-invariant resonant systems, challenging the doctrine that high-quality resonances must invariably be narrowband and providing the possibility of developing devices with unprecedentedly high time-bandwidth performance.
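For reference, the conventional limit quoted above can be evaluated directly: a bandwidth Δω corresponds to an interaction time of only about 2π/Δω, i.e. roughly the reciprocal of the bandwidth expressed in Hz. The numbers below are illustrative.

```python
import math

def interaction_time_limit(bandwidth_hz):
    """Conventional time-bandwidth limit: dt * dOmega ~ 2*pi,
    i.e. dt ~ 1 / bandwidth (bandwidth given in Hz)."""
    d_omega = 2 * math.pi * bandwidth_hz
    return 2 * math.pi / d_omega        # = 1 / bandwidth_hz

# A resonator with a 1 GHz linewidth can interact with (store) a wave for
# only ~1 ns under the conventional, reciprocal-system limit.
print(interaction_time_limit(1e9))      # -> 1e-09 s
```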
Managing high-bandwidth real-time data storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bigelow, David D.; Brandt, Scott A; Bent, John M
2009-09-23
There exist certain systems which generate real-time data at high bandwidth but do not necessarily require the long-term retention of that data in normal conditions. In some cases, the data may not actually be useful, and in others, there may be too much data to permanently retain in long-term storage whether it is useful or not. However, certain portions of the data may be identified as being vitally important from time to time, and must therefore be retained for further analysis or permanent storage without interrupting the ongoing collection of new data. We have developed a system, Mahanaxar, intended to address this problem. It provides quality-of-service guarantees for incoming real-time data streams and simultaneous access to already-recorded data on a best-effort basis utilizing any spare bandwidth. It has built-in mechanisms for reliability and indexing, can scale upwards to meet increasing bandwidth requirements, and handles both small and large data elements equally well. We will show that a prototype version of this system provides better performance than a flat file (traditional filesystem) based version, particularly with regard to quality-of-service guarantees and hard real-time requirements.
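A toy sketch of the scheduling policy described above (not Mahanaxar's actual code): within each scheduling window, incoming real-time streams are granted their guaranteed bandwidth first, and best-effort read requests share only whatever is left over. All names and quantities are hypothetical.

```python
def schedule_window(total_bw, realtime_demands, besteffort_requests):
    """Split one scheduling window's bandwidth budget (bytes/window).

    Real-time streams are admitted first (hard guarantee); best-effort
    read requests share whatever bandwidth is left over.
    """
    granted_rt, remaining = {}, total_bw
    for stream, demand in realtime_demands.items():
        grant = min(demand, remaining)      # a real system would reject streams
        granted_rt[stream] = grant          # it cannot fully guarantee
        remaining -= grant

    granted_be = {}
    for req, want in besteffort_requests.items():
        grant = min(want, remaining)
        granted_be[req] = grant
        remaining -= grant
    return granted_rt, granted_be

rt, be = schedule_window(
    total_bw=1_000_000,
    realtime_demands={"sensor_stream": 800_000},
    besteffort_requests={"replay_job": 500_000},
)
print(rt, be)   # the replay job only gets the 200_000 bytes of spare bandwidth
```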
Bandwidth efficient coding for satellite communications
NASA Technical Reports Server (NTRS)
Lin, Shu; Costello, Daniel J., Jr.; Miller, Warner H.; Morakis, James C.; Poland, William B., Jr.
1992-01-01
An error control coding scheme was devised to achieve large coding gain and high reliability by using coded modulation with reduced decoding complexity. To achieve a 3 to 5 dB coding gain and moderate reliability, the decoding complexity is quite modest. In fact, to achieve a 3 dB coding gain, the decoding complexity is quite simple, no matter whether trellis coded modulation or block coded modulation is used. However, to achieve coding gains exceeding 5 dB, the decoding complexity increases drastically, and the implementation of the decoder becomes very expensive and impractical. The use of coded modulation in conjunction with concatenated (or cascaded) coding is proposed. A good short bandwidth-efficient modulation code is used as the inner code and a relatively powerful Reed-Solomon code is used as the outer code. With properly chosen inner and outer codes, a concatenated coded modulation scheme not only can achieve large coding gains and high reliability with good bandwidth efficiency but also can be practically implemented. This combination of coded modulation and concatenated coding offers a way of achieving the best of three worlds: reliability and coding gain, bandwidth efficiency, and decoding complexity.
Chatrath, Jatin; Aziz, Mohsin; Helaoui, Mohamed
2018-01-01
Reconfigurable and multi-standard RF front-ends for wireless communication and sensor networks have gained importance as building blocks for the Internet of Things. Simpler and highly efficient transmitter architectures, which can transmit better quality signals with reduced impairments, are an important step in this direction. In this regard, a mixer-less transmitter architecture, namely the three-way amplitude modulator-based transmitter, avoids the use of imperfect mixers and frequency up-converters, and their resulting distortions, leading to improved signal quality. In this work, an augmented memory polynomial-based model for the behavioral modeling of such a mixer-less transmitter architecture is proposed. Extensive simulations and measurements have been carried out in order to validate the accuracy of the proposed modeling strategy. The performance of the proposed model is evaluated using the normalized mean square error (NMSE) for long-term evolution (LTE) signals. The NMSE for an LTE signal of 1.4 MHz bandwidth with 100,000 samples is recorded as −36.41 dB for digital combining and −36.9 dB for analog combining. Similarly, for a 5 MHz signal the proposed model achieves −31.93 dB and −32.08 dB NMSE using digital and analog combining, respectively. For further validation of the proposed model, the amplitude-to-amplitude (AM-AM), amplitude-to-phase (AM-PM), and spectral responses of the modeled and measured data are plotted, reasonably meeting the desired modeling criteria. PMID:29510501
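For orientation, a plain (non-augmented) memory polynomial model and the NMSE metric used above can be sketched as follows; the augmented terms specific to the three-way amplitude-modulator architecture are omitted, and the toy "measured" data are synthetic.

```python
import numpy as np

def memory_polynomial_matrix(x, K=5, Q=3):
    """Regression matrix for a baseband memory polynomial:
    y[n] = sum_{k=1..K} sum_{q=0..Q} a_{kq} * x[n-q] * |x[n-q]|^(k-1)."""
    N = len(x)
    cols = []
    for q in range(Q + 1):
        xq = np.concatenate([np.zeros(q, dtype=complex), x[:N - q]])
        for k in range(1, K + 1):
            cols.append(xq * np.abs(xq) ** (k - 1))
    return np.column_stack(cols)

def nmse_db(y_meas, y_model):
    """Normalized mean square error in dB, as used for model validation."""
    err = np.sum(np.abs(y_meas - y_model) ** 2)
    return 10 * np.log10(err / np.sum(np.abs(y_meas) ** 2))

# Fit model coefficients by least squares on input/output records.
x = (np.random.randn(1000) + 1j * np.random.randn(1000)) / np.sqrt(2)
y = 0.9 * x - 0.05 * x * np.abs(x) ** 2        # toy "measured" transmitter output
Phi = memory_polynomial_matrix(x)
coeffs, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(f"NMSE = {nmse_db(y, Phi @ coeffs):.1f} dB")
```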
Enhanced speed in fluorescence imaging using beat frequency multiplexing
NASA Astrophysics Data System (ADS)
Mikami, Hideharu; Kobayashi, Hirofumi; Wang, Yisen; Hamad, Syed; Ozeki, Yasuyuki; Goda, Keisuke
2016-03-01
Fluorescence imaging using radiofrequency-tagged emission (FIRE) is an emerging technique that enables higher imaging speed (namely, temporal resolution) in fluorescence microscopy compared to conventional fluorescence imaging techniques such as confocal microscopy and wide-field microscopy. It works by using multiple intensity-modulated fields in an interferometric setup as excitation fields and applying frequency-division multiplexing to the fluorescence signals. Unfortunately, despite its high potential, FIRE has limited imaging speed due to two practical limitations: signal bandwidth and signal detection efficiency. The signal bandwidth is limited by that of an acousto-optic deflector (AOD) employed in the setup, which is typically 100-200 MHz for the spectral range of fluorescence excitation (400-600 nm). The signal detection efficiency is limited by poor spatial mode-matching between the two interfering fields used to produce a modulated excitation field. Here we present a method to overcome these limitations and thus achieve higher imaging speed than the prior version of FIRE. Our method achieves an increase in signal bandwidth by a factor of two and nearly optimal mode matching, so that the imaging speed is limited by the lifetime of the target fluorophore rather than by the imaging system itself. The higher bandwidth and better signal detection efficiency work synergistically, because higher bandwidth requires higher signal levels to keep shot noise and amplifier noise from dominating the fluorescence signal. Due to its unprecedentedly high-speed performance, our method has a wide variety of applications in cancer detection, drug discovery, and regenerative medicine.
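The frequency-division-multiplexing principle underlying FIRE can be sketched numerically: each excitation spot is tagged with a distinct beat frequency, and a single photodetector trace is demultiplexed into per-pixel brightness values by reading off the Fourier amplitudes at those frequencies. Sampling rate, beat frequencies, and brightness values below are illustrative placeholders.

```python
import numpy as np

fs = 1_000_000_000            # detector sampling rate (1 GS/s), illustrative
t = np.arange(20_000) / fs    # 20 us record
beat_freqs = np.array([100e6, 120e6, 140e6, 160e6])   # one tone per pixel
true_brightness = np.array([0.2, 1.0, 0.5, 0.8])

# One photodetector trace: sum of fluorescence signals, each tagged with
# its pixel's beat frequency (shot and amplifier noise omitted).
signal = sum(b * np.cos(2 * np.pi * f * t)
             for b, f in zip(true_brightness, beat_freqs))

# Demultiplex: the magnitude of the Fourier component at each beat
# frequency recovers that pixel's fluorescence brightness.
spectrum = np.fft.rfft(signal) / (len(t) / 2)
freqs = np.fft.rfftfreq(len(t), d=1 / fs)
recovered = [np.abs(spectrum[np.argmin(np.abs(freqs - f))]) for f in beat_freqs]
print(np.round(recovered, 2))      # ~ [0.2, 1.0, 0.5, 0.8]
```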
NASA Technical Reports Server (NTRS)
Richard, Mark A.
1993-01-01
The recent discovery of high temperature superconductors (HTS) has generated a substantial amount of interest in microstrip antenna applications. However, the high permittivity of substrates compatible with HTS results in narrow bandwidths and high patch edge impedances of such antennas. To investigate the performance of superconducting microstrip antennas, three antenna architectures at K and Ka-band frequencies are examined. Superconducting microstrip antennas that are directly coupled, gap coupled, and electromagnetically coupled to a microstrip transmission line were designed and fabricated on lanthanum aluminate substrates using YBa2Cu3O7 superconducting thin films. For each architecture, a single patch antenna and a four element array were fabricated. Measurements from these antennas, including input impedance, bandwidth, patterns, efficiency, and gain are presented. The measured results show usable antennas can be constructed using any of the architectures. All architectures show excellent gain characteristics, with less than 2 dB of total loss in the four element arrays. Although the direct and gap coupled antennas are the simplest antennas to design and fabricate, they suffer from narrow bandwidths. The electromagnetically coupled antenna, on the other hand, allows the flexibility of using a low permittivity substrate for the patch radiator, while using HTS for the feed network, thus increasing the bandwidth while effectively utilizing the low loss properties of HTS. Each antenna investigated in this research is the first of its kind reported.
NASA Astrophysics Data System (ADS)
Popmintchev, Dimitar; Galloway, Benjamin R.; Chen, Ming-Chang; Dollar, Franklin; Mancuso, Christopher A.; Hankla, Amelia; Miaja-Avila, Luis; O'Neil, Galen; Shaw, Justin M.; Fan, Guangyu; Ališauskas, Skirmantas; Andriukaitis, Giedrius; Balčiunas, Tadas; Mücke, Oliver D.; Pugzlys, Audrius; Baltuška, Andrius; Kapteyn, Henry C.; Popmintchev, Tenio; Murnane, Margaret M.
2018-03-01
Recent advances in high-order harmonic generation have made it possible to use a tabletop-scale setup to produce spatially and temporally coherent beams of light with bandwidth spanning 12 octaves, from the ultraviolet up to x-ray photon energies >1.6 keV. Here we demonstrate the use of this light for x-ray-absorption spectroscopy at the K- and L-absorption edges of solids at photon energies near 1 keV. We also report x-ray-absorption spectroscopy in the water window spectral region (284-543 eV) using a high-flux high-order harmonic generation x-ray supercontinuum with 10^9 photons/s in 1% bandwidth, 3 orders of magnitude larger than has previously been possible using tabletop sources. Since this x-ray radiation emerges as a single attosecond-to-femtosecond pulse with peak brightness exceeding 10^26 photons/s/mrad^2/mm^2/1% bandwidth, these novel coherent x-ray sources are ideal for probing the fastest molecular and materials processes on femtosecond-to-attosecond time scales and picometer length scales.
Photogating in Low Dimensional Photodetectors
Fang, Hehai
2017-01-01
Low dimensional materials including quantum dots, nanowires, 2D materials, and so forth have attracted increasing research interest for electronic and optoelectronic devices in recent years. Photogating, which is usually observed in photodetectors based on low dimensional materials and their hybrid structures, is demonstrated to play an important role. Photogating is considered as a way of conductance modulation through a photoinduced gate voltage, instead of simply and totally attributing it to trap states. This review first focuses on the gain of photogating and reveals the distinction from the conventional photoconductive effect. The trap- and hybrid-induced photogating, including their origins, formations, and characteristics, are subsequently discussed. Then, the recent progress on trap- and hybrid-induced photogating in low dimensional photodetectors is elaborated. Though a gain-bandwidth product as high as 10^9 Hz is reported in several cases, a trade-off between gain and bandwidth has to be made for this type of photogating. The general photogating is put forward according to another three studies reported very recently. General photogating may enable simultaneously high gain and high bandwidth, paving the way to explore novel high-performance photodetectors. PMID:29270342
Acoustic communications for cabled seafloor observatories
NASA Astrophysics Data System (ADS)
Freitag, L.; Stojanovic, M.
2003-04-01
Cabled seafloor observatories will provide scientists with a continuous presence in both deep and shallow water. In the deep ocean, connecting sensors to seafloor nodes for power and data transfer will require cables and a highly-capable ROV, both of which are potentially expensive. For many applications where very high bandwidth is not required, and where a sensor is already designed to operate on battery power, the use of acoustic links should be considered. Acoustic links are particularly useful for large numbers of low-bandwidth sensors scattered over tens of square kilometers. Sensors used to monitor the chemistry and biology of vent fields are one example. Another important use for acoustic communication is monitoring of AUVs performing pre-programmed or adaptive sampling missions. A high data rate acoustic link with an AUV allows the observer on shore to direct the vehicle in real-time, providing for dynamic event response. Thus both fixed and mobile sensors motivate the development of observatory infrastructure that provides power-efficient, high bandwidth acoustic communication. A proposed system design that can provide the wireless infrastructure, and further examples of its use in networks such as NEPTUNE, are presented.
Final report for the Tera Computer TTI CRADA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davidson, G.S.; Pavlakos, C.; Silva, C.
1997-01-01
Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTA parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.
Millimetron and Earth-Space VLBI
NASA Astrophysics Data System (ADS)
Likhachev, S.
2014-01-01
The main scientific goal of the Millimetron mission operating in Space VLBI (SVLBI) mode will be the exploration of compact radio sources with extremely high angular resolution (better than one microsecond of arc). The space-ground interferometer Millimetron has an orbit around the L2 point of the Earth-Sun system and allows operation with baselines up to a hundred Earth diameters. SVLBI observations will be accomplished by space and ground-based radio telescopes simultaneously. At the space telescope the received baseband signal is digitized and then transferred to the onboard memory storage (up to 100 TB). The scientific and service data transfer to the ground tracking station is performed by means of both synchronization and communication radio links (1 GBps). The array of scientific data is then processed at the correlation center. Due to the (u,v)-plane coverage requirements for SVLBI imaging, it is necessary to propose observations at two different frequencies and two circular polarizations simultaneously, with frequency switching. The total recording bandwidth (2x2x4 GHz) defines the on-board memory size. The ground-based support of the Millimetron mission in VLBI mode could be provided by the Atacama Large Millimeter Array (ALMA), Pico Veleta (Spain), the Plateau de Bure interferometer (France), the SMT telescope in the US (Arizona), the LMT antenna (Mexico), the SMA array (Mauna Kea, USA), as well as the Green Bank and Effelsberg 100 m telescopes (for 22 GHz observations). We will present simulation results for the Millimetron-ALMA interferometer. The sensitivity estimate of the space-ground interferometer will be compared to the requirements of the scientific goals of the mission. The possibility of multi-frequency synthesis (MFS) to obtain high quality images will also be considered.
EMG-Torque Dynamics Change With Contraction Bandwidth.
Golkar, Mahsa A; Jalaleddini, Kian; Kearney, Robert E
2018-04-01
An accurate model for ElectroMyoGram (EMG)-torque dynamics has many uses. One application that has gained considerable attention among researchers is its use in estimating the muscle contraction level for the efficient control of prostheses. In this paper, the dynamic relationship between the surface EMG and torque during isometric contractions at the human ankle was studied using system identification techniques. Subjects voluntarily modulated their ankle torque in the dorsiflexion direction, by activating their tibialis anterior muscle, while tracking a pseudo-random binary sequence in a torque matching task. The effects of contraction bandwidth, described by the torque spectrum, on EMG-torque dynamics were evaluated by varying the visual command switching time. Nonparametric impulse response functions (IRFs) were estimated between the processed surface EMG and torque. It was demonstrated that: 1) at low contraction bandwidths, the identified IRFs had unphysiological anticipatory (i.e., non-causal) components, whose amplitude decreased as the contraction bandwidth increased. We hypothesized that this non-causal behavior arose because the EMG input contained a component due to feedback from the output torque, i.e., it was recorded from within a closed loop. Vision was not the feedback source, since the non-causal behavior persisted when visual feedback was removed. Repeating the identification using a nonparametric closed-loop identification algorithm yielded causal IRFs at all bandwidths, supporting this hypothesis. 2) EMG-torque dynamics became faster and the bandwidth of the system increased as the contraction modulation rate increased. Thus, accurate prediction of torque from EMG signals must take into account the contraction bandwidth sensitivity of this system.
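As a sketch of the nonparametric identification step described above, an IRF can be estimated by least squares as a finite impulse response between an input and an output record. Here the input and the "true" dynamics are simulated stand-ins for the processed EMG and the ankle joint; the study's closed-loop estimator, needed at low contraction bandwidths, is not reproduced.

```python
import numpy as np

def estimate_irf(u, y, n_lags=50):
    """Least-squares FIR (impulse response) estimate between input u and
    output y: y[n] ~= sum_k h[k] * u[n-k]."""
    N = len(u)
    U = np.column_stack([
        np.concatenate([np.zeros(k), u[:N - k]]) for k in range(n_lags)
    ])
    h, *_ = np.linalg.lstsq(U, y, rcond=None)
    return h

# Toy data: a known low-pass "EMG-to-torque" filter plus measurement noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(5000)                  # simulated processed EMG
h_true = np.exp(-np.arange(50) / 10.0)         # assumed true dynamics
y = np.convolve(u, h_true)[:5000] + 0.1 * rng.standard_normal(5000)
h_hat = estimate_irf(u, y)
print(np.allclose(h_hat, h_true, atol=0.05))   # recovered IRF matches
```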
Large dynamic range terahertz spectrometers based on plasmonic photomixers (Conference Presentation)
NASA Astrophysics Data System (ADS)
Wang, Ning; Javadi, Hamid; Jarrahi, Mona
2017-02-01
Heterodyne terahertz spectrometers are highly in demand for space explorations and astrophysics studies. A conventional heterodyne terahertz spectrometer consists of a terahertz mixer that mixes a received terahertz signal with a local oscillator signal to generate an intermediate frequency signal in the radio frequency (RF) range, where it can be easily processed and detected by RF electronics. Schottky diode mixers, superconductor-insulator-superconductor (SIS) mixers and hot electron bolometer (HEB) mixers are the most commonly used mixers in conventional heterodyne terahertz spectrometers. While conventional heterodyne terahertz spectrometers offer high spectral resolution and high detection sensitivity levels at cryogenic temperatures, their dynamic range and bandwidth are limited by the low radiation power of existing terahertz local oscillators and narrow bandwidth of existing terahertz mixers. To address these limitations, we present a novel approach for heterodyne terahertz spectrometry based on plasmonic photomixing. The presented design replaces terahertz mixer and local oscillator of conventional heterodyne terahertz spectrometers with a plasmonic photomixer pumped by an optical local oscillator. The optical local oscillator consists of two wavelength-tunable continuous-wave optical sources with a terahertz frequency difference. As a result, the spectrometry bandwidth and dynamic range of the presented heterodyne spectrometer is not limited by radiation frequency and power restrictions of conventional terahertz sources. We demonstrate a proof-of-concept terahertz spectrometer with more than 90 dB dynamic range and 1 THz spectrometry bandwidth.
NASA Astrophysics Data System (ADS)
Elgamri, Abdelghafor
The increased demand from IP traffic, video applications and cell backhaul has placed fiber routes under severe strain. The high demand for large bandwidth from enormous numbers of cell sites on a network has made the capacity of yesterday's networks inadequate for today's bandwidth demand. Carriers considered Dense Wavelength Division Multiplexing (DWDM) networks to overcome this issue. Recently, there has been growing interest in fiber Raman amplifiers due to their capability to upgrade the wavelength-division-multiplexing bandwidth and their arbitrary gain bandwidth. In addition, photonic crystal fibers have been widely modeled, studied, and fabricated due to their peculiar properties that cannot be achieved with conventional fibers. The focus of this thesis is to develop a low-noise broadband Raman amplification system based on photonic crystal fiber that can be successfully implemented in a high-capacity DWDM network. The design of a photonic crystal fiber Raman amplifier module is based on knowledge of the fiber cross-sectional characteristics, i.e., the geometric parameters and the germania concentration in the doped area. The module allows the study of different air-hole dimensions and dispositions, with or without a central doped area. In addition, the design integrates a distributed Raman amplifier and a nonlinear optical loop mirror to improve the signal-to-noise ratio and overall gain in large-capacity DWDM networks.
Rohani, Ali; Varhue, Walter; Su, Yi-Hsuan; Swami, Nathan S
2014-07-01
Electrorotation (ROT) is a powerful tool for characterizing the dielectric properties of cells and bioparticles. However, its application has been somewhat limited by the need to mitigate disruptions to particle rotation by translation under positive DEP and by frictional interactions with the substrate. While these disruptions may be overcome by implementing particle positioning schemes or field cages, these methods restrict the frequency bandwidth to the negative DEP range and permit only single particle measurements within a limited spatial extent of the device geometry away from field nonuniformities. Herein, we present an electrical tweezer methodology based on a sequence of electrical signals, composed of negative DEP using 180-degree phase-shifted fields for trapping and levitation of the particles, followed by 90-degree phase-shifted fields over a wide frequency bandwidth for highly parallelized electrorotation measurements. Through field simulations of the rotating electrical field under this wave-sequence, we illustrate the enhanced spatial extent for electrorotation measurements, with no limitations to frequency bandwidth. We apply this methodology to characterize subtle modifications in morphology and electrophysiology of Cryptosporidium parvum with varying degrees of heat treatment, in terms of shifts in the electrorotation spectra over the 0.05-40 MHz region. Given the single particle sensitivity and the ability for highly parallelized electrorotation measurements, we envision its application toward characterizing heterogeneous subpopulations of microbial and stem cells. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Investigation of high-speed Si photodetectors in standard CMOS technology
NASA Astrophysics Data System (ADS)
Wang, Huaqiang; Guo, Xia
2018-05-01
In this paper, the frequency response characteristics of a photodetector (PD) were studied considering intrinsic and extrinsic effects. We then designed interdigitated p-i-n PDs on silicon-on-insulator (SOI) and epitaxial (EPI) substrates with a photosensitive area of 30 μm diameter, fabricated by a CMOS process. The devices with 2 μm finger spacing, processed on 2-μm SOI substrates, exhibited a 205 MHz bandwidth at a reverse bias of 3 V. EPI devices with 1 μm finger spacing exhibited a 131 MHz bandwidth under -3 V. Responsivities of 0.051 A/W and 0.21 A/W were measured at 850 nm on the SOI and EPI substrates, respectively. Compared with a bulk silicon PD, the bandwidth is greatly improved. The PD offers a high cost-performance ratio and can be widely used in short-distance communication such as visible light communication and free-space optical communication.
Bandwidth tunable amplifier for recording biopotential signals.
Hwang, Sungkil; Aninakwa, Kofi; Sonkusale, Sameer
2010-01-01
This paper presents a low-noise, low-power, bandwidth-tunable amplifier for bio-potential signal recording applications. By employing a depletion-mode pMOS transistor in diode configuration as a tunable sub-pA current source to adjust the resistivity of a MOS-bipolar pseudo-resistor, the bandwidth is adjusted without any need for a separate band-pass filter stage. For high CMRR, PSRR and dynamic range, a fully differential structure is used in the design of the amplifier. The amplifier achieves a midband gain of 39.8 dB with a tunable high-pass cutoff frequency ranging from 0.1 Hz to 300 Hz. The amplifier is fabricated in a 0.18 μm CMOS process and occupies 0.14 mm^2 of chip area. A three-electrode ECG measurement is performed using the proposed amplifier to show its feasibility for low-power, compact wearable ECG monitoring applications.
NASA Astrophysics Data System (ADS)
Khokhlova, V. A.; Bessonova, O. V.; Soneson, J. E.; Canney, M. S.; Bailey, M. R.; Crum, L. A.
2010-03-01
Nonlinear propagation effects result in the formation of weak shocks in high intensity focused ultrasound (HIFU) fields. When shocks are present, the wave spectrum consists of hundreds of harmonics. In practice, shock waves are modeled using a finite number of harmonics and measured with hydrophones that have limited bandwidths. The goal of this work was to determine how many harmonics are necessary to model or measure peak pressures, intensity, and heat deposition rates of the HIFU fields. Numerical solutions of the Khokhlov-Zabolotskaya-Kuznetzov-type (KZK) nonlinear parabolic equation were obtained using two independent algorithms, compared, and analyzed for nonlinear propagation in water, in gel phantom, and in tissue. Measurements were performed in the focus of the HIFU field in the same media using fiber optic probe hydrophones of various bandwidths. Experimental data were compared to the simulation results.
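A back-of-the-envelope sketch of why the number of retained harmonics matters: under plane-wave assumptions the intensity contributed by harmonic n is |p_n|^2/(2ρc), while its heat deposition is weighted additionally by the frequency-dependent absorption, so the heating estimate converges more slowly with the number of harmonics than the intensity estimate. The absorption law, coefficients, and 1/n harmonic decay below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def intensity_and_heating(p_harm, rho=1000.0, c=1500.0, alpha0=0.5, b=1.1):
    """Intensity and heat deposition from a truncated harmonic series.

    p_harm[n] is the pressure amplitude (Pa) of harmonic n+1. Uses the
    plane-wave relation I_n = |p_n|^2 / (2*rho*c) and a power-law absorption
    alpha_n = alpha0 * n**b, with alpha0 the absorption (Np/m) at the
    fundamental frequency. All coefficient values are illustrative.
    """
    n = np.arange(1, len(p_harm) + 1)
    I_n = np.abs(p_harm) ** 2 / (2.0 * rho * c)
    q_n = 2.0 * alpha0 * n ** b * I_n          # heat deposition per harmonic
    return I_n.sum(), q_n.sum()

# Shocked waveforms decay roughly as 1/n, so higher harmonics add little to
# the intensity estimate but noticeably more to the heating estimate.
p = 3e6 / np.arange(1, 101)                    # 3 MPa fundamental, 100 harmonics
for N in (10, 30, 100):
    I, Q = intensity_and_heating(p[:N])
    print(f"{N:3d} harmonics: I = {I / 1e4:.1f} W/cm^2, Q = {Q / 1e6:.2f} MW/m^3")
```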
Plate-slot polymer waveguide modulator on silicon-on-insulator.
Qiu, Feng; Spring, Andrew M; Hong, Jianxun; Yokoyama, Shiyoshi
2018-04-30
Electro-optic (EO) modulators are vital for efficient "electrical to optical" transitions and high-speed optical interconnects. In this work, we applied an EO polymer to demonstrate modulators on silicon-on-insulator substrates. The fabricated Mach-Zehnder interferometer (MZI) and ring resonator consist of a Si and TiO2 slot, in which the EO polymer was embedded to realize low-drive-voltage and large-bandwidth modulation. The designed optical and electrical constructions are able to provide a highly concentrated TM mode with low propagation loss and effective EO properties. The fabricated MZI modulator shows a π-voltage-length product of 0.66 V·cm and a 3-dB bandwidth of 31 GHz. The measured EO activity makes it possible to exploit the ring modulator with a resonant tunability of 0.065 nm/V and a 3-dB modulation bandwidth of up to 13 GHz.
Characterizing output bottlenecks in a supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Bing; Chase, Jeffrey; Dillow, David A
2012-01-01
Supercomputer I/O loads are often dominated by writes. HPC (High Performance Computing) file systems are designed to absorb these bursty outputs at high bandwidth through massive parallelism. However, the delivered write bandwidth often falls well below the peak. This paper characterizes the data absorption behavior of a center-wide shared Lustre parallel file system on the Jaguar supercomputer. We use a statistical methodology to address the challenges of accurately measuring a shared machine under production load and to obtain the distribution of bandwidth across samples of compute nodes, storage targets, and time intervals. We observe and quantify limitations from competing traffic, contention on storage servers and I/O routers, concurrency limitations in the client compute node operating systems, and the impact of variance (stragglers) on coupled output such as striping. We then examine the implications of our results for application performance and the design of I/O middleware systems on shared supercomputers.
Long-pulse-width narrow-bandwidth solid state laser
Dane, C. Brent; Hackel, Lloyd A.
1997-01-01
A long pulse laser system emits 500-1000 ns quasi-rectangular pulses at 527 nm with near diffraction-limited divergence and near transform-limited bandwidth. The system consists of one or more flashlamp-pumped Nd:glass zig-zag amplifiers, a very low threshold stimulated-Brillouin-scattering (SBS) phase conjugator system, and a free-running single frequency Nd:YLF master oscillator. Completely passive polarization switching provides eight amplifier gain passes. Multiple frequency output can be generated by using SBS cells having different pressures of a gaseous SBS medium or different SBS materials. This long pulse, low divergence, narrow-bandwidth, multi-frequency output laser system is ideally suited for use as an illuminator for long range speckle imaging applications. Because of its high average power and high beam quality, this system has application in any process which would benefit from a long pulse format, including material processing and medical applications.
Long-pulse-width narrow-bandwidth solid state laser
Dane, C.B.; Hackel, L.A.
1997-11-18
A long pulse laser system emits 500-1000 ns quasi-rectangular pulses at 527 nm with near diffraction-limited divergence and near transform-limited bandwidth. The system consists of one or more flashlamp-pumped Nd:glass zig-zag amplifiers, a very low threshold stimulated-Brillouin-scattering (SBS) phase conjugator system, and a free-running single frequency Nd:YLF master oscillator. Completely passive polarization switching provides eight amplifier gain passes. Multiple frequency output can be generated by using SBS cells having different pressures of a gaseous SBS medium or different SBS materials. This long pulse, low divergence, narrow-bandwidth, multi-frequency output laser system is ideally suited for use as an illuminator for long range speckle imaging applications. Because of its high average power and high beam quality, this system has application in any process which would benefit from a long pulse format, including material processing and medical applications. 5 figs.
High Sensitivity Terahertz Detection through Large-Area Plasmonic Nano-Antenna Arrays.
Yardimci, Nezih Tolga; Jarrahi, Mona
2017-02-16
Plasmonic photoconductive antennas have great promise for increasing responsivity and detection sensitivity of conventional photoconductive detectors in time-domain terahertz imaging and spectroscopy systems. However, operation bandwidth of previously demonstrated plasmonic photoconductive antennas has been limited by bandwidth constraints of their antennas and photoconductor parasitics. Here, we present a powerful technique for realizing broadband terahertz detectors through large-area plasmonic photoconductive nano-antenna arrays. A key novelty that makes the presented terahertz detector superior to the state-of-the art is a specific large-area device geometry that offers a strong interaction between the incident terahertz beam and optical pump at the nanoscale, while maintaining a broad operation bandwidth. The large device active area allows robust operation against optical and terahertz beam misalignments. We demonstrate broadband terahertz detection with signal-to-noise ratio levels as high as 107 dB.
High Sensitivity Terahertz Detection through Large-Area Plasmonic Nano-Antenna Arrays
Yardimci, Nezih Tolga; Jarrahi, Mona
2017-01-01
Plasmonic photoconductive antennas have great promise for increasing responsivity and detection sensitivity of conventional photoconductive detectors in time-domain terahertz imaging and spectroscopy systems. However, operation bandwidth of previously demonstrated plasmonic photoconductive antennas has been limited by bandwidth constraints of their antennas and photoconductor parasitics. Here, we present a powerful technique for realizing broadband terahertz detectors through large-area plasmonic photoconductive nano-antenna arrays. A key novelty that makes the presented terahertz detector superior to the state-of-the art is a specific large-area device geometry that offers a strong interaction between the incident terahertz beam and optical pump at the nanoscale, while maintaining a broad operation bandwidth. The large device active area allows robust operation against optical and terahertz beam misalignments. We demonstrate broadband terahertz detection with signal-to-noise ratio levels as high as 107 dB. PMID:28205615
Recent advancements towards green optical networks
NASA Astrophysics Data System (ADS)
Davidson, Alan; Glesk, Ivan; Buis, Adrianus; Wang, Junjia; Chen, Lawrence
2014-12-01
Recent years have seen a rapid growth in demand for ultra high speed data transmission with end users expecting fast, high bandwidth network access. With this rapid growth in demand, data centres are under pressure to provide ever increasing data rates through their networks and at the same time improve the quality of data handling in terms of reduced latency, increased scalability and improved channel speed for users. However as data rates increase, present technology based on well-established CMOS technology is becoming increasingly difficult to scale and consequently data networks are struggling to satisfy current network demand. In this paper the interrelated issues of electronic scalability, power consumption, limited copper interconnect bandwidth and the limited speed of CMOS electronics will be explored alongside the tremendous bandwidth potential of optical fibre based photonic networks. Some applications of photonics to help alleviate the speed and latency in data networks will be discussed.
A HIGH BANDWIDTH BIPOLAR POWER SUPPLY FOR THE FAST CORRECTORS IN THE APS UPGRADE*
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Ju; Sprau, Gary
The APS Upgrade of a multi-bend achromat (MBA) storage ring requires a fast bipolar power supply for the fast correction magnets. The key performance requirements of the power supply include a small-signal bandwidth of 10 kHz for the output current. This requirement presents a challenge to the design because of the high inductance of the magnet load and a limited input DC voltage. A prototype DC/DC power supply utilizing a MOSFET H-bridge circuit with a 500 kHz PWM has been developed and tested successfully. The prototype achieved a 10-kHz bandwidth with less than 3-dB attenuation for a signal 0.5% of the maximum operating current of 15 amperes. This paper presents the design of the power circuit, the PWM method, the control loop, and the test results.
Embedded instrumentation architecture
Boyd, Gerald M.; Farrow, Jeffrey
2015-09-29
The various technologies presented herein relate to generating copies of an incoming signal, wherein each copy of the signal can undergo different processing to facilitate control of bandwidth demands during communication of one or more signals relating to the incoming signal. A signal sharing component can be utilized to share copies of the incoming signal between a plurality of circuits/components which can include a first A/D converter, a second A/D converter, and a comparator component. The first A/D converter can operate at a low sampling rate and accordingly generates, and continuously transmits, a signal having a low bandwidth requirement. The second A/D converter can operate at a high sampling rate and hence generates a signal having a high bandwidth requirement. Transmission of a signal from the second A/D converter can be controlled by a signaling event (e.g., a signal pulse) being determined to have occurred by the comparator component.
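A toy model of the signal-sharing idea described above: the low-sampling-rate path is transmitted continuously at low bandwidth, while the high-sampling-rate path is transmitted only when a comparator-style threshold test detects an event. Threshold, decimation factor, and function names are hypothetical.

```python
import numpy as np

def share_signal(samples, threshold=0.5, decimation=100):
    """Toy model of the dual-path scheme described in the abstract above.

    Yields ("low_rate", x) for every `decimation`-th sample (the always-on,
    low-bandwidth path) and ("high_rate", x) whenever the comparator fires,
    i.e. |x| exceeds `threshold` (the event-triggered, high-bandwidth path).
    """
    for n, x in enumerate(samples):
        if n % decimation == 0:
            yield ("low_rate", x)
        if abs(x) > threshold:
            yield ("high_rate", x)

# A mostly-quiet signal with one pulse: only the pulse region consumes
# high-bandwidth transmission, keeping the average bandwidth demand low.
sig = np.zeros(10_000)
sig[5_000:5_020] = 1.0
events = list(share_signal(sig))
print(sum(1 for kind, _ in events if kind == "high_rate"))   # 20
```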
NASA Astrophysics Data System (ADS)
Monavarian, M.; Rashidi, A.; Aragon, A. A.; Nami, M.; Oh, S. H.; DenBaars, S. P.; Feezell, D.
2018-05-01
InGaN/GaN light-emitting diodes (LEDs) with large modulation bandwidths are desirable for visible-light communication. Along with modulation speed, the internal quantum efficiency (IQE) under operating conditions is also an important consideration. Here, we report the modulation characteristics of semipolar (202̄1̄) InGaN/GaN LEDs with single-quantum-well (SQW) and multiple-quantum-well (MQW) active regions grown on free-standing semipolar GaN substrates, with peak IQEs of 0.93 and 0.73, respectively. The MQW LEDs exhibit on average about 40-80% higher modulation bandwidth, reaching 1.5 GHz at 13 kA/cm2, but about 27% lower peak IQE than the SQW LEDs. We extract the differential carrier lifetimes (DLTs), RC parasitics, and carrier escape lifetimes and discuss their role in the bandwidth and IQE characteristics. A Coulomb-enhanced capture process is shown to rapidly reduce the DLT of the MQW LED at high current densities. Auger recombination is also shown to play little role in increasing the speed of the LEDs. Finally, we investigate the trade-offs between bandwidth and efficiency and introduce the bandwidth-IQE product as a potential figure of merit for optimizing speed and efficiency in InGaN/GaN LEDs.
Considerations of digital phase modulation for narrowband satellite mobile communication
NASA Technical Reports Server (NTRS)
Grythe, Knut
1990-01-01
The Inmarsat-M system for mobile satellite communication is specified as a frequency division multiple access (FDMA) system, applying Offset Quadrature Phase Shift Keying (QPSK) to transmit 8 kbit/sec in a 10 kHz user channel bandwidth. We consider Digital Phase Modulation (DPM) as an alternative modulation format for INMARSAT-M. DPM is similar to Continuous Phase Modulation (CPM), except that DPM has a finite memory in the premodulation filter with a continuously varying modulation index. It is shown that DPM with 64 states in the Viterbi algorithm (VA) obtains a lower bit error rate (BER). Results for a 5 kHz system, with the same 8 kbit/sec transmitted bitstream, are also presented.
What is the Bandwidth of Perceptual Experience?
Cohen, Michael A; Dennett, Daniel C; Kanwisher, Nancy
2016-05-01
Although our subjective impression is of a richly detailed visual world, numerous empirical results suggest that the amount of visual information observers can perceive and remember at any given moment is limited. How can our subjective impressions be reconciled with these objective observations? Here, we answer this question by arguing that, although we see more than the handful of objects claimed by prominent models of visual attention and working memory, we still see far less than we think we do. Taken together, we argue that these considerations resolve the apparent conflict between our subjective impressions and empirical data on visual capacity, while also illuminating the nature of the representations underlying perceptual experience. Copyright © 2016 Elsevier Ltd. All rights reserved.
Information Switching Processor (ISP) contention analysis and control
NASA Technical Reports Server (NTRS)
Inukai, Thomas
1995-01-01
In designing a satellite system with on-board processing, the selection of a switching architecture is often critical. The on-board switching function can be implemented by circuit switching or packet switching. Destination-directed packet switching has several attractive features, such as self-routing without on-board switch reconfiguration, no switch control memory requirement, efficient bandwidth utilization for packet-switched traffic, and accommodation of circuit-switched traffic. Destination-directed packet switching, however, has two potential concerns: (1) contention and (2) congestion. This report deals specifically with the first problem. It includes a description and analysis of various self-routing switch structures, the nature of contention problems, and contention resolution techniques.
Kim, Jeremie S; Senol Cali, Damla; Xin, Hongyi; Lee, Donghyuk; Ghose, Saugata; Alser, Mohammed; Hassan, Hasan; Ergin, Oguz; Alkan, Can; Mutlu, Onur
2018-05-09
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed location filter comes into play before alignment, discarding seed locations that alignment would deem a poor match. The ideal seed location filter would discard all poor match locations prior to alignment such that there is no wasted computation on unnecessary alignments. We propose a novel seed location filtering algorithm, GRIM-Filter, optimized to exploit 3D-stacked memory systems that integrate computation within a logic layer stacked under memory layers, to perform processing-in-memory (PIM). GRIM-Filter quickly filters seed locations by 1) introducing a new representation of coarse-grained segments of the reference genome, and 2) using massively-parallel in-memory operations to identify read presence within each coarse-grained segment. Our evaluations show that for a sequence alignment error tolerance of 0.05, GRIM-Filter 1) reduces the false negative rate of filtering by 5.59x-6.41x, and 2) provides an end-to-end read mapper speedup of 1.81x-3.65x, compared to a state-of-the-art read mapper employing the best previous seed location filtering algorithm. GRIM-Filter exploits 3D-stacked memory, which enables the efficient use of processing-in-memory, to overcome the memory bandwidth bottleneck in seed location filtering. We show that GRIM-Filter significantly improves the performance of a state-of-the-art read mapper. GRIM-Filter is a universal seed location filter that can be applied to any read mapper. We hope that our results provide inspiration for new works to design other bioinformatics algorithms that take advantage of emerging technologies and new processing paradigms, such as processing-in-memory using 3D-stacked memory devices.
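The coarse-grained filtering idea (though not the in-memory bitvector implementation) can be sketched as follows: the reference is split into bins, each bin records which short tokens (k-mers) it contains, and a candidate seed location is discarded before alignment if too few of the read's tokens exist in the corresponding bin. Token length, bin size, and threshold below are illustrative.

```python
from collections import defaultdict

K = 5          # token (k-mer) length; illustrative
BIN = 200      # coarse-grained bin size in bases; illustrative

def build_bins(reference):
    """For each bin of the reference, record the set of k-mers it contains
    (standing in for GRIM-Filter's per-bin existence bitvectors)."""
    bins = defaultdict(set)
    for i in range(len(reference) - K + 1):
        bins[i // BIN].add(reference[i:i + K])
    return bins

def keep_location(read, location, bins, min_frac=0.8):
    """Keep a candidate seed location only if enough of the read's k-mers
    exist in the bin covering that location; otherwise filter it out
    before the expensive alignment step."""
    kmers = {read[i:i + K] for i in range(len(read) - K + 1)}
    present = sum(1 for km in kmers if km in bins[location // BIN])
    return present >= min_frac * len(kmers)

ref = "ACGT" * 500
bins = build_bins(ref)
print(keep_location("ACGTACGTACGT", 120, bins))   # True: read matches bin content
print(keep_location("TTTTTTTTTTTT", 120, bins))   # False: filtered before alignment
```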
Bazargani, Hamed Pishvai; Burla, Maurizio; Chrostowski, Lukas; Azaña, José
2016-11-01
We experimentally demonstrate high-performance integer and fractional-order photonic Hilbert transformers based on laterally apodized Bragg gratings in a silicon-on-insulator technology platform. The sub-millimeter-long gratings have been fabricated using single-etch electron beam lithography, and the resulting HT devices offer operation bandwidths approaching the THz range, with time-bandwidth products between 10 and 20.
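For readers less familiar with the operation these gratings implement photonically, the sketch below applies an integer or fractional-order Hilbert transform numerically, using the frequency response H_P(ω) = exp(-jP(π/2)sgn ω); P = 1 recovers the classical Hilbert transformer. This illustrates the signal-processing function, not the device.

```python
import numpy as np

def fractional_hilbert(x, P=1.0):
    """Apply a fractional Hilbert transform of order P to a real signal.

    Frequency response H_P(w) = exp(-1j * P * pi/2 * sign(w)); P = 1 gives
    the classical (integer-order) Hilbert transformer.
    """
    X = np.fft.fft(x)
    w = np.fft.fftfreq(len(x))
    H = np.exp(-1j * P * np.pi / 2 * np.sign(w))
    return np.real(np.fft.ifft(X * H))

# The Hilbert transform of a cosine is a sine (exact here because the tone
# occupies an integer number of FFT bins).
t = np.arange(1024)
x = np.cos(2 * np.pi * 64 / 1024 * t)
print(np.allclose(fractional_hilbert(x, P=1.0),
                  np.sin(2 * np.pi * 64 / 1024 * t)))
```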
ERIC Educational Resources Information Center
Ricketts, Todd A.; Dittberner, Andrew B.; Johnson, Earl E.
2008-01-01
Purpose: One factor that has been shown to greatly affect sound quality is audible bandwidth. Provision of gain for frequencies above 4-6 kHz has not generally been supported for groups of hearing aid wearers. The purpose of this study was to determine if preference for bandwidth extension in hearing aid processed sounds was related to the…
Optical air-coupled NDT system with ultra-broad frequency bandwidth (Conference Presentation)
NASA Astrophysics Data System (ADS)
Fischer, Balthasar; Rohringer, Wolfgang; Heine, Thomas
2017-05-01
We present a novel optical, air-coupled ultrasound testing setup exhibiting a frequency bandwidth of 1 MHz in air. The sound waves are detected by a miniaturized Fabry-Pérot interferometer (2 mm cavity), whilst the sender consists of a thermoacoustic emitter or a short laser pulse. We discuss characterization measurements and C-scans of a selected set of samples, including carbon fiber reinforced polymer (CFRP). The high detector sensitivity allows for an increased penetration depth. The high frequency and the small transducer dimensions lead to a compelling image resolution.
Network Implementation Trade-Offs in Existing Homes
NASA Astrophysics Data System (ADS)
Keiser, Gerd
2013-03-01
The ever-increasing demand for networking of high-bandwidth services in existing homes has resulted in several options for implementing an in-home network. Among the options are power-line communication techniques, twisted-pair copper wires, wireless links, and plastic or glass optical fibers. Whereas it is easy to install high-bandwidth optical fibers during the construction of new living units, retrofitting of existing homes with networking capabilities requires some technology innovations. This article addresses some trade-offs that need to be made on what transmission media can be retrofitted most effectively in existing homes.
Frequency agile microwave photonic notch filter with anomalously high stopband rejection.
Marpaung, David; Morrison, Blair; Pant, Ravi; Eggleton, Benjamin J
2013-11-01
We report a novel class of microwave photonic (MWP) notch filter with a very narrow isolation bandwidth (10 MHz), an ultrahigh stopband rejection (>60 dB), wide frequency tuning (1-30 GHz), and flexible bandwidth reconfigurability (10-65 MHz). This performance is enabled by a new concept of sideband amplitude and phase control using an electro-optic modulator and an optical filter. This concept enables energy-efficient operation in active MWP notch filters and opens up a pathway toward enabling low-power nanophotonic devices to serve as high-performance RF filters.
Xie, Kai; Liu, Yan; Li, XiaoPing; Guo, Lixin; Zhang, Hanlu
2016-04-01
Bandwidth and low-noise characteristics are often contradictory requirements in an ultra-low-current amplifier, because an unavoidable parasitic capacitance appears in parallel with the high-value feedback resistor. To expand the amplifier's bandwidth, a novel approach is proposed that introduces an artificial negative capacitor to cancel the parasitic capacitance. The theory of the negative capacitance and the performance of the improved amplifier circuit with the negative capacitor are presented in this manuscript. The test was conducted by modifying an ultra-low-current amplifier with a trans-impedance gain of 50 GΩ. The results show that the maximum bandwidth was expanded from 18.7 Hz to 3.3 kHz, an increase of more than 150 times, when the parasitic capacitance (∼0.17 pF) was cancelled. Meanwhile, the rise time decreased from 18.7 ms to 0.26 ms with no overshoot. Any desired bandwidth or rise time within these ranges can be obtained by adjusting the degree of cancellation between the parasitic and negative capacitance. This approach is especially suitable for applications demanding a rapid response to weak currents, such as transient ion-beam detectors, mass spectrometry analysis, and fast scanning microscopes.
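A quick single-pole estimate (an assumed model, not the authors' full circuit analysis) reproduces the reported numbers: with a 50 GΩ feedback resistor and ~0.17 pF of parasitic capacitance the uncompensated bandwidth is about 18.7 Hz, and reaching 3.3 kHz implies cancelling the parasitic capacitance down to roughly a femtofarad:

```python
# Back-of-the-envelope check (assumed single-pole model): the -3 dB bandwidth
# of a transimpedance stage is set by the feedback resistor and its parallel
# parasitic capacitance, f_3dB = 1 / (2 * pi * R_f * C_p).
from math import pi

R_f = 50e9      # 50 GOhm feedback resistor (from the abstract)
C_p = 0.17e-12  # ~0.17 pF parasitic capacitance (from the abstract)

f_before = 1 / (2 * pi * R_f * C_p)
print(f"uncompensated bandwidth ~ {f_before:.1f} Hz")      # ~18.7 Hz, matching the paper

# Residual capacitance that would remain after negative-capacitance
# cancellation to reach the reported 3.3 kHz bandwidth (illustrative).
f_target = 3.3e3
C_residual = 1 / (2 * pi * R_f * f_target)
print(f"residual capacitance ~ {C_residual * 1e15:.2f} fF")  # ~0.96 fF
```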
Investigation of voltage source designs for Electrical Impedance Mammography (EIM) systems.
Qureshi, Tabassum R; Chatwin, Chris R; Zhou, Zhou; Li, Nan; Wang, W
2012-01-01
According to Jossient, the interesting characteristics of breast tissues mostly lie above 1 MHz; therefore, a wideband excitation source covering higher frequencies (i.e., above 1 MHz) is required. The main objective of this research is to establish a feasible bandwidth envelope that can be used to design a constant EIM voltage source over a wide bandwidth with low output impedance for practical implementation. An excitation source is one of the major components of a bio-impedance measurement system. In any bio-impedance measurement system the excitation can be achieved either by injecting currents and measuring the resulting voltages, or by applying voltages and measuring the currents developed. This paper describes three voltage source architectures and, based on a comparison of their bandwidths, proposes a differential voltage-controlled voltage source (VCVS) that can be used over a wide bandwidth (>15 MHz). The paper reports the performance of the designed EIM voltage source for different load conditions and load capacitances, with a signal-to-noise ratio of approximately 90 dB at 10 MHz, the signal phase, and a maximum source output impedance of 4.75 kΩ at 10 MHz. Data obtained using PSpice® are used to demonstrate the high-bandwidth performance of the source.
Voltage-dependent K+ channels improve the energy efficiency of signalling in blowfly photoreceptors.
Heras, Francisco J H; Anderson, John; Laughlin, Simon B; Niven, Jeremy E
2017-04-01
Voltage-dependent conductances in many spiking neurons are tuned to reduce action potential energy consumption, so improving the energy efficiency of spike coding. However, the contribution of voltage-dependent conductances to the energy efficiency of analogue coding, by graded potentials in dendrites and non-spiking neurons, remains unclear. We investigate the contribution of voltage-dependent conductances to the energy efficiency of analogue coding by modelling blowfly R1-6 photoreceptor membrane. Two voltage-dependent delayed rectifier K+ conductances (DRs) shape the membrane's voltage response and contribute to light adaptation. They make two types of energy saving. By reducing membrane resistance upon depolarization they convert the cheap, low bandwidth membrane needed in dim light to the expensive high bandwidth membrane needed in bright light. This investment of energy in bandwidth according to functional requirements can halve daily energy consumption. Second, DRs produce negative feedback that reduces membrane impedance and increases bandwidth. This negative feedback allows an active membrane with DRs to consume at least 30% less energy than a passive membrane with the same capacitance and bandwidth. Voltage-dependent conductances in other non-spiking neurons, and in dendrites, might be organized to make similar savings. © 2017 The Author(s).
Geisler, David J; Fontaine, Nicolas K; Scott, Ryan P; He, Tingting; Paraschis, Loukas; Gerstel, Ori; Heritage, Jonathan P; Yoo, S J B
2011-04-25
We demonstrate an optical transmitter based on dynamic optical arbitrary waveform generation (OAWG) which is capable of creating high-bandwidth (THz) data waveforms in any modulation format using the parallel synthesis of multiple coherent spectral slices. As an initial demonstration, the transmitter uses only 5.5 GHz of electrical bandwidth and two 10-GHz-wide spectral slices to create 100-ns duration, 20-GHz optical waveforms in various modulation formats including differential phase-shift keying (DPSK), quaternary phase-shift keying (QPSK), and eight phase-shift keying (8PSK) with only changes in software. The experimentally generated waveforms showed clear eye openings and separated constellation points when measured using a real-time digital coherent receiver. Bit-error-rate (BER) performance analysis resulted in a BER < 9.8 × 10^-6 for DPSK and QPSK waveforms. Additionally, we experimentally demonstrate three-slice, 4-ns long waveforms that highlight the bandwidth-scalable nature of the optical transmitter. The various generated waveforms show that the key transmitter properties (i.e., packet length, modulation format, data rate, and modulation filter shape) are software definable, and that the optical transmitter is capable of acting as a flexible bandwidth transmitter.
Efficient traffic grooming with dynamic ONU grouping for multiple-OLT-based access network
NASA Astrophysics Data System (ADS)
Zhang, Shizong; Gu, Rentao; Ji, Yuefeng; Wang, Hongxiang
2015-12-01
Rapid bandwidth growth is driving large-scale, high-density access scenarios, where clustered deployment of multiple Passive Optical Network (PON) systems can be adopted as an appropriate solution to fulfill the huge bandwidth demands, especially for a future 5G mobile network. However, the lack of interaction between different optical line terminals (OLTs) leaves part of the bandwidth resources wasted. To increase bandwidth efficiency, as well as to reduce bandwidth pressure at the edge of the network, we propose a centralized flexible PON architecture based on Time- and Wavelength-Division Multiplexing PON (TWDM PON). It provides flexible affiliation between optical network units (ONUs) and different OLTs to support access-network traffic localization. Specifically, a dynamic ONU grouping algorithm (DGA) is provided to minimize the OLT outbound traffic. Simulation results show that DGA obtains an average 25.23% traffic gain increment across different OLT counts when the number of ONUs is small, and that the traffic gain increases dramatically as the number of ONUs grows. As the DGA can be deployed easily as an application running on top of the centralized control plane, the proposed architecture can help improve network efficiency for future traffic-intensive access scenarios.
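The grouping idea can be sketched with a simple greedy heuristic (illustrative only; the paper's DGA and its constraints are not reproduced here): given a traffic matrix between ONUs, each ONU is attached to the OLT whose current members it exchanges the most traffic with, subject to a per-OLT port limit, which keeps as much traffic as possible local to one OLT:

```python
# Greedy ONU-to-OLT grouping sketch: place heavy talkers first, and put each
# ONU with the OLT whose members it already exchanges the most traffic with,
# so the inter-OLT (outbound) traffic shrinks. Sizes are hypothetical.
import random
random.seed(2)

N_ONU, N_OLT, CAP = 12, 3, 4
traffic = [[0] * N_ONU for _ in range(N_ONU)]
for i in range(N_ONU):
    for j in range(i + 1, N_ONU):
        traffic[i][j] = traffic[j][i] = random.randint(0, 10)

groups = [[] for _ in range(N_OLT)]
# Visit ONUs in decreasing order of total traffic so heavy talkers anchor groups.
order = sorted(range(N_ONU), key=lambda u: -sum(traffic[u]))
for u in order:
    # Score each OLT with free ports by the traffic u shares with its members.
    best = max((g for g in range(N_OLT) if len(groups[g]) < CAP),
               key=lambda g: sum(traffic[u][v] for v in groups[g]))
    groups[best].append(u)

# Traffic that still has to leave its OLT (each cross pair counted once).
outbound = sum(traffic[u][v]
               for g in groups for u in g
               for v in range(N_ONU) if v not in g) / 2
print("groups:", groups)
print("inter-OLT traffic:", outbound)
```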
Solid-State Laser Source of Tunable Narrow-Bandwidth Ultraviolet Radiation
NASA Technical Reports Server (NTRS)
Goldberg, Lew; Kliner, Dahv A.; Koplow, Jeffrey P.
1998-01-01
A solid-state laser source of tunable and narrow-bandwidth UV light is disclosed. The system relies on light from a diode laser that preferably generates light at infrared frequencies. The light from the seed diode laser is pulse amplified in a light amplifier, and converted into the ultraviolet by frequency tripling, quadrupling, or quintupling the infrared light. The narrow bandwidth, or relatively pure light, of the seed laser is preserved, and the pulse amplifier generates high peak light powers to increase the efficiency of the nonlinear crystals in the frequency conversion stage. Higher output powers may be obtained by adding a fiber amplifier to power amplify the pulsed laser light prior to conversion.
Problematic projection to the in-sample subspace for a kernelized anomaly detector
Theiler, James; Grosklos, Guen
2016-03-07
We examine the properties and performance of kernelized anomaly detectors, with an emphasis on the Mahalanobis-distance-based kernel RX (KRX) algorithm. Although the detector generally performs well for high-bandwidth Gaussian kernels, it exhibits problematic (in some cases, catastrophic) performance for distances that are large compared to the bandwidth. By comparing KRX to two other anomaly detectors, we can trace the problem to a projection in feature space, which arises when a pseudoinverse is used on the covariance matrix in that feature space. Here, we show that a regularized variant of KRX overcomes this difficulty and achieves superior performance over a wide range of bandwidths.
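The following numpy sketch illustrates the regularization idea on a kernelized Mahalanobis-style score (a simplified stand-in; the exact KRX statistic and normalization in the paper may differ, and the kernel bandwidth and regularization parameter here are assumptions): replacing a pseudoinverse of the centered Gram matrix with a regularized inverse keeps the score well behaved even for test points far from the background:

```python
import numpy as np

def rbf_kernel(A, B, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_rx_score(background, test, bandwidth=1.0, lam=1e-3):
    n = len(background)
    K = rbf_kernel(background, background, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    Kc = H @ K @ H                                       # centered Gram matrix
    k_r = rbf_kernel(background, test, bandwidth)        # cross-kernel, shape (n, m)
    k_rc = H @ (k_r - K.mean(axis=1, keepdims=True))     # centered cross-kernel
    # A regularized inverse instead of a pseudoinverse avoids the projection
    # pathology for test points whose distances are large relative to the bandwidth.
    M = np.linalg.solve(Kc + lam * np.eye(n), k_rc)
    return (k_rc * M).sum(axis=0)                        # one score per test point

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 5))
tests = np.vstack([rng.normal(size=(3, 5)),              # in-distribution points
                   rng.normal(loc=6.0, size=(3, 5))])    # points far from the background
print(kernel_rx_score(background, tests).round(3))
```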
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors
NASA Astrophysics Data System (ADS)
Sourouri, Mohammed; Birger Raknes, Espen
2017-04-01
In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for the EWE, as this has a major impact on the total computational cost of running RTM or FWI. From a computational point of view, applications implementing EWEs face two major challenges. The first challenge is the amount of memory-bound computation involved, while the second is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also requires explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but requires the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (the number of cores, tiles, and memory varies with the model) organized in tiles that are connected via a 2D mesh-based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most accelerators, KNL has a memory subsystem consisting of low-level caches and 16 GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400 GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE-based application to reduce the time to solution for the following 3D model sizes in grid points: 128³, 256³, and 512³. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, using the MCDRAM as cache for problem sizes smaller than 16 GB unlocked further performance improvements. Depending on the problem size, our overall results indicate that the KNL-based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
47 CFR 24.133 - Emission limits.
Code of Federal Regulations, 2012 CFR
2012-10-01
... outside the authorized bandwidth and removed from the edge of the authorized bandwidth by a displacement...
Electronics for CMS Endcap Muon Level-1 Trigger System Phase-1 and HL LHC upgrades
NASA Astrophysics Data System (ADS)
Madorsky, A.
2017-07-01
To accommodate high-luminosity LHC operation at a 13 TeV collision energy, the CMS Endcap Muon Level-1 Trigger system had to be significantly modified. To provide robust track reconstruction, the trigger system must now import all available trigger primitives generated by the Cathode Strip Chambers and by certain other subsystems, such as Resistive Plate Chambers (RPC). In addition to the massive input bandwidth, this also required a significant increase in logic and memory resources. To satisfy these requirements, a new Sector Processor unit has been designed. It consists of three modules. The Core Logic module houses the large FPGA that contains the track-finding logic and multi-gigabit serial links for data exchange. The Optical module contains optical receivers and transmitters; it communicates with the Core Logic module via a custom backplane section. The Pt Lookup Table (PTLUT) module contains 1 GB of low-latency memory that is used to assign the final Pt to reconstructed muon tracks. The μTCA architecture (adopted by CMS) was used for this design. The talk presents the details of the hardware and firmware design of the production system based on the Xilinx Virtex-7 FPGA family. The next round of LHC and CMS upgrades starts in 2019, followed by a major High-Luminosity (HL) LHC upgrade starting in 2024. In the course of these upgrades, new Gas Electron Multiplier (GEM) detectors and more RPC chambers will be added to the Endcap Muon system. To keep up with all these changes, a new Advanced Processor unit is being designed. This device will be based on Xilinx UltraScale+ FPGAs. It will be able to accommodate up to 100 serial links with bit rates of up to 25 Gb/s and provide up to 2.5 times more logic resources than the device used currently. The amount of PTLUT memory will be significantly increased to provide more flexibility for the Pt assignment algorithm. The talk presents preliminary details of the hardware design.
VLSI design of lossless frame recompression using multi-orientation prediction
NASA Astrophysics Data System (ADS)
Lee, Yu-Hsuan; You, Yi-Lun; Chen, Yi-Guo
2016-01-01
The pursuit of high-end visual quality drives demand for higher display resolutions and higher frame rates. Hence, many powerful coding tools are aggregated in emerging video coding standards to improve coding efficiency. This also confronts video coding standards with two design challenges: heavy computation and tremendous memory bandwidth. The first issue can be properly addressed by careful hardware architecture design with advanced semiconductor processes. Nevertheless, the second becomes a critical design bottleneck for a modern video coding system. In this article, a lossless frame-recompression technique using multi-orientation prediction is proposed to overcome this bottleneck. This work is realised as a silicon chip in a TSMC 0.18 µm CMOS process. Its encoding capability reaches full HD (1920 × 1080) at 48 fps. The chip power consumption is 17.31 mW at 100 MHz. The core area and chip area are 0.83 × 0.83 mm² and 1.20 × 1.20 mm², respectively. Experimental results demonstrate that this work achieves an outstanding lossless compression ratio with competitive hardware performance.
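A toy Python sketch of the underlying idea (illustrative only; the paper's predictor set, block partitioning, and entropy coder are not reproduced): each block is predicted from already-decoded neighbours in a few orientations, the orientation with the smallest residual energy is selected, and only the near-zero residuals need to be stored or transferred, which is what reduces off-chip memory bandwidth:

```python
import numpy as np

def predict(img, mode):
    """Predict every pixel from a causal neighbour; borders are predicted as 0."""
    p = np.zeros_like(img)
    if mode == "left":
        p[:, 1:] = img[:, :-1]
    elif mode == "top":
        p[1:, :] = img[:-1, :]
    elif mode == "top_left":
        p[1:, 1:] = img[:-1, :-1]
    return p

def best_mode_residual(block):
    """Pick the orientation with the smallest summed |residual|."""
    candidates = {m: block.astype(np.int16) - predict(block, m).astype(np.int16)
                  for m in ("left", "top", "top_left")}
    mode = min(candidates, key=lambda m: np.abs(candidates[m]).sum())
    return mode, candidates[mode]

rng = np.random.default_rng(1)
# A smooth block: one orientation predicts it well, so residuals stay small
# and cheap to entropy-code; the decoder rebuilds the same predictor from
# previously decoded pixels and adds the residuals back, losslessly.
block = (np.arange(8)[None, :] * 8 + rng.integers(0, 3, (8, 8))).astype(np.uint8)
mode, residual = best_mode_residual(block)
print(mode, int(np.abs(residual).sum()))
```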
A 500 megabyte/second disk array
NASA Technical Reports Server (NTRS)
Ruwart, Thomas M.; Okeefe, Matthew T.
1994-01-01
Applications at the Army High Performance Computing Research Center's (AHPCRC) Graphic and Visualization Laboratory (GVL) at the University of Minnesota require a tremendous amount of I/O bandwidth, and this appetite for data is growing. Silicon Graphics workstations are used to perform the post-processing, visualization, and animation of multi-terabyte datasets produced by scientific simulations performed on AHPCRC supercomputers. The M.A.X. (Maximum Achievable Xfer) was designed to find the maximum achievable I/O performance of the Silicon Graphics CHALLENGE/Onyx-class machines that run these applications. Running a fully configured Onyx machine with 12 150-MHz R4400 processors, 512 MB of 8-way interleaved memory, and 31 fast/wide SCSI-2 channels, each with a Ciprico disk array controller, we were able to achieve a maximum sustained transfer rate of 509.8 megabytes per second. However, after analyzing the results it became clear that the true maximum transfer rate is somewhat beyond this figure, and we will need further testing with more disk array controllers to find the true maximum.
NASA Astrophysics Data System (ADS)
Gregorio, Fernando; Cousseau, Juan; Werner, Stefan; Riihonen, Taneli; Wichman, Risto
2011-12-01
The design of predistortion (PD) techniques for broadband multiple-input multiple-output OFDM (MIMO-OFDM) systems raises several implementation challenges. First, the large bandwidth of the OFDM signal requires the introduction of memory effects into the PD model. In addition, it is common to use an imbalanced in-phase and quadrature (IQ) modulator to translate the predistorted baseband signal to RF. Furthermore, the coupling effects that occur when the MIMO paths are implemented in the same reduced-size chipset cannot be avoided in MIMO transceiver structures. This study proposes a MIMO-PD system that linearizes the power amplifier response and compensates nonlinear crosstalk and IQ imbalance effects for each branch of the multi-antenna system. Efficient recursive algorithms are presented to estimate the complete set of MIMO-PD coefficients. The algorithms avoid the high computational complexity of previous solutions based on least-squares estimation. The performance of the proposed MIMO-PD structure is validated by simulations using a two-transmit-antenna MIMO system. Error vector magnitude and adjacent channel power ratio are evaluated, showing significant improvement compared with conventional MIMO-PD systems.
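As a hedged illustration of the recursive-estimation idea for a single transmitter branch (the basis, model orders, toy power-amplifier model, and forgetting factor below are assumptions, not the paper's MIMO-PD structure), this sketch fits a memory-polynomial post-inverse with a recursive least-squares (RLS) update instead of a batch least-squares solve:

```python
import numpy as np

K, Q, LAM = 3, 2, 0.99          # nonlinearity order, memory depth, forgetting factor

def basis(x, n):
    """Memory-polynomial regressors x(n-q)|x(n-q)|^(k-1), k = 1..K, q = 0..Q-1."""
    cols = []
    for q in range(Q):
        xq = x[n - q] if n - q >= 0 else 0.0
        for k in range(1, K + 1):
            cols.append(xq * abs(xq) ** (k - 1))
    return np.array(cols, dtype=complex)

rng = np.random.default_rng(0)
x = (rng.normal(size=2000) + 1j * rng.normal(size=2000)) * 0.3
# Toy PA with mild compression and one tap of memory (illustrative only).
y = x - 0.2 * x * np.abs(x) ** 2
y[1:] += 0.1 * y[:-1]

# Indirect learning: fit a post-inverse from PA output back to PA input with RLS,
# updating the coefficients sample by sample instead of solving one large LS problem.
n_coef = K * Q
w = np.zeros(n_coef, dtype=complex)
P = np.eye(n_coef, dtype=complex) * 1e3
for n in range(len(x)):
    u = basis(y, n)
    k_gain = P @ u.conj() / (LAM + u @ P @ u.conj())
    e = x[n] - u @ w
    w = w + k_gain * e
    P = (P - np.outer(k_gain, u @ P)) / LAM

print("post-inverse coefficients:", np.round(w, 3))
```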