Crosetto, Dario B.
1996-01-01
The present device provides a dynamically configurable communication network for a multi-processor parallel processing system comprising a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes, and to monitor each slave processor's status. The high speed parallel communication network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Seeing the forest for the trees: Networked workstations as a parallel processing computer
NASA Technical Reports Server (NTRS)
Breen, J. O.; Meleedy, D. M.
1992-01-01
Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units and thereby perform several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
To improve on the tedious task of reconstructing gene networks by experimentally testing the possible interactions between genes, it has become a trend to adopt an automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks with an evolutionary algorithm, two important issues must be addressed: premature convergence and high computational cost. To tackle the former and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel-model evolutionary algorithms. To overcome the latter and to speed up the computation, cloud computing is a promising solution: most popular is the MapReduce programming model, a fault-tolerant framework for implementing parallel algorithms that infer large gene networks. This work presents a practical framework to infer large gene networks by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use the well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results analyzed. They show that our parallel approach can successfully infer networks with desired behaviors and that the computation time can be greatly reduced. Parallel population-based algorithms can effectively determine network parameters and perform better than the widely used sequential algorithms in gene network inference. These parallel algorithms can be distributed to a cloud computing environment to speed up the computation. By coupling the parallel-model population-based optimization method with the parallel computational framework, high quality solutions can be obtained within a relatively short time. This integrated approach is a promising way to infer large networks.
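A minimal sketch of the map-style parallel fitness evaluation such a framework relies on: candidate parameter sets are scored independently (the map phase) and ranked in a reduce step. The TARGET profile and squared-error fitness are stand-in assumptions, not the paper's actual GA-PSO objective; in-process workers play the role of Hadoop mappers.

```python
# Parallel fitness evaluation sketch: score candidates in parallel (map),
# then rank survivors (reduce). Fitness here is a toy squared error.
from multiprocessing import Pool
import random

TARGET = [0.2, 0.8, 0.5, 0.1]  # hypothetical target gene-expression profile

def fitness(params):
    # Map phase: score one candidate parameter set independently.
    predicted = [max(0.0, min(1.0, p)) for p in params]
    error = sum((p - t) ** 2 for p, t in zip(predicted, TARGET))
    return error, params

if __name__ == "__main__":
    population = [[random.uniform(0, 1) for _ in TARGET] for _ in range(64)]
    with Pool() as pool:
        scored = pool.map(fitness, population)   # parallel map
    scored.sort(key=lambda s: s[0])              # reduce: rank candidates
    best_error, best_params = scored[0]
    print(f"best error {best_error:.4f} for {best_params}")
```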
Limits to high-speed simulations of spiking neural networks using general-purpose computers.
Zenke, Friedemann; Gerstner, Wulfram
2014-01-01
To understand how the central nervous system performs computations using recurrent neuronal circuitry, simulations have become an indispensable tool for theoretical neuroscience. To study neuronal circuits and their ability to self-organize, increasing attention has been directed toward synaptic plasticity. In particular spike-timing-dependent plasticity (STDP) creates specific demands for simulations of spiking neural networks. On the one hand a high temporal resolution is required to capture the millisecond timescale of typical STDP windows. On the other hand network simulations have to evolve over hours up to days, to capture the timescale of long-term plasticity. To do this efficiently, fast simulation speed is the crucial ingredient rather than large neuron numbers. Using different medium-sized network models consisting of several thousands of neurons and off-the-shelf hardware, we compare the simulation speed of the simulators: Brian, NEST and Neuron as well as our own simulator Auryn. Our results show that real-time simulations of different plastic network models are possible in parallel simulations in which numerical precision is not a primary concern. Even so, the speed-up margin of parallelism is limited and boosting simulation speeds beyond one tenth of real-time is difficult. By profiling simulation code we show that the run times of typical plastic network simulations encounter a hard boundary. This limit is partly due to latencies in the inter-process communications and thus cannot be overcome by increased parallelism. Overall, these results show that to study plasticity in medium-sized spiking neural networks, adequate simulation tools are readily available which run efficiently on small clusters. However, to run simulations substantially faster than real-time, special hardware is a prerequisite.
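A back-of-the-envelope model of the hard boundary described above. The timestep, compute, and latency figures are illustrative assumptions, not measurements from the study; the point is only that a fixed per-step communication cost caps the achievable fraction of real time regardless of processor count.

```python
# Wall-clock time per simulated timestep cannot fall below the fixed
# inter-process communication cost, so the real-time ratio saturates.
DT = 1e-4            # simulated timestep: 0.1 ms, typical for STDP models
COMPUTE_1CPU = 5e-2  # assumed serial compute per step on one core, seconds
LATENCY = 1e-3       # assumed communication cost per step, seconds

for n in (1, 4, 16, 64, 256):
    step = COMPUTE_1CPU / n + LATENCY
    print(f"{n:4d} processes: {DT / step:.4f}x real time")
# Saturates at DT / LATENCY = 0.1x real time under these assumptions:
# running much faster than real time needs lower-latency communication
# or special hardware, as argued above.
```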
JPARSS: A Java Parallel Network Package for Grid Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Jie; Akers, Walter; Chen, Ying
2002-03-01
The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance because of the tuning of the TCP window size required to improve bandwidth and reduce latency on a high speed wide area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (Socket)) that divides data into partitions that are sent over several parallel Java streams simultaneously, allowing Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. This package enables single sign-on, certificate delegation, and secure or plain-text data transfer using several security components based on X.509 certificates and SSL. Several experiments are presented to show that using Java parallel streams is more effective than tuning the TCP window size. In addition, a simple architecture using Web services is described.
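A minimal sketch of the parallel-stream idea, not the JPARSS API itself: a buffer is striped into partitions and each partition travels over its own TCP socket, tagged with its index so a receiver can reassemble the data. HOST, BASE_PORT, and the receiver layout are hypothetical.

```python
# Stripe one buffer over several TCP connections to get aggregate
# throughput without tuning any single connection's window size.
import socket
from concurrent.futures import ThreadPoolExecutor

HOST, BASE_PORT, N_STREAMS = "data.example.org", 5000, 4  # hypothetical

def send_partition(index, chunk):
    with socket.create_connection((HOST, BASE_PORT + index)) as s:
        # Header: partition index and length, so the receiver can reorder.
        s.sendall(index.to_bytes(4, "big") + len(chunk).to_bytes(8, "big"))
        s.sendall(chunk)

def parallel_send(data):
    size = -(-len(data) // N_STREAMS)  # ceiling division
    chunks = [data[i * size:(i + 1) * size] for i in range(N_STREAMS)]
    with ThreadPoolExecutor(max_workers=N_STREAMS) as pool:
        list(pool.map(send_partition, range(N_STREAMS), chunks))

if __name__ == "__main__":
    parallel_send(b"x" * 10_000_000)  # against the assumed receiver above
```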
NASA Technical Reports Server (NTRS)
Johnston, William; Tierney, Brian; Lee, Jason; Hoo, Gary; Thompson, Mary
1996-01-01
We have developed and deployed a distributed-parallel storage system (DPSS) in several high speed asynchronous transfer mode (ATM) wide area networks (WAN) testbeds to support several different types of data-intensive applications. Architecturally, the DPSS is a network striped disk array, but is fairly unique in that its implementation allows applications complete freedom to determine optimal data layout, replication and/or coding redundancy strategy, security policy, and dynamic reconfiguration. In conjunction with the DPSS, we have developed a 'top-to-bottom, end-to-end' performance monitoring and analysis methodology that has allowed us to characterize all aspects of the DPSS operating in high speed ATM networks. In particular, we have run a variety of performance monitoring experiments involving the DPSS in the MAGIC testbed, which is a large scale, high speed, ATM network and we describe our experience using the monitoring methodology to identify and correct problems that limit the performance of high speed distributed applications. Finally, the DPSS is part of an overall architecture for using high speed, WAN's for enabling the routine, location independent use of large data-objects. Since this is part of the motivation for a distributed storage system, we describe this architecture.
Compact holographic optical neural network system for real-time pattern recognition
NASA Astrophysics Data System (ADS)
Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.
1996-08-01
One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high-resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation-, shift-, and scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
Big data driven cycle time parallel prediction for production planning in wafer manufacturing
NASA Astrophysics Data System (ADS)
Wang, Junliang; Yang, Jungang; Zhang, Jie; Wang, Xiaoxi; Zhang, Wenjun Chris
2018-07-01
Cycle time forecasting (CTF) is one of the most crucial issues for production planning to keep high delivery reliability in semiconductor wafer fabrication systems (SWFS). This paper proposes a novel data-intensive cycle time (CT) prediction system with parallel computing to rapidly forecast the CT of wafer lots with large datasets. First, a density peak based radial basis function network (DP-RBFN) is designed to forecast the CT with the diverse and agglomerative CT data. Second, the network learning method based on a clustering technique is proposed to determine the density peak. Third, a parallel computing approach for network training is proposed in order to speed up the training process with large scaled CT data. Finally, an experiment with respect to SWFS is presented, which demonstrates that the proposed CTF system can not only speed up the training process of the model but also outperform the radial basis function network, the back-propagation-network and multivariate regression methodology based CTF methods in terms of the mean absolute deviation and standard deviation.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Applications considerations in the system design of highly concurrent multiprocessors
NASA Technical Reports Server (NTRS)
Lundstrom, Stephen F.
1987-01-01
A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
Parallel network simulations with NEURON.
Migliore, M; Cannia, C; Lytton, W W; Markram, Henry; Hines, M L
2006-10-01
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2,000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
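A minimal sketch of the integration scheme described above: each process advances its subnet over an interval equal to the minimum interprocessor spike delay, then exchanges the spikes generated in that window. The ToySubnet and loopback exchange are stand-ins; this is not the NEURON API, and real runs would use MPI for the exchange step.

```python
# Min-delay parallel integration loop: integrate, exchange, deliver.
class ToySubnet:
    def integrate(self, t0, t1):
        # Pretend one local neuron fires at every integer millisecond.
        return [s for s in range(int(t0) + 1, int(t1) + 1)]
    def deliver(self, spikes):
        if spikes:
            print(f"delivering spikes fired at {spikes} ms")

MIN_DELAY = 2.0  # ms: minimum spike delay crossing process boundaries

def run(subnet, t_stop, exchange):
    t = 0.0
    while t < t_stop:
        out = subnet.integrate(t, t + MIN_DELAY)  # local ODE work
        incoming = exchange(out)                  # MPI allgather in real use
        subnet.deliver(incoming)                  # safe: delays >= MIN_DELAY
        t += MIN_DELAY

run(ToySubnet(), 10.0, lambda s: s)  # loopback exchange for illustration
```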
NASA Astrophysics Data System (ADS)
Vivoni, Enrique R.; Mascaro, Giuseppe; Mniszewski, Susan; Fasel, Patricia; Springer, Everett P.; Ivanov, Valeriy Y.; Bras, Rafael L.
2011-10-01
A major challenge in the use of fully-distributed hydrologic models has been the lack of computational capabilities for high-resolution, long-term simulations in large river basins. In this study, we present the parallel model implementation and real-world hydrologic assessment of the Triangulated Irregular Network (TIN)-based Real-time Integrated Basin Simulator (tRIBS). Our parallelization approach is based on the decomposition of a complex watershed using the channel network as a directed graph. The resulting sub-basin partitioning divides effort among processors and handles hydrologic exchanges across boundaries. Through numerical experiments in a set of nested basins, we quantify parallel performance relative to serial runs for a range of processors, simulation complexities and lengths, and sub-basin partitioning methods, while accounting for inter-run variability on a parallel computing system. In contrast to serial simulations, the parallel model speed-up depends on the variability of hydrologic processes. Load balancing significantly improves parallel speed-up with proportionally faster runs as simulation complexity (domain resolution and channel network extent) increases. The best strategy for large river basins is to combine a balanced partitioning with an extended channel network, with potential savings through a lower TIN resolution. Based on these advances, a wider range of applications for fully-distributed hydrologic models are now possible. This is illustrated through a set of ensemble forecasts that account for precipitation uncertainty derived from a statistical downscaling model.
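A minimal sketch of the decomposition idea described above: the channel network is a directed tree draining to an outlet, and sub-basins are subtrees cut off greedily once their accumulated load (node count here) reaches a per-processor target. The toy topology and node-count proxy are assumptions; tRIBS's actual balancing is more elaborate.

```python
# Greedy sub-basin partitioning of a channel-network tree.
children = {               # hypothetical channel network, edges point upstream
    "outlet": ["a", "b"],
    "a": ["c", "d"], "b": [], "c": [], "d": ["e", "f"],
    "e": [], "f": [],
}

def partition(node, target, parts, current):
    for child in children[node]:
        partition(child, target, parts, current)
    current.append(node)
    if len(current) >= target:        # cut a sub-basin for one processor
        parts.append(list(current))
        current.clear()

parts, leftover = [], []
partition("outlet", target=3, parts=parts, current=leftover)
if leftover:
    parts.append(leftover)
print(parts)  # [['c', 'e', 'f'], ['d', 'a', 'b'], ['outlet']]
```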
A Systems Approach to Scalable Transportation Network Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Perumalla, Kalyan S
2006-01-01
Emerging needs in transportation network modeling and simulation are raising new challenges with respect to scalability of network size and vehicular traffic intensity, speed of simulation for simulation-based optimization, and fidelity of vehicular behavior for accurate capture of event phenomena. Parallel execution is warranted to sustain the required detail, size and speed. However, few parallel simulators exist for such applications, partly due to the challenges underlying their development. Moreover, many simulators are based on time-stepped models, which can be computationally inefficient for the purposes of modeling evacuation traffic. Here an approach is presented to designing a simulator with memory and speed efficiency as the goals from the outset, and, specifically, scalability via parallel execution. The design makes use of discrete event modeling techniques as well as parallel simulation methods. Our simulator, called SCATTER, is being developed, incorporating such design considerations. Preliminary performance results are presented on benchmark road networks, showing scalability to one million vehicles simulated on one processor.
Collective network for computer structures
Blumrich, Matthias A; Coteus, Paul W; Chen, Dong; Gara, Alan; Giampapa, Mark E; Heidelberger, Philip; Hoenicke, Dirk; Takken, Todd E; Steinmacher-Burow, Burkhard D; Vranas, Pavlos M
2014-01-07
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
Collective network for computer structures
Blumrich, Matthias A [Ridgefield, CT; Coteus, Paul W [Yorktown Heights, NY; Chen, Dong [Croton On Hudson, NY; Gara, Alan [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Takken, Todd E [Brewster, NY; Steinmacher-Burow, Burkhard D [Wernau, DE; Vranas, Pavlos M [Bedford Hills, NY
2011-08-16
A system and method for enabling high-speed, low-latency global collective communications among interconnected processing nodes. The global collective network optimally enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the network via links to facilitate performance of low-latency global processing operations at nodes of the virtual network and class structures. The global collective network may be configured to provide global barrier and interrupt functionality in asynchronous or synchronized manner. When implemented in a massively-parallel supercomputing structure, the global collective network is physically and logically partitionable according to the needs of a processing algorithm.
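A minimal sketch, in plain recursion, of the collective reduction pattern these networks implement in hardware: local values combine on the way up the tree, and the result is broadcast back down so every node obtains the global result. The tree shape and values are invented for illustration.

```python
# Tree allreduce sketch: reduce up to the root, broadcast back down.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
values = {n: n * 10 for n in tree}          # each node's local contribution

def reduce_up(node):
    # Combine this node's value with the reductions of its subtrees.
    return values[node] + sum(reduce_up(c) for c in tree[node])

def broadcast_down(node, result, out):
    out[node] = result
    for c in tree[node]:
        broadcast_down(c, result, out)

total = reduce_up(0)               # combine phase, e.g., a global sum
result = {}
broadcast_down(0, total, result)   # every node ends up with the sum
print(total, result)
```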
The science of computing - Parallel computation
NASA Technical Reports Server (NTRS)
Denning, P. J.
1985-01-01
Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic component technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6, or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
A parallel input composite transimpedance amplifier.
Kim, D J; Kim, C
2018-01-01
A new approach to high performance current to voltage preamplifier design is presented. The design using multiple operational amplifiers (op-amps) has a parasitic capacitance compensation network and a composite amplifier topology for fast, precision, and low noise performance. The input stage, consisting of parallel-linked JFET op-amps, and a high-speed bipolar junction transistor (BJT) gain stage driving the output in the composite amplifier topology, cooperating with the capacitance compensation feedback network, ensure wide bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high impedance transport and scanning tunneling microscopy measurements.
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, while other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) with near linear speed-ups.
The Xpress Transfer Protocol (XTP): A tutorial (expanded version)
NASA Technical Reports Server (NTRS)
Sanders, Robert M.; Weaver, Alfred C.
1990-01-01
The Xpress Transfer Protocol (XTP) is a reliable, real-time, lightweight transfer layer protocol. Current transport layer protocols such as DoD's Transmission Control Protocol (TCP) and ISO's Transport Protocol (TP) were not designed for the next generation of high speed, interconnected reliable networks such as fiber distributed data interface (FDDI) and the gigabit/second wide area networks. Unlike all previous transport layer protocols, XTP is being designed to be implemented in hardware as a VLSI chip set. By streamlining the protocol, combining the transport and network layers, and utilizing the increased speed and parallelization possible with a VLSI implementation, XTP will be able to provide the end-to-end data transmission rates demanded in high speed networks without compromising reliability and functionality. This paper describes the operation of the XTP protocol and, in particular, its error, flow and rate control; inter-networking addressing mechanisms; and multicast support features, as defined in the XTP Protocol Definition Revision 3.4.
Cascaded VLSI neural network architecture for on-line learning
NASA Technical Reports Server (NTRS)
Thakoor, Anilkumar P. (Inventor); Duong, Tuan A. (Inventor); Daud, Taher (Inventor)
1992-01-01
High-speed, analog, fully-parallel, and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A computation intensive feature classification application was demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as an application specific coprocessor for solving real world problems at extremely high data rates.
Cascaded VLSI neural network architecture for on-line learning
NASA Technical Reports Server (NTRS)
Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)
1995-01-01
High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific-coprocessors for solving real-world problems at extremely high data rates.
(abstract) A High Throughput 3-D Inner Product Processor
NASA Technical Reports Server (NTRS)
Daud, Tuan
1996-01-01
A particularly challenging image processing application is real-time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine-grain parallelism and, when implemented in hardware, offer orders of magnitude speed-up. However, neural networks implemented on a VLSI chip are planar architectures capable of efficient processing of linear vector signals rather than 2-D images. Therefore, for processing of images, a 3-D stack of neural-net ICs receiving planar inputs and consuming minimal power is required. Details of the circuits and chip architectures are described, along with the need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing is illustrated.
Automated target recognition and tracking using an optical pattern recognition neural network
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin
1991-01-01
The on-going development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that is an integration of an innovative optical parallel processor and a feature extraction based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets in spite of their scales, rotations, perspectives, and various deformations. This fully developed OPRNN system can be effectively utilized for the automated spacecraft recognition and tracking that will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature extraction based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator where holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10(exp 14) analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
RAMA: A file system for massively parallel computers
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
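A minimal sketch of hash-based block placement of the kind such a design permits; the function name and node count are invented for illustration. Because any processor can compute a block's location deterministically, no central metadata lookup or inter-node synchronization is needed on the I/O path.

```python
# Deterministic block-to-node mapping: any node can locate any block.
import hashlib

N_NODES = 16  # hypothetical number of processor/disk nodes

def block_location(file_id: int, block_no: int) -> int:
    key = f"{file_id}:{block_no}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % N_NODES

# Any node can compute where block 7 of file 42 lives, with no lookup:
print(block_location(42, 7))
```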
Low Temperature Performance of High-Speed Neural Network Circuits
NASA Technical Reports Server (NTRS)
Duong, T.; Tran, M.; Daud, T.; Thakoor, A.
1995-01-01
Artificial neural networks, derived from their biological counterparts, offer a new and enabling computing paradigm specially suitable for such tasks as image and signal processing with feature classification/object recognition, global optimization, and adaptive control. When implemented in fully parallel electronic hardware, it offers orders of magnitude speed advantage. Basic building blocks of the new architecture are the processing elements called neurons implemented as nonlinear operational amplifiers with sigmoidal transfer function, interconnected through weighted connections called synapses implemented using circuitry for weight storage and multiply functions either in an analog, digital, or hybrid scheme.
High-performance parallel interface to synchronous optical network gateway
St. John, Wallace B.; DuBois, David H.
1996-01-01
A system of sending and receiving gateways interconnects high speed data interfaces, e.g., HIPPI interfaces, through fiber optic links, e.g., a SONET network. An electronic stripe distributor distributes bytes of data from a first interface at the sending gateway onto parallel fiber optics of the fiber optic link to form transmitted data. An electronic stripe collector receives the transmitted data on the parallel fiber optics and reforms the data into a format effective for input to a second interface at the receiving gateway. Preferably, an error correcting syndrome is constructed at the sending gateway and sent with a data frame so that transmission errors can be detected and corrected on a real-time basis. Since the high speed data interface operates faster than any of the fiber optic links, the transmission rate must be adapted to match the available number of fiber optic links, so the sending and receiving gateways monitor the availability of fiber links and adjust the data throughput accordingly. In another aspect, the receiving gateway must have sufficient available buffer capacity to accept an incoming data frame. A credit-based flow control system provides for continuously updating the sending gateway on the available buffer capacity at the receiving gateway.
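A minimal single-threaded sketch of the credit-based flow control described above, with invented class names: the receiver grants credits mirroring its free buffer capacity, and the sender never transmits more frames than it holds credits for.

```python
# Credit-based flow control: credits flow receiver -> sender, frames
# flow sender -> receiver, and buffers can never be overrun.
class Receiver:
    def __init__(self, buffers):
        self.free = buffers          # empty buffer slots
    def grant(self):                 # periodic credit update to the sender
        credits, self.free = self.free, 0
        return credits
    def drain(self, n):              # application consumes n buffered frames
        self.free += n

class Sender:
    def __init__(self):
        self.credits = 0
    def send(self, frames):
        sent = min(frames, self.credits)   # never exceed granted buffers
        self.credits -= sent
        return sent

rx, tx = Receiver(buffers=8), Sender()
tx.credits += rx.grant()
print(tx.send(12))     # 8 frames go out; 4 must wait for credit
rx.drain(8)            # receiver frees its buffers ...
tx.credits += rx.grant()
print(tx.send(4))      # ... and the remaining 4 frames follow
```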
Towards green high capacity optical networks
NASA Astrophysics Data System (ADS)
Glesk, I.; Mohd Warip, M. N.; Idris, S. K.; Osadola, T. B.; Andonovic, I.
2011-09-01
The demand for fast, secure, energy efficient, high capacity networks is growing. It is fuelled by transmission bandwidth needs which will support, among other things, the rapid penetration of multimedia applications empowering smart consumer electronics and E-businesses. All of the above trigger unparalleled demand for networking solutions which must offer not only high-speed, low-cost "on demand" mobile connectivity but should also be ecologically friendly and have a low carbon footprint. The first answer to the bandwidth need was the deployment of fibre optic technologies into transport networks. After this it quickly became obvious that the inferior bandwidth of electronics (compared to optical fibre) would continue to limit the maximum implementable serial data rates. A new solution was found by introducing parallelism into data transport in the form of Wavelength Division Multiplexing (WDM), which has dramatically improved the aggregate throughput of optical networks. However, with these advancements a new bottleneck has emerged at fibre endpoints, where data routers must process the incoming and outgoing traffic. Here, even with massive and power hungry electronic parallelism, today's routers (still relying upon bandwidth-limiting electronics) do not offer the processing speeds networks demand. In this paper we discuss some novel unconventional approaches to address network scalability leading to energy savings via advanced optical signal processing. We also investigate energy savings based on advanced network management through node hibernation proposed for Optical IP networks. Hibernation reduces the network's overall power consumption by forming virtual network reconfigurations through selective node groupings and by link segmentation and partitioning.
Photonic reservoir computing: a new approach to optical information processing
NASA Astrophysics Data System (ADS)
Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter
2010-06-01
Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOA) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations using a benchmark speech recognition task. In spite of the differences with classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtain high performance. In parallel we investigate the use of a network of photonic crystal cavities. The coupled mode theory (CMT) is used to investigate these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.
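A minimal sketch of the conventional software reservoir that such photonic implementations emulate: a fixed random recurrent network is driven by the input, and only the linear readout is trained, here by least squares on a toy 3-step memory task. Sizes and scaling are illustrative choices.

```python
# Echo-state-style reservoir: fixed random dynamics, trained readout.
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 500
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1
w_in = rng.normal(0, 1, N)

u = rng.uniform(-1, 1, T)          # input signal
target = np.roll(u, 3)             # toy task: recall the input 3 steps ago

x = np.zeros(N)
states = np.empty((T, N))
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # reservoir dynamics (never trained)
    states[t] = x

# Train only the linear readout, discarding an initial transient.
w_out, *_ = np.linalg.lstsq(states[50:], target[50:], rcond=None)
print("train MSE:", np.mean((states[50:] @ w_out - target[50:]) ** 2))
```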
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph presentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional techniques and enables multiple assignments, (many to many), not achievable with standard statistical approaches. Tactical movement planning (finding the best path from A to B) is accomplished with a digital two-dimensional concurrent processor array. By exploiting the natural parallel decomposition of the problem in silicon, a four order of magnitude speed-up over optimized software approaches has been demonstrated.
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Hardware Neural Network for a Visual Inspection System
NASA Astrophysics Data System (ADS)
Chun, Seungwoo; Hayakawa, Yoshihiro; Nakajima, Koji
The visual inspection of defects in products is heavily dependent on human experience and instinct. In this situation, it is difficult to reduce the production costs and to shorten the inspection time and hence the total process time. Consequently people involved in this area desire an automatic inspection system. In this paper, we propose a hardware neural network, which is expected to provide high-speed operation for automatic inspection of products. Since neural networks can learn, this is a suitable method for self-adjustment of criteria for classification. To achieve high-speed operation, we use parallel and pipelining techniques. Furthermore, we use a piecewise linear function instead of a conventional activation function in order to save hardware resources. Consequently, our proposed hardware neural network achieved 6GCPS and 2GCUPS, which in our test sample proved to be sufficiently fast.
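A minimal sketch of the resource-saving substitution described above: a piecewise linear function replaces the usual sigmoid, trading a small approximation error for much cheaper logic (compares, shifts, adds). The three-segment breakpoints are a common textbook choice, not the paper's exact coefficients.

```python
# Hardware-friendly piecewise linear approximation of the sigmoid.
def pw_sigmoid(x: float) -> float:
    # 3 segments approximating 1/(1+exp(-x)); x/8 is a shift in hardware.
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    return 0.5 + x / 8.0       # linear middle segment

for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(x, pw_sigmoid(x))
```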
A simulation-based study of HighSpeed TCP and its deployment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Souza, Evandro de
2003-05-01
The current congestion control mechanism used in TCP has difficulty reaching full utilization on high speed links, particularly on wide-area connections. For example, the packet drop rate needed to fill a Gigabit pipe using the present TCP protocol is below the currently achievable fiber optic error rates. HighSpeed TCP was recently proposed as a modification of TCP's congestion control mechanism to allow it to achieve reasonable performance in high speed wide-area links. In this research, simulation results showing the performance of HighSpeed TCP and the impact of its use on the present implementation of TCP are presented. Network conditions including different degrees of congestion, different levels of loss rate, different degrees of bursty traffic, and two distinct router queue management policies were simulated. The performance and fairness of HighSpeed TCP were compared to the existing TCP and to solutions for bulk-data transfer using parallel streams.
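A worked example of the arithmetic behind the motivating claim above, using the standard TCP response function R = (MSS/RTT) * sqrt(3/2) / sqrt(p) (the Mathis et al. model); the link parameters are illustrative assumptions.

```python
# How rare must losses be for standard TCP to fill a gigabit WAN pipe?
MSS = 1500 * 8      # segment size in bits
RTT = 0.1           # 100 ms wide-area round-trip time (assumed)
RATE = 1e9          # target: fill a 1 Gb/s pipe

w = RATE * RTT / MSS              # required congestion window, packets
p = 1.5 / w ** 2                  # loss rate standard TCP can tolerate
print(f"window = {w:.0f} packets, max loss rate = {p:.2e}")
# -> roughly 2e-8: under one loss per ~46 million packets, below what
# real paths deliver, hence HighSpeed TCP's modified response function
# for windows beyond a threshold.
```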
Learning in Neural Networks: VLSI Implementation Strategies
NASA Technical Reports Server (NTRS)
Duong, Tuan Anh
1995-01-01
Fully-parallel hardware neural network implementations may be applied to high-speed recognition, classification, and mapping tasks in areas such as vision, or can be used as low-cost self-contained units for tasks such as error detection in mechanical systems (e.g. autos). Learning is required not only to satisfy application requirements, but also to overcome hardware-imposed limitations such as reduced dynamic range of connections.
The Software Correlator of the Chinese VLBI Network
NASA Technical Reports Server (NTRS)
Zheng, Weimin; Quan, Ying; Shu, Fengchun; Chen, Zhong; Chen, Shanshan; Wang, Weihua; Wang, Guangli
2010-01-01
The software correlator of the Chinese VLBI Network (CVN) has played an irreplaceable role in the CVN routine data processing, e.g., in the Chinese lunar exploration project. This correlator will be upgraded to process geodetic and astronomical observation data. In the future, with several new stations joining the network, CVN will carry out crustal movement observations, quick UT1 measurements, astrophysical observations, and deep space exploration activities. For the geodetic or astronomical observations, we need a wide-band 10-station correlator. For spacecraft tracking, a real-time and highly reliable correlator is essential. To meet the scientific and navigation requirements of CVN, two parallel software correlators in multiprocessor environments are under development. A high speed, 10-station prototype correlator using the mixed Pthreads and MPI (Message Passing Interface) parallel algorithm on a computer cluster platform is being developed. Another real-time software correlator for spacecraft tracking adopts thread-parallel technology, and it runs on SMP (Symmetric Multiple Processor) servers. Both correlators have the characteristic of flexible structure and scalability.
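A minimal sketch of the FX correlation at the heart of a software VLBI correlator, on synthetic two-station data: Fourier transform each station's samples (F), cross-multiply and accumulate the spectra (X), then read the inter-station delay off the correlation peak. The signals here are invented white noise; parallel versions distribute frequency channels or time blocks across threads or MPI ranks.

```python
# FX correlator sketch: FFT per station, cross-multiply, accumulate.
import numpy as np

rng = np.random.default_rng(1)
common = rng.normal(size=4096)                 # signal seen by both stations
s1 = common + 0.5 * rng.normal(size=4096)
s2 = np.roll(common, 3) + 0.5 * rng.normal(size=4096)  # 3-sample delay

NFFT = 256
acc = np.zeros(NFFT, dtype=complex)
for k in range(0, 4096, NFFT):
    S1 = np.fft.fft(s1[k:k + NFFT])            # F: per-station spectra
    S2 = np.fft.fft(s2[k:k + NFFT])
    acc += S1 * np.conj(S2)                    # X: accumulate the baseline

xc = np.fft.ifft(acc)                          # lag-domain cross-correlation
lag = int(np.argmax(np.abs(xc)))
lag = lag - NFFT if lag >= NFFT // 2 else lag
print("estimated delay:", lag, "samples")      # +/-3 by sign convention
```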
Photonics and other approaches to high speed communications
NASA Technical Reports Server (NTRS)
Maly, Kurt
1992-01-01
Our research group of 4 faculty and about 10-15 graduate students was actively involved (as a group) in the development of computer communication networks for the last five years. Many of its individuals have been involved in related research for a much longer period. The overall research goal is to extend network performance to higher data rates, to improve protocol performance at most ISO layers and to improve network operational performance. We briefly state our research goals, then discuss the research accomplishments and direct your attention to attached and/or published papers which cover the following topics: scalable parallel communications; high performance interconnection between high data rate networks; and a simple, effective media access protocol system for integrated, high data rate networks.
Global interrupt and barrier networks
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.
2008-10-28
A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation.
1986-03-01
(OCR fragments; recoverable content follows.) The conclusion most often reached is that the best scheme to use in a particular design depends highly upon the application. ... Siegel, H. J., McMillen, R. J., and Mueller, P. T., Jr., "A survey of interconnection methods for reconfigurable parallel processing systems." ... addressing mechanisms distributed in the network ... communications that reach gigabit/second speeds ...
Processing large remote sensing image data sets on Beowulf clusters
Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail
2003-01-01
High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
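As an illustration of the first application listed, the sketch below parallelizes a simple moving-average smoothing filter over a time series; chunks overlap by half the window so the chunked result matches the serial filter exactly. The window size and chunk count are assumptions, not details of the Beowulf study.

```python
# Parallel moving-average smoothing: split the series into chunks that
# overlap by HALF the window, smooth each chunk in a worker, then trim the
# overlaps, so the concatenated result equals the serial convolution.
from multiprocessing import Pool
import numpy as np

WINDOW = 5                     # moving-average width (illustrative)
HALF = WINDOW // 2

def smooth_chunk(args):
    start, stop, series = args
    lo, hi = max(0, start - HALF), min(len(series), stop + HALF)   # overlap
    sm = np.convolve(series[lo:hi], np.ones(WINDOW) / WINDOW, mode="same")
    return sm[start - lo:start - lo + (stop - start)]

def parallel_smooth(series, nchunks=4):
    bounds = np.linspace(0, len(series), nchunks + 1, dtype=int)
    tasks = [(bounds[i], bounds[i + 1], series) for i in range(nchunks)]
    with Pool(nchunks) as pool:
        return np.concatenate(pool.map(smooth_chunk, tasks))

if __name__ == "__main__":
    t = np.linspace(0, 10, 1000)
    noisy = np.sin(t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
    serial = np.convolve(noisy, np.ones(WINDOW) / WINDOW, mode="same")
    print(np.allclose(parallel_smooth(noisy), serial))   # True: chunking is exact
```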
NASA Astrophysics Data System (ADS)
Hartmann, Alfred; Redfield, Steve
1989-04-01
This paper discusses the design of large-scale (1000 × 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performance of alternative multiple-access channel communications protocols (unslotted and slotted ALOHA, and carrier sense multiple access (CSMA)) is compared with the performance of the classic arbitrated-bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
MapReduce Based Parallel Neural Networks in Enabling Large Scale Machine Learning.
Liu, Yang; Yang, Jie; Huang, Yuan; Xu, Lixiong; Li, Siguang; Qi, Man
2015-01-01
Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of accuracy in classification and efficiency in computation.
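The MapReduce training pattern the paper applies can be sketched in a few lines of Python. This is an illustrative stand-in only: a single logistic neuron replaces the paper's networks, and multiprocessing replaces a Hadoop cluster. Mappers compute gradients on their data shard; the reduce step averages them before the weight update.

```python
# MapReduce-style data-parallel training sketch: map = per-shard gradient,
# reduce = average of shard gradients, then a synchronous weight update.
from multiprocessing import Pool
import numpy as np

def map_gradients(args):
    w, X, y = args
    pred = 1.0 / (1.0 + np.exp(-X @ w))        # a single logistic "neuron"
    return X.T @ (pred - y) / len(y)           # gradient over this shard

def reduce_gradients(grads):
    return np.mean(grads, axis=0)              # reduce step: average the shards

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy classification target
    w = np.zeros(10)
    shards = np.array_split(np.arange(len(y)), 4)   # 4 "mappers"
    with Pool(4) as pool:
        for _ in range(200):                   # iterative MapReduce rounds
            grads = pool.map(map_gradients, [(w, X[s], y[s]) for s in shards])
            w -= 0.5 * reduce_gradients(grads)
    print("learned weights:", np.round(w, 2))
```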
Latency Hiding in Dynamic Partitioning and Load Balancing of Grid Computing Applications
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak
2001-01-01
The Information Power Grid (IPG) concept developed by NASA is aimed at providing a metacomputing platform for large-scale distributed computations, hiding the intricacies of a highly heterogeneous environment while maintaining adequate security. In this paper, we propose a latency-tolerant partitioning scheme that dynamically balances processor workloads on the IPG and minimizes data movement and runtime communication. By simulating an unsteady adaptive mesh application on a wide area network, we study the performance of our load balancer under the Globus environment. The number of IPG nodes, the number of processors per node, and the interconnect speeds are parameterized to derive conditions under which the IPG would be suitable for parallel distributed processing of such applications. Experimental results demonstrate that effective solutions are achieved when the IPG nodes are connected by a high-speed asynchronous interconnection network.
Applications of High Speed Networks
1991-09-01
... accomplished in order to achieve a degree of parallelism by constructing a distributed switch. The switch, a self-routing type, processes the packet ... control more than a dozen missiles in flight, and the four Mark 99 target illuminators direct missiles in the terminal phase. The self-contained Phalanx ... military installations, weapon system response and expected missile performance against a threat. Projects are already underway ...
Playback system designed for X-Band SAR
NASA Astrophysics Data System (ADS)
Yuquan, Liu; Changyong, Dou
2014-03-01
SAR (Synthetic Aperture Radar) has extensive applications because it is daylight- and weather-independent. In particular, the X-Band SAR strip map, designed by the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, provides high-ground-resolution images while also offering large spatial coverage and a short acquisition time, making it promising for many applications. When a sudden disaster strikes, emergency response requires radar signal data and imagery as soon as possible in order to act quickly, reduce losses, and save lives. This paper summarizes an X-Band SAR playback processing system designed for disaster response and scientific needs. It describes the SAR data workflow, including the payload data transmission and reception process. The playback processing system performs signal analysis on the original data, providing SAR level 0 products and quick-look images. A gigabit network provides efficient radar signal transmission from the recorder to the computation unit, while multi-thread parallel computing and ping-pong operation ensure computation speed. Together, the gigabit network, multi-thread parallel computing, and ping-pong operation allow high-speed data transmission and processing to meet the real-time requirement of SAR radar data playback.
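The "ping-pong operation" mentioned above is double buffering: while one buffer is being filled with incoming radar data, the other is processed, so transfer and computation overlap. A minimal Python sketch of the idea follows; the queues stand in for the real gigabit-network transfer, and the buffer size and block count are invented.

```python
# Ping-pong (double) buffering sketch: two buffers alternate between a
# producer that fills them (stand-in for network reception) and a consumer
# that processes them (stand-in for SAR signal processing).
import threading, queue
import numpy as np

free = queue.Queue()
full = queue.Queue()
for _ in range(2):                     # two buffers: "ping" and "pong"
    free.put(np.empty(1024))

def producer(nblocks=8):
    for i in range(nblocks):
        buf = free.get()               # grab the idle buffer
        buf[:] = i                     # stand-in for receiving a data block
        full.put(buf)
    full.put(None)                     # sentinel: no more data

def consumer():
    while (buf := full.get()) is not None:
        result = buf.sum()             # stand-in for processing the block
        free.put(buf)                  # hand the buffer back for re-filling
        print("processed block, checksum:", result)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
```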
Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)
NASA Technical Reports Server (NTRS)
Dalton, Shelly D.; Daley, Philip C.
1988-01-01
As the hardware trends for artificial intelligence (AI) involve more and more complexity, the process of optimizing the computer system design for a particular problem will also increase in complexity. Space applications of knowledge based systems (KBS) will often require an ability to perform both numerically intensive vector computations and real time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, if the application itself is not highly parallel, the machine's power cannot be utilized. A scheme is presented which will provide the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scalar, and multiprocessors. High speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to make comparisons between existing computer systems. It is an open question whether or not, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doerfler, Douglas; Austin, Brian; Cook, Brandon
There are many potential issues associated with deploying the Intel Xeon Phi™ (code-named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of that of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.
Optical computing and image processing using photorefractive gallium arsenide
NASA Technical Reports Server (NTRS)
Cheng, Li-Jen; Liu, Duncan T. H.
1990-01-01
Recent experimental results on matrix-vector multiplication and multiple four-wave mixing using GaAs are presented. Attention is given to a simple concept of using two overlapping holograms in GaAs to do two matrix-vector multiplication processes operating in parallel with a common input vector. This concept can be used to construct high-speed, high-capacity, reconfigurable interconnection and multiplexing modules, important for optical computing and neural-network applications.
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data is set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to the details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order-of-magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
NASA Astrophysics Data System (ADS)
Higashino, Satoru; Kobayashi, Shoei; Yamagami, Tamotsu
2007-06-01
High data transfer rates have been demanded of data storage devices along with increasing storage capacity. In order to increase the transfer rate, high-speed data processing techniques are required in read-channel devices. Generally, parallel architecture is utilized for high-speed digital processing. We have developed a new architecture of Interpolated Timing Recovery (ITR) to achieve a high-speed data transfer rate and wide capture range in read-channel devices for information storage channels. It facilitates parallel implementation on large-scale-integration (LSI) devices.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in storage and was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
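The rsync-like delta idea can be sketched compactly in Python. This is a minimal illustration under stated assumptions: the block size, MD5 hashing, and in-memory buffers are invented for the example, whereas the patent operates on per-node checkpoint files over a broadcast-capable interconnect.

```python
# Template-checkpoint sketch: cut the checkpoint into fixed-size blocks,
# compare each block's checksum against the stored template, and send only
# the blocks that differ (an rsync-like delta).
import hashlib

BLOCK = 4096

def block_digests(data: bytes):
    return [hashlib.md5(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def delta_checkpoint(template: bytes, current: bytes):
    """Return (index, block) pairs for blocks that differ from the template."""
    tmpl = block_digests(template)
    delta = []
    for i, d in enumerate(block_digests(current)):
        if i >= len(tmpl) or d != tmpl[i]:
            delta.append((i, current[i * BLOCK:(i + 1) * BLOCK]))
    return delta

template = b"\x00" * BLOCK * 8
current = bytearray(template)
current[5 * BLOCK] = 1                 # one block changed since the template
print(f"{len(delta_checkpoint(template, bytes(current)))} of 8 blocks sent")
```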
A Stochastic Spiking Neural Network for Virtual Screening.
Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L
2018-04-01
Virtual screening (VS) has become a key computational tool in early drug design, and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, hardware implementations of spiking neural networks (SNNs) are arising as an emerging computing technique that can be applied to parallelize processes that normally present a high cost in terms of computing time and power. Consequently, SNNs represent an attractive alternative for performing time-consuming processing tasks such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm, achieving two orders of magnitude of speed improvement with respect to USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays, allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture arises as a feasible methodology to efficiently enhance time-consuming data-mining processes such as 3-D molecular similarity search.
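For reference, the USR algorithm reduces each molecule to twelve numbers: the first three moments of the atom-distance distributions measured from four reference points. The plain NumPy sketch below shows that descriptor and similarity computation; the random "molecules" are placeholders, and the paper's actual contribution, the stochastic spiking-hardware implementation, is not attempted here.

```python
# USR shape descriptor sketch: 4 reference points x 3 distance moments = 12
# numbers per molecule; similarity is a normalized inverse Manhattan distance.
import numpy as np

def three_moments(d):
    mu = d.mean()
    third = ((d - mu) ** 3).mean()
    return [mu, d.std(), np.cbrt(third)]       # mean, spread, cube-rooted skew

def usr_descriptor(coords):
    ctd = coords.mean(axis=0)                  # molecular centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]               # atom closest to centroid
    fct = coords[d_ctd.argmax()]               # atom farthest from centroid
    ftf = coords[np.linalg.norm(coords - fct, axis=1).argmax()]  # farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        desc += three_moments(np.linalg.norm(coords - ref, axis=1))
    return np.array(desc)                      # 12-element shape signature

def usr_similarity(a, b):
    return 1.0 / (1.0 + np.abs(a - b).mean())  # 1.0 means identical signatures

rng = np.random.default_rng(1)
mol1 = rng.normal(size=(30, 3))                # placeholder atom coordinates
mol2 = mol1 + rng.normal(scale=0.05, size=(30, 3))
print(usr_similarity(usr_descriptor(mol1), usr_descriptor(mol2)))
```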
A real-time hybrid neuron network for highly parallel cognitive systems.
Christiaanse, Gerrit Jan; Zjajo, Amir; Galuzzi, Carlo; van Leuken, Rene
2016-08-01
For a comprehensive understanding of how neurons communicate with each other, new tools need to be developed that can accurately mimic the behaviour of such neurons and neuron networks under 'real-time' constraints. In this paper, we propose an easily customisable, highly pipelined, neuron network design, which executes optimally scheduled floating-point operations for the maximal number of biophysically plausible neurons per FPGA family type. To reduce the required amount of resources without adverse effect on the calculation latency, a single exponent instance is used for multiple neuron calculation operations. Experimental results indicate that the proposed network design allows the simulation of up to 1188 neurons on a Virtex7 (XC7VX550T) device in brain real-time, yielding a speed-up of 12.4x compared to the state of the art.
A comparison of high-speed links, their commercial support and ongoing R&D activities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzalez, H.L.; Barsotti, E.; Zimmermann, S.
Technological advances and a demanding market have forced the development of higher bandwidth communication standards for networks, data links and busses. Most of these emerging standards are gathering enough momentum that their widespread availability and lower prices are anticipated. The hardware and software that support the physical media for most of these links are currently available, allowing the user community to implement fairly high-bandwidth data links and networks with commercial components. Also, the switches needed to support these networks are available or being developed. The commercial support of high-bandwidth data links, networks and switching fabrics provides a powerful base for the implementation of high-bandwidth data acquisition systems. A large data acquisition system like the one for the Solenoidal Detector Collaboration (SDC) at the SSC can benefit from links and networks that support an integrated systems engineering approach, for initialization, downloading, diagnostics, monitoring, hardware integration and event data readout. The issue that our current work addresses is the possibility of having a channel/network that satisfies the requirements of an integrated data acquisition system. In this paper we present a brief description of the high-speed communication links and protocols that we consider of interest for high energy physics: High Performance Parallel Interface (HIPPI), Serial HIPPI, Fibre Channel (FC) and Scalable Coherent Interface (SCI). In addition, the initial work required to implement an SDC-like data acquisition system is described.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
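A minimal sketch of the shared idea follows: the expensive assignment step of k-means is split across cores while the main process merges partial sums to update the centroids. This is a shared-nothing multiprocessing stand-in for the paper's Java transactional-memory design, and the data and parameters are made up.

```python
# Multi-core k-means sketch: workers assign their chunk of points to the
# nearest centroid and return per-cluster partial sums; the main process
# reduces the partials and updates the centroids.
from multiprocessing import Pool
import numpy as np

def assign_chunk(args):
    X, centers = args
    labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
    k = len(centers)
    sums = np.zeros_like(centers)
    counts = np.zeros(k, dtype=int)
    for j in range(k):
        mask = labels == j
        sums[j] = X[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def parallel_kmeans(X, k=3, iters=20, ncores=4):
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    chunks = np.array_split(X, ncores)
    with Pool(ncores) as pool:
        for _ in range(iters):
            parts = pool.map(assign_chunk, [(c, centers) for c in chunks])
            sums = sum(p[0] for p in parts)
            counts = sum(p[1] for p in parts)
            centers = sums / np.maximum(counts, 1)[:, None]
    return centers

if __name__ == "__main__":
    X = np.concatenate([np.random.randn(500, 2) + off for off in (0, 5, 10)])
    print(parallel_kmeans(X))
```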
Cheung, Kit; Schultz, Simon R.; Luk, Wayne
2016-01-01
NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
Reconfiguration and Search of Social Networks
Zhang, Lianming; Peng, Aoyuan
2013-01-01
Social networks tend to exhibit topological characteristics different from those of regular networks and random networks, such as a shorter average path length and a higher clustering coefficient, and the node degree of the majority of social networks obeys an exponential distribution. Based on the topological characteristics of real social networks, a new network model suited to portraying the structure of social networks was proposed, and the characteristic parameters of the model were calculated. To find the relationship between two people in a social network, a hybrid search strategy based on k-walker random walks and high-degree seeking was proposed, using local information of the social network and a parallel mechanism. Simulation results show that the strategy can significantly reduce the average number of search steps, effectively improving search speed and efficiency. PMID:24574861
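The hybrid strategy can be sketched as follows. This is a pure-Python illustration: the toy graph, walker count, and the 0.7 high-degree preference are invented parameters, and the walkers step sequentially rather than in true parallel.

```python
# Hybrid k-walker search sketch: k walkers start at the source; each step,
# a walker prefers the highest-degree unvisited neighbour (high-degree
# seeking) and otherwise moves at random.
import random

def k_walker_search(adj, source, target, k=4, max_steps=100):
    walkers = [source] * k
    visited = [{source} for _ in range(k)]
    for step in range(1, max_steps + 1):
        for i, pos in enumerate(walkers):
            nbrs = adj[pos]
            if target in nbrs:
                return step                    # found via this walker
            fresh = [n for n in nbrs if n not in visited[i]]
            cand = fresh or nbrs
            if random.random() < 0.7:          # high-degree seeking
                nxt = max(cand, key=lambda n: len(adj[n]))
            else:                              # random-walk component
                nxt = random.choice(cand)
            walkers[i] = nxt
            visited[i].add(nxt)
    return None

# tiny illustrative "social network"
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
print("steps to reach node 5:", k_walker_search(adj, 0, 5))
```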
A Decade of Neural Networks: Practical Applications and Prospects
NASA Technical Reports Server (NTRS)
Kemeny, Sabrina E.
1994-01-01
The Jet Propulsion Laboratory Neural Network Workshop, sponsored by NASA and DOD, brings together sponsoring agencies, active researchers, and the user community to formulate a vision for the next decade of neural network research and application prospects. While the speed and computing power of microprocessors continue to grow at an ever-increasing pace, the demand to intelligently and adaptively deal with the complex, fuzzy, and often ill-defined world around us remains to a large extent unaddressed. Powerful, highly parallel computing paradigms such as neural networks promise to have a major impact in addressing these needs. Papers in the workshop proceedings highlight benefits of neural networks in real-world applications compared to conventional computing techniques. Topics include fault diagnosis, pattern recognition, and multiparameter optimization.
Particle simulation on heterogeneous distributed supercomputers
NASA Technical Reports Server (NTRS)
Becker, Jeffrey C.; Dagum, Leonardo
1993-01-01
We describe the implementation and performance of a three dimensional particle simulation distributed between a Thinking Machines CM-2 and a Cray Y-MP. These are connected by a combination of two high-speed networks: a high-performance parallel interface (HIPPI) and an optical network (UltraNet). This is the first application to use this configuration at NASA Ames Research Center. We describe our experience implementing and using the application and report the results of several timing measurements. We show that the distribution of applications across disparate supercomputing platforms is feasible and has reasonable performance. In addition, several practical aspects of the computing environment are discussed.
Efficient Use of Distributed Systems for Scientific Applications
NASA Technical Reports Server (NTRS)
Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques
2000-01-01
Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency of up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results were obtained for four irregular meshes ranging from 11,451 elements for the Barth4 mesh to 30,269 elements for the Barth5 mesh. Future work with PART entails using the tool with an integrated application requiring distributed systems; in particular, an integration of finite element and fluid dynamic simulations to address the cooling of turbine blades in a gas turbine engine design. It is not uncommon to encounter high-temperature, film-cooled turbine airfoils with millions of degrees of freedom, a consequence of the complexity of the various airfoil components, which require fine-grained meshing for accuracy.
Nose, Atsushi; Yamazaki, Tomohiro; Katayama, Hironobu; Uehara, Shuji; Kobayashi, Masatsugu; Shida, Sayaka; Odahara, Masaki; Takamiya, Kenichi; Matsumoto, Shizunori; Miyashita, Leo; Watanabe, Yoshihiro; Izawa, Takashi; Muramatsu, Yoshinori; Nitta, Yoshikazu; Ishikawa, Masatoshi
2018-04-24
We have developed a high-speed vision chip using 3D stacking technology to address the increasing demand for high-speed vision chips in diverse applications. The chip is a 1/3.2-inch, 1.27-Mpixel, 500-fps (0.31 Mpixel, 1000 fps with 2 x 2 binning) vision chip with 3D-stacked column-parallel analog-to-digital converters (ADCs) and 140-giga-operations-per-second (GOPS) programmable Single Instruction Multiple Data (SIMD) column-parallel processing elements (PEs) for new sensing applications. The 3D-stacked structure and column-parallel processing architecture achieve high sensitivity, high resolution, and high-accuracy object positioning.
Smart photonic networks and computer security for image data
NASA Astrophysics Data System (ADS)
Campello, Jorge; Gill, John T.; Morf, Martin; Flynn, Michael J.
1998-02-01
The work reported here is part of a larger project on 'Smart Photonic Networks and Computer Security for Image Data', studying the interactions of coding and security, switching architecture simulations, and basic technologies. Coding and security: coding methods appropriate for data security in data fusion networks were investigated. These networks have several characteristics that distinguish them from other currently employed networks, such as Ethernet LANs or the Internet. The most significant characteristics are very high maximum data rates; the predominance of image data; narrowcasting (transmission of data from one source to a designated set of receivers); data fusion (combining related data from several sources); and simple sensor nodes with limited buffering. These characteristics affect both the lower-level network design and the higher-level coding methods. Data security encompasses privacy, integrity, reliability, and availability. Privacy, integrity, and reliability can be provided through encryption and coding for error detection and correction. Availability is primarily a network issue; network nodes must be protected against failure or routed around in the case of failure. One of the more promising techniques is the use of 'secret sharing'. We consider this method as a special case of our new space-time code diversity based algorithms for secure communication. These algorithms enable us to exploit parallelism and scalable multiplexing schemes to build photonic network architectures. A number of very high-speed switching and routing architectures and their relationships with very high performance processor architectures were studied. Indications are that routers for very high speed photonic networks can be designed using the very robust and distributed TCP/IP protocol, if suitable processor architecture support is available.
A solution to neural field equations by a recurrent neural network method
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2012-09-01
Neural field equations (NFE) are used to model the activity of neurons in the brain; they are derived starting from a single-neuron 'integrate-and-fire' model. The neural continuum is spatially discretized for numerical studies, and the governing equations are modeled as a system of ordinary differential equations. In this article, the recurrent neural network approach is used to solve this system of ODEs. It consists of a technique developed by combining the standard numerical method of finite differences with the Hopfield neural network. The architecture of the net, the energy function, the updating equations, and the algorithms are developed for the NFE model. A Hopfield neural network is then designed to minimize the energy function modeling the NFE. Results obtained from the Hopfield-finite-differences net show excellent performance in terms of accuracy and speed. The parallel nature of the Hopfield approach may make it easier to implement on fast parallel computers, giving it a speed advantage over traditional methods.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. The tests also explored the effects of scaling the simulation with the number of processors in addition to distributing a constant-size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with Ethernet and ATM networking, plus a shared-memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency: a shared-memory machine is fastest, and ATM networking provides acceptable performance. The limitations of Ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, whence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not lock-stepped, but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Radio Synthesis Imaging - A High Performance Computing and Communications Project
NASA Astrophysics Data System (ADS)
Crutcher, Richard M.
The National Science Foundation has funded a five-year High Performance Computing and Communications project at the National Center for Supercomputing Applications (NCSA) for the direct implementation of several of the computing recommendations of the Astronomy and Astrophysics Survey Committee (the "Bahcall report"). This paper is a summary of the project goals and a progress report. The project will implement a prototype of the next generation of astronomical telescope systems - remotely located telescopes connected by high-speed networks to very high performance, scalable architecture computers and on-line data archives, which are accessed by astronomers over Gbit/sec networks. Specifically, a data link has been installed between the BIMA millimeter-wave synthesis array at Hat Creek, California and NCSA at Urbana, Illinois for real-time transmission of data to NCSA. Data are automatically archived, and may be browsed and retrieved by astronomers using the NCSA Mosaic software. In addition, an on-line digital library of processed images will be established. BIMA data will be processed on a very high performance distributed computing system, with I/O, user interface, and most of the software system running on the NCSA Convex C3880 supercomputer or Silicon Graphics Onyx workstations connected by HiPPI to the high performance, massively parallel Thinking Machines Corporation CM-5. The very computationally intensive algorithms for calibration and imaging of radio synthesis array observations will be optimized for the CM-5 and new algorithms which utilize the massively parallel architecture will be developed. Code running simultaneously on the distributed computers will communicate using the Data Transport Mechanism developed by NCSA. The project will also use the BLANCA Gbit/s testbed network between Urbana and Madison, Wisconsin to connect an Onyx workstation in the University of Wisconsin Astronomy Department to the NCSA CM-5, for development of long-distance distributed computing. Finally, the project is developing 2D and 3D visualization software as part of the international AIPS++ project. This research and development project is being carried out by a team of experts in radio astronomy, algorithm development for massively parallel architectures, high-speed networking, database management, and Thinking Machines Corporation personnel. The development of this complete software, distributed computing, and data archive and library solution to the radio astronomy computing problem will advance our expertise in high performance computing and communications technology and the application of these techniques to astronomical data processing.
A high performance parallel computing architecture for robust image features
NASA Astrophysics Data System (ADS)
Zhou, Renyan; Liu, Leibo; Wei, Shaojun
2014-03-01
A design of a parallel architecture for image feature detection and description is proposed in this article. The major component of this architecture is a 2D cellular network composed of simple reprogrammable processors, enabling the Hessian blob detector and Haar response calculation, which are the most computing-intensive stages of the Speeded Up Robust Features (SURF) algorithm. Combining this 2D cellular network with dedicated hardware for SURF descriptors, the architecture achieves real-time image feature detection with minimal software in the host processor. A prototype FPGA implementation of the proposed architecture achieves 1318.9 GOPS of general pixel processing at a 100 MHz clock and up to 118 fps in VGA (640 x 480) image feature detection. The proposed architecture is stand-alone and scalable, so it can easily be migrated to a VLSI implementation.
Deep Neural Network Emulation of a High-Order, WENO-Limited, Space-Time Reconstruction
NASA Astrophysics Data System (ADS)
Norman, M. R.; Hall, D. M.
2017-12-01
Deep Neural Networks (DNNs) have been used to emulate a number of processes in atmospheric models, including radiation and even so-called super-parameterization of moist convection. In each scenario, the DNN provides a good representation of the process even for inputs that have not been encountered before. More notably, they provide an emulation at a fraction of the cost of the original routine, giving speed-ups of 30× and even up to 200× compared to the runtime costs of the original routines. However, to our knowledge there has not been an investigation into using DNNs to emulate the dynamics. The most likely reason for this is that dynamics operators are typically both linear and low cost, meaning they cannot be sped up by a non-linear DNN emulation. However, there exist high-cost non-linear space-time dynamics operators that significantly reduce the number of parallel data transfers necessary to complete an atmospheric simulation. The WENO-limited Finite-Volume method with ADER-DT time integration is a prime example of this - needing only two parallel communications per large, fully limited time step. However, it comes at a high cost in terms of computation, which is why many would hesitate to use it. This talk investigates DNN emulation of the WENO-limited space-time finite-volume reconstruction procedure - the most expensive portion of this method, which densely clusters a large amount of non-linear computation. Different training techniques and network architectures are tested, and the accuracy and speed-up of each is given.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J.C.H.; Lung, H.; Katsumata, Y.
1995-12-01
In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu's vector parallel computer, the VPP500. This computer uses Fujitsu's well-known vector processor as the PE, each rated at 1.6 GFLOPS. A high-speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve multigrid problems on the VPP500.
Extended Logic Intelligent Processing System for a Sensor Fusion Processor Hardware
NASA Technical Reports Server (NTRS)
Stoica, Adrian; Thomas, Tyson; Li, Wei-Te; Daud, Taher; Fabunmi, James
2000-01-01
This paper presents the hardware implementation and initial tests of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) is described, which combines rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor signals in compact low-power VLSI. The ELIPS concept is being developed to demonstrate interceptor functionality, which particularly underlines the high-speed and low-power requirements. Hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Processing speeds of microseconds have been demonstrated using our test hardware.
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Slażyński, Leszek; Bohte, Sander
2012-01-01
The arrival of graphics processing unit (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state of each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single-precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures for the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate plausible spiking neural networks of up to 50,000 neurons in better than real time, processing over 35 million spiking events per second.
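The update parallelism described here comes from the fact that filter-based membrane potentials are sums of independently decaying kernel contributions, so every neuron and every kernel state can be updated elementwise. Below is a vectorized NumPy sketch of that pattern (a CPU stand-in for the GPU kernels; all constants are illustrative, and the neuron model is a simplified two-kernel caricature of the Spike Response Model):

```python
# Additive filter-based neuron update sketch: each membrane potential is a
# difference of two exponentially decaying kernel states, so the whole
# population updates with elementwise operations, the pattern a GPU runs well.
import numpy as np

N, STEPS, DT = 50_000, 100, 1e-3
TAU_M, TAU_S, THRESH = 20e-3, 5e-3, 1.0

decay_m, decay_s = np.exp(-DT / TAU_M), np.exp(-DT / TAU_S)
v_m = np.zeros(N)                     # slow kernel state, one per neuron
v_s = np.zeros(N)                     # fast kernel state, one per neuron
rng = np.random.default_rng(0)

for _ in range(STEPS):
    inputs = rng.poisson(0.1, N)      # stand-in for synaptic input spikes
    v_m = v_m * decay_m + inputs      # every neuron updated in parallel
    v_s = v_s * decay_s + inputs
    potential = v_m - v_s             # additive combination of kernels
    fired = potential > THRESH
    v_m[fired] = v_s[fired] = 0.0     # reset neurons that spiked
print("spikes in final step:", int(fired.sum()))
```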
High-Density Liquid-State Machine Circuitry for Time-Series Forecasting.
Rosselló, Josep L; Alomar, Miquel L; Morro, Antoni; Oliver, Antoni; Canals, Vincent
2016-08-01
Spiking neural networks (SNN) are the latest generation of neural networks, which try to mimic the real behavior of biological neurons. Although most research in this area is done through software applications, it is in hardware implementations that the intrinsic parallelism of these computing systems is more efficiently exploited. Liquid state machines (LSM) have arisen as a strategic technique for implementing recurrent SNN designs with a simple learning methodology. In this work, we show a new low-cost methodology to implement high-density LSM by using Boolean gates. The proposed method is based on the use of probabilistic computing concepts to reduce hardware requirements, thus considerably increasing the neuron count per chip. The result is a highly functional system that is applied to high-speed time series forecasting.
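The probabilistic-computing trick that lets Boolean gates replace arithmetic can be shown in a few lines: when two independent bit-streams encode values as the probability of a 1, a single AND gate multiplies them. A Python sketch follows (stream length and input values are arbitrary choices for the demonstration):

```python
# Stochastic computing sketch: values in [0, 1] are encoded as the
# probability of a 1 in a random bit-stream, so one AND gate per connection
# multiplies two values; averaging the output stream decodes the product.
import numpy as np

def to_bitstream(p, n, rng):
    return rng.random(n) < p           # each bit is 1 with probability p

rng = np.random.default_rng(42)
a, b, n = 0.6, 0.5, 100_000
product_stream = to_bitstream(a, n, rng) & to_bitstream(b, n, rng)  # AND gate
print("stochastic estimate:", product_stream.mean(), "exact:", a * b)
```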
Long-range interactions and parallel scalability in molecular simulations
NASA Astrophysics Data System (ADS)
Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko
2007-01-01
Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single-processor and parallel performance up to 8 nodes. We have also tested scalability on four different networks: Infiniband, GigaBit Ethernet, Fast Ethernet, and a nearly uniform memory architecture, i.e., one in which communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.
High-speed and high-fidelity system and method for collecting network traffic
Weigle, Eric H [Los Alamos, NM
2010-08-24
A system is provided for the high-speed and high-fidelity collection of network traffic. The system can collect traffic at gigabit-per-second (Gbps) speeds, scale to terabit-per-second (Tbps) speeds, and support additional functions such as real-time network intrusion detection. The present system uses a dedicated operating system for traffic collection to maximize efficiency, scalability, and performance. A scalable infrastructure and apparatus for the present system is provided by splitting the work performed on one host onto multiple hosts. The present system simultaneously addresses the issues of scalability, performance, cost, and adaptability with respect to network monitoring, collection, and other network tasks. In addition to high-speed and high-fidelity network collection, the present system provides a flexible infrastructure to perform virtually any function at high speeds such as real-time network intrusion detection and wide-area network emulation for research purposes.
Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.
Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre
2017-06-01
We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for performing massively parallel real-time pattern recognition. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was developed based on a compact digital neural core, which consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores together. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer high-speed and resource-efficient means for performing high-speed, neuromorphic, and massively parallel pattern recognition and classification tasks.
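The NEF principle behind the architecture can be sketched briefly: a value is encoded into the activities of many heterogeneous neurons, and linear decoders found by least squares read out a desired function of that value. The sketch below uses rectified-linear tuning curves as a stand-in for the hardware's neuron model; all parameters (neuron count, gains, the target function) are illustrative assumptions.

```python
# NEF encode/decode sketch: random encoders, gains and biases give each
# neuron a distinct tuning curve over x; least-squares decoders recover a
# chosen function of x from the population activities.
import numpy as np

rng = np.random.default_rng(0)
N = 200                                        # neurons in the "core"
enc = rng.choice([-1.0, 1.0], N)               # 1-D encoders
gain = rng.uniform(0.5, 2.0, N)
bias = rng.uniform(-1.0, 1.0, N)

def activities(x):                             # x: array of sample points
    return np.maximum(0.0, gain * np.outer(x, enc) + bias)

x = np.linspace(-1, 1, 100)
A = activities(x)
target = x ** 2                                # function to compute: f(x) = x^2
dec, *_ = np.linalg.lstsq(A, target, rcond=None)   # least-squares decoders
print("max decode error:", np.abs(A @ dec - target).max())
```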
Reverse engineering a gene network using an asynchronous parallel evolution strategy
2010-01-01
Background The use of reverse engineering methods to infer gene regulatory networks by fitting mathematical models to gene expression data is becoming increasingly popular and successful. However, increasing model complexity means that more powerful global optimisation techniques are required for model fitting. The parallel Lam Simulated Annealing (pLSA) algorithm has been used in such approaches, but recent research has shown that island Evolutionary Strategies can produce faster, more reliable results. However, no parallel island Evolutionary Strategy (piES) has yet been demonstrated to be effective for this task. Results Here, we present synchronous and asynchronous versions of the piES algorithm, and apply them to a real reverse engineering problem: inferring parameters in the gap gene network. We find that the asynchronous piES exhibits very little communication overhead, and shows significant speed-up for up to 50 nodes: the piES running on 50 nodes is nearly 10 times faster than the best serial algorithm. We compare the asynchronous piES to pLSA on the same test problem, measuring the time required to reach particular levels of residual error, and show that it shows much faster convergence than pLSA across all optimisation conditions tested. Conclusions Our results demonstrate that the piES is consistently faster and more reliable than the pLSA algorithm on this problem, and scales better with increasing numbers of nodes. In addition, the piES is especially well suited to further improvements and adaptations: Firstly, the algorithm's fast initial descent speed and high reliability make it a good candidate for being used as part of a global/local search hybrid algorithm. Secondly, it has the potential to be used as part of a hierarchical evolutionary algorithm, which takes advantage of modern multi-core computing architectures. PMID:20196855
Parallel Guessing: A Strategy for High-Speed Computation
1984-09-19
(for using additional hardware to obtain higher processing speed). In this paper we argue that parallel guessing for image analysis is a useful ... from a true solution, or the correctness of a guess, can be readily checked. We review image-analysis algorithms having a parallel guessing or ...
Maleki, Ehsan; Babashah, Hossein; Koohi, Somayyeh; Kavehvash, Zahra
2017-07-01
This paper presents an optical processing approach for exploring a large number of genome sequences. Specifically, we propose an optical correlator for global alignment and an extended moiré matching technique for local analysis of spatially coded DNA, whose output is fed to a novel three-dimensional artificial neural network for local DNA alignment. All-optical implementation of the proposed 3D artificial neural network is developed and its accuracy is verified in Zemax. Thanks to its parallel processing capability, the proposed structure performs local alignment of 4 million sequences of 150 base pairs in a few seconds, which is much faster than its electrical counterparts, such as the basic local alignment search tool.
Memory and neural networks on the basis of color centers in solids.
Winnacker, Albrecht; Osvet, Andres
2009-11-01
Optical data recording is one of the most widely used and efficient systems of memory in the non-living world. The application of color centers in this context offers not only high speed in writing and read-out, due to a high degree of parallelism in data handling, but also the possibility of setting up models of neural networks. In this way, systems with high potential for image processing, pattern recognition and logical operations can be constructed. A limitation on storage density is given by the diffraction limit of optical data recording. It is shown that this limitation can, at least in principle, be overcome by the principle of spectral hole burning, which results in systems with storage capacities close to that of the human brain.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
FPGA implementation of a biological neural network based on the Hodgkin-Huxley neuron model.
Yaghini Bonabi, Safa; Asgharian, Hassan; Safari, Saeed; Nili Ahmadabadi, Majid
2014-01-01
A set of techniques for efficient implementation of a Hodgkin-Huxley-based (H-H) model of a neural network on an FPGA (Field Programmable Gate Array) is presented. The central implementation challenge is the complexity of the H-H model, which puts limits on the network size and on the execution speed. However, the basics of the original model cannot be compromised when the effect of synaptic specifications on network behavior is the subject of study. To solve the problem, we used computational techniques such as the CORDIC (Coordinate Rotation Digital Computer) algorithm and step-by-step integration in the implementation of the arithmetic circuits. In addition, we employed techniques such as resource sharing to preserve the details of the model and increase the network size while keeping the network execution speed close to real time and maintaining high precision. An implementation of a two-mini-column network with 120/30 excitatory/inhibitory neurons is provided to investigate the characteristics of our method in practice. The implementation techniques provide an opportunity to construct large FPGA-based network models to investigate the effect of different neurophysiological mechanisms, like voltage-gated channels and synaptic activities, on the behavior of a neural network in an appropriate execution time. In addition to the inherent properties of FPGAs, like parallelism and re-configurability, our approach makes the FPGA-based system a proper candidate for studies on neural control of cognitive robots and systems as well.
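For readers unfamiliar with CORDIC, the following is a minimal Python sketch of the circular-mode shift-and-add iteration the abstract refers to; it uses floating point for clarity, whereas an FPGA implementation would use fixed-point arithmetic (and hyperbolic mode for the exponentials in the H-H rate equations). The iteration count is an illustrative assumption.

```python
import math

def cordic_sin_cos(theta, n_iters=24):
    """Circular-mode CORDIC: computes (cos, sin) using only
    shift-and-add style operations per iteration (floats here;
    hardware would use fixed-point shifts)."""
    # Precomputed micro-rotation angles atan(2^-i) and total gain.
    angles = [math.atan(2.0 ** -i) for i in range(n_iters)]
    K = 1.0
    for i in range(n_iters):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = K, 0.0, theta     # start pre-scaled so final gain is 1
    for i in range(n_iters):
        d = 1.0 if z >= 0 else -1.0          # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y   # (cos(theta), sin(theta)), valid for |theta| < ~1.74 rad

print(cordic_sin_cos(0.5))
print((math.cos(0.5), math.sin(0.5)))        # reference values
```

Because every micro-rotation is a shift and an add, the loop unrolls into a short pipeline of identical stages on an FPGA, which is what makes it attractive for the per-neuron arithmetic here.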
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection
Chen, Yaw-Chung
2015-01-01
The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphics processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms. PMID:26437335
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.
Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung
2015-01-01
The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphics processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
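The abstract does not spell out the pre-filtering algorithm, so the following Python sketch illustrates one plausible first-stage filter under stated assumptions: a bitmap of every 2-byte window occurring in any signature. A packet with no flagged window cannot contain a full match; flagged packets (including false positives) would go on to full pattern matching, the GPU stage in the paper.

```python
def build_prefilter(signatures):
    """Bitmap of every 2-byte window that appears in any signature.
    If a payload contains no flagged window, no signature can match,
    so the expensive second stage can be skipped entirely."""
    table = bytearray(65536)
    for sig in signatures:
        for i in range(len(sig) - 1):
            table[sig[i] << 8 | sig[i + 1]] = 1
    return table

def suspicious(payload, table):
    """CPU first stage: cheap linear scan. False positives are fine;
    the full matcher (GPU in the paper) confirms real hits."""
    return any(table[payload[i] << 8 | payload[i + 1]]
               for i in range(len(payload) - 1))

sigs = [b"attack", b"exploit"]
table = build_prefilter(sigs)
print(suspicious(b"normal traffic", table))   # False: skip full match
print(suspicious(b"an attack here", table))   # True: send downstream
```

The design choice mirrors the paper's division of labour: the filter is memory-light and branch-light, so the CPU stays fast, while only the minority of suspicious packets pay the CPU-to-GPU transfer cost.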
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Tankam, Patrice; Santhanam, Anand P; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P
2014-07-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing.
NASA Technical Reports Server (NTRS)
Tobagi, Fouad A.; Dalgic, Ismail; Pang, Joseph
1990-01-01
The design and implementation of interface units for high-speed Fiber Optic Local Area Networks and Broadband Integrated Services Digital Networks are discussed. In recent years, a number of network adapters designed to support high-speed communications have emerged. The approach taken to the design of a high-speed network interface unit was to implement packet-processing functions in hardware, using VLSI technology. The VLSI hardware implementation of a buffer management unit, which is required in such architectures, is described.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared on performance using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
An Artificial Neural Networks Method for Solving Partial Differential Equations
NASA Astrophysics Data System (ADS)
Alharbi, Abir
2010-09-01
While many analytical and numerical techniques already exist for solving PDEs, this paper introduces an approach using artificial neural networks. The approach consists of a technique developed by combining a standard numerical method, finite differences, with the Hopfield neural network. The method is denoted Hopfield-finite-difference (HFD). The architecture of the nets, the energy function, the updating equations, and the algorithms are developed for the method. The HFD method has been used successfully to approximate the solution of classical PDEs, such as the wave, heat, Poisson and diffusion equations, and of a system of PDEs. The software Matlab is used to obtain the results in both tabular and graphical form. The results are similar in terms of accuracy to those obtained by standard numerical methods. In terms of speed, the parallel nature of the Hopfield nets makes the method easier to implement on fast parallel computers, while some numerical methods need extra effort for parallelization.
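As a sketch of the idea, assuming the HFD formulation reduces to relaxing the finite-difference equations: below, the 1-D Poisson equation is discretised and every grid point is updated simultaneously from its neighbours, in the spirit of the Hopfield network's parallel neuron updates. The toy problem, grid size, and iteration count are illustrative, not taken from the paper.

```python
import numpy as np

# 1-D Poisson equation u'' = f on (0,1) with u(0)=u(1)=0, discretised
# by central finite differences; exact solution sin(pi x) for checking.
n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = -np.pi ** 2 * np.sin(np.pi * x)

# Relaxation in the spirit of the Hopfield dynamics: every grid point
# ("neuron") is updated simultaneously from its neighbours, so one
# sweep is a single fully parallel vector operation.
u = np.zeros(n)
for _ in range(20000):
    left = np.concatenate(([0.0], u[:-1]))    # boundary u(0) = 0
    right = np.concatenate((u[1:], [0.0]))    # boundary u(1) = 0
    u = 0.5 * (left + right - h ** 2 * f)

print(np.max(np.abs(u - np.sin(np.pi * x))))  # ~3e-4: discretisation level
```

Each sweep touches all unknowns at once, which is exactly the property that maps well onto parallel hardware, as the abstract notes.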
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
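A simplified sequential Python model of the event-horizon cycle behind Breathing Time Buckets: each cycle processes all pending events optimistically, commits only those at or before the earliest newly generated message, and rolls the rest back. The handler interface and toy workload are assumptions for illustration, not the SPEEDES API.

```python
def breathing_time_buckets(initial_events, handler, max_cycles=10000):
    """Sequential sketch of the Breathing Time Buckets cycle.
    handler(t, payload) -> list of new (t2, payload2) events, t2 >= t."""
    pending = sorted(initial_events)           # (time, payload) pairs
    committed = []
    for _ in range(max_cycles):
        if not pending:
            break
        # Optimistically process every pending event, remembering
        # which new messages each one generated.
        results = [(ev, handler(*ev)) for ev in pending]
        # Event horizon: earliest newly generated message. Events at
        # or before it cannot be invalidated by the new messages.
        horizon = min((t for _, out in results for t, _ in out),
                      default=float("inf"))
        pending = []
        for ev, out in results:
            if ev[0] <= horizon:
                committed.append(ev)           # commit; release its output
                pending.extend(out)
            else:
                pending.append(ev)             # roll back; retry next cycle
        pending.sort()
    return committed

# Toy model: an event at time t spawns a successor at t + 1.5 until t > 10.
log = breathing_time_buckets(
    [(0.0, "a"), (0.2, "b")],
    lambda t, p: [(t + 1.5, p)] if t <= 10 else [])
print(len(log), log[:3])
```

Because the horizon is recomputed every cycle from the simulation's own message pattern, the "bucket" width breathes: it widens when events are causally independent and narrows when they interact, which is the adaptivity the abstract describes.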
Applications of Emerging Parallel Optical Link Technology to High Energy Physics Experiments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chramowicz, J.; Kwan, S.; Prosser, A.
2011-09-01
Modern particle detectors depend upon optical fiber links to deliver event data to upstream trigger and data processing systems. Future detector systems can benefit from the development of dense arrangements of high speed optical links emerging from the telecommunications and storage area network market segments. These links support data transfers in each direction at rates up to 120 Gbps in packages that minimize or even eliminate edge connector requirements. Emerging products include a class of devices known as optical engines which permit assembly of the optical transceivers in close proximity to the electrical interfaces of ASICs and FPGAs which handle the data in parallel electrical format. Such assemblies will reduce required printed circuit board area and minimize electromagnetic interference and susceptibility. We will present test results of some of these parallel components and report on the development of pluggable FPGA Mezzanine Cards equipped with optical engines to provide to collaborators on the Versatile Link Common Project for the HL-LHC at CERN.
Providing a parallel and distributed capability for JMASS using SPEEDES
NASA Astrophysics Data System (ADS)
Valinski, Maria; Driscoll, Jonathan; McGraw, Robert M.; Meyer, Bob
2002-07-01
The Joint Modeling And Simulation System (JMASS) is a Tri-Service simulation environment that supports engineering and engagement-level simulations. As JMASS is expanded to support other Tri-Service domains, the current set of modeling services must be expanded for High Performance Computing (HPC) applications by adding support for advanced time-management algorithms, parallel and distributed topologies, and high speed communications. By providing support for these services, JMASS can better address modeling domains requiring parallel, computationally intense calculations, such as clutter, vulnerability and lethality calculations, and underwater-based scenarios. A risk reduction effort implementing some HPC services for JMASS using the SPEEDES (Synchronous Parallel Environment for Emulation and Discrete Event Simulation) Simulation Framework has recently concluded. As an artifact of the JMASS-SPEEDES integration, not only can HPC functionality be brought to the JMASS program through SPEEDES, but an additional HLA-based capability can be demonstrated that further addresses interoperability issues. The JMASS-SPEEDES integration provided a means of adding HLA capability to preexisting JMASS scenarios through an implementation of the standard JMASS port communication mechanism that allows players to communicate.
NASA Astrophysics Data System (ADS)
Luo, Bingyang; Chi, Shangjie; Fang, Man; Li, Mengchao
2017-03-01
Permanent magnet synchronous motors are used widely in industry, but traditional PID control cannot meet the performance requirements in some demanding applications. In this paper, a hybrid control strategy is adopted: a nonlinear neural-network PID controller operating in parallel with a traditional PID controller. This combines the high stability and reliability of traditional PID with the strong adaptive ability and robustness of neural networks. The permanent magnet synchronous motor achieves better control performance by switching between working modes according to the condition of the controlled object. As the results show, the speed response under the composite control strategy is faster than under a single control strategy, and in the case of a sudden disturbance, the recovery time under the composite strategy is shorter and the recovery ability and robustness are stronger.
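A minimal sketch of the parallel structure described, assuming a simple realisation: a conventional PID branch summed with a small adaptive term, here a one-layer net on the error signals trained online. The gains, learning rule, and first-order demo plant are illustrative assumptions, not the paper's design.

```python
import numpy as np

class ParallelNNPID:
    """Conventional PID with a small adaptive term acting in parallel:
    a one-layer net on [e, delta_e, integral_e] adapted online, as a
    stand-in for the neural-network branch of the hybrid controller."""
    def __init__(self, kp, ki, kd, lr=1e-3, dt=1e-3):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lr, self.dt = lr, dt
        self.ie = 0.0            # integral of the error
        self.prev_e = 0.0
        self.w = np.zeros(3)     # adaptive weights start at zero

    def step(self, e):
        self.ie += e * self.dt
        delta_e = e - self.prev_e
        self.prev_e = e
        u_pid = (self.kp * e + self.ki * self.ie
                 + self.kd * delta_e / self.dt)
        z = np.array([e, delta_e, self.ie])      # bounded net inputs
        u_nn = float(self.w @ z)
        # Online adaptation: push u in the direction that reduces e,
        # assuming a positive plant gain (a common simplification).
        self.w += self.lr * e * z
        return u_pid + u_nn

# Demo: first-order plant x' = -x + u tracking a unit step setpoint.
ctrl = ParallelNNPID(kp=2.0, ki=1.0, kd=0.05)
x = 0.0
for _ in range(5000):
    u = ctrl.step(1.0 - x)
    x += (-x + u) * ctrl.dt
print(round(x, 3))               # settles near the 1.0 setpoint
```

The fixed PID branch guarantees baseline stability while the adaptive branch, starting from zero weights, only ever adds a learned correction, which matches the stability-plus-adaptivity rationale of the abstract.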
Energy reduction using multi-channels optical wireless communication based OFDM
NASA Astrophysics Data System (ADS)
Darwesh, Laialy; Arnon, Shlomi
2017-10-01
In recent years, an increasing number of data center networks (DCNs) have been built to provide various cloud applications. Major challenges in the design of next generation DC networks include reduction of the energy consumption, high flexibility and scalability, high data rates, minimum latency and high cyber security. Use of optical wireless communication (OWC) to augment the DC network could help to confront some of these challenges. In this paper we present an OWC multi-channel communication method that could lead to significant energy reduction in the communication equipment. The method is to convert a high-speed serial data stream into many slower parallel streams, and vice versa at the receiver. We implement this multi-channel concept using the optical orthogonal frequency division multiplexing (O-OFDM) method; in our scheme, we use asymmetrically clipped optical OFDM (ACO-OFDM). Our results show that the multi-channel OFDM (ACO-OFDM) method reduces the total energy consumption exponentially as the number of channels rises.
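A minimal numpy sketch of the ACO-OFDM mapping the scheme relies on: data symbols are placed on odd subcarriers only, with Hermitian symmetry so the time-domain frame is real, and clipping negative samples to zero (as intensity modulation requires) distorts only the even subcarriers, so the receiver recovers the data exactly up to a known factor of 1/2. The FFT size and constellation are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64                                   # subcarriers (FFT size)
qpsk = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)

# Load QPSK symbols onto odd subcarriers only, with Hermitian symmetry
# so the time-domain signal is real (intensity modulation needs real).
data = rng.choice(qpsk, N // 4)
X = np.zeros(N, complex)
odd = np.arange(1, N // 2, 2)
X[odd] = data
X[N - odd] = data.conj()

x = np.fft.ifft(X).real                  # real bipolar OFDM frame
x_clipped = np.maximum(x, 0)             # ACO: clip negatives for the LED

# Clipping noise lands only on even subcarriers; odd ones keep the
# data scaled by exactly 1/2, so the receiver just doubles them.
Y = np.fft.fft(x_clipped)
recovered = 2 * Y[odd]
print(np.max(np.abs(recovered - data)))  # ~1e-15: exact up to rounding
```

The serial-to-parallel conversion the abstract mentions is exactly this symbol loading: one fast stream becomes N/4 slow subcarrier streams, each driven at a fraction of the line rate.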
NASA Technical Reports Server (NTRS)
1988-01-01
Final report to NASA LeRC on the development of gallium arsenide (GaAs) high-speed, low-power serial/parallel interface modules. The report discusses the development and test of a family of 16-, 32- and 64-bit parallel-to-serial and serial-to-parallel integrated circuits using a self-aligned gate MESFET technology developed at the Honeywell Sensors and Signal Processing Laboratory. Lab testing demonstrated 1.3 GHz clock rates at a power of 300 mW. This work was accomplished under contract number NAS3-24676.
Dynamic performance of high speed solenoid valve with parallel coils
NASA Astrophysics Data System (ADS)
Kong, Xiaowu; Li, Shizhen
2014-07-01
The methods of improving the dynamic performance of high-speed on/off solenoid valves include increasing the magnetic force on the armature, increasing the slew rate of the coil current, and decreasing the mass and stroke of the moving parts. An increase in magnetic force usually leads to a decrease in current slew rate, which can increase the delay time of the dynamic response of the solenoid valve. Using a high voltage to drive the coil can resolve this contradiction, but a high driving voltage also leads to more cost and a decrease in safety and reliability. In this paper, a new scheme of parallel coils is investigated, in which the single solenoid coil is replaced by parallel coils with the same ampere-turns. Based on the mathematical model of the high-speed solenoid valve, the theoretical formula for the delay time of the solenoid valve is deduced. Both the theoretical analysis and the dynamic simulation show that the effect of dividing a single coil into N parallel sub-coils is close to that of driving the single coil with N times the original driving voltage as far as the delay time of the solenoid valve is concerned. A specific test bench was designed to measure the dynamic performance of high-speed on/off solenoid valves. The experimental results also prove that both the delay time and the switching time of the solenoid valves can be decreased greatly by adopting the parallel coil scheme. This research presents a simple and practical method to improve the dynamic performance of high-speed on/off solenoid valves.
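The abstract does not reproduce the paper's delay-time formula, but the standard first-order RL turn-on relation shows the dependence it exploits:

```latex
i(t) = \frac{V}{R}\left(1 - e^{-t/\tau}\right), \qquad \tau = \frac{L}{R},
\qquad\Longrightarrow\qquad
t_d = \tau \,\ln\!\frac{V}{V - I_p R}
```

Here I_p is the pull-in current at which the armature starts to move, and t_d is the electrical contribution to the delay. The delay shrinks either when the effective time constant is reduced (the parallel sub-coil scheme) or when the driving voltage V is raised, which is why the two measures have comparable effect on the delay time.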
Neural network for control of rearrangeable Clos networks.
Park, Y K; Cherkassky, V
1994-09-01
Rapid evolution in the field of communication networks requires high speed switching technologies. This involves a high degree of parallelism in switching control and routing performed at the hardware level. The multistage crossbar networks have always been attractive to switch designers. In this paper a neural network approach to controlling a three-stage Clos network in real time is proposed. This controller provides optimal routing of communication traffic requests on a call-by-call basis by rearranging existing connections, with a minimum length of rearrangement sequence so that a new blocked call request can be accommodated. The proposed neural network controller uses Paull's rearrangement algorithm, along with the special (least used) switch selection rule in order to minimize the length of rearrangement sequences. The functional behavior of our model is verified by simulations and it is shown that the convergence time required for finding an optimal solution is constant, regardless of the switching network size. The performance is evaluated for random traffic with various traffic loads. Simulation results show that applying the least used switch selection rule increases the efficiency in switch rearrangements, reducing the network convergence time. The implementation aspects are also discussed to show the feasibility of the proposed approach.
Graphics Processors in HEP Low-Level Trigger Systems
NASA Astrophysics Data System (ADS)
Ammendola, Roberto; Biagioni, Andrea; Chiozzi, Stefano; Cotta Ramusino, Angelo; Cretaro, Paolo; Di Lorenzo, Stefano; Fantechi, Riccardo; Fiorini, Massimiliano; Frezza, Ottorino; Lamanna, Gianluca; Lo Cicero, Francesca; Lonardo, Alessandro; Martinelli, Michele; Neri, Ilaria; Paolucci, Pier Stanislao; Pastorelli, Elena; Piandani, Roberto; Pontisso, Luca; Rossetti, Davide; Simula, Francesco; Sozzi, Marco; Vicini, Piero
2016-11-01
Usage of Graphics Processing Units (GPUs) in the so called general-purpose computing is emerging as an effective approach in several fields of science, although so far applications have been employing GPUs typically for offline computations. Taking into account the steady performance increase of GPU architectures in terms of computing power and I/O capacity, the real-time applications of these devices can thrive in high-energy physics data acquisition and trigger systems. We will examine the use of online parallel computing on GPUs for the synchronous low-level trigger, focusing on tests performed on the trigger system of the CERN NA62 experiment. To successfully integrate GPUs in such an online environment, latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Furthermore, it is assessed how specific trigger algorithms can be parallelized and thus benefit from a GPU implementation, in terms of increased execution speed. Such improvements are particularly relevant for the foreseen Large Hadron Collider (LHC) luminosity upgrade where highly selective algorithms will be essential to maintain sustainable trigger rates with very high pileup.
2005-09-01
This research explores the need for a high-throughput, high-speed network for use in a network-centric wartime environment, and how commercial... Automated Digital Network System (ADNS). ...DOD transformation to Network Centric Warfare (NCW) operations.
Optimal forwarding ratio on dynamical networks with heterogeneous mobility
NASA Astrophysics Data System (ADS)
Gan, Yu; Tang, Ming; Yang, Hanxin
2013-05-01
Since the discovery of non-Poisson statistics in human mobility trajectories, more attention has been paid to understanding the role of these patterns in different dynamics. In this study, we first introduce heterogeneous mobility of mobile agents into dynamical networks, and then investigate packet-forwarding strategies on the heterogeneous dynamical networks. We find that faster speeds and a higher proportion of high-speed agents can enhance the network throughput and reduce the mean traveling time under random forwarding. A hierarchical structure is observed in the dependence on agent speed: the network throughput remains unchanged at both small and large high-speed values. It is also interesting to find that slight preferential forwarding to high-speed agents can maximize the network capacity. Through theoretical analysis and numerical simulations, we show that the optimal forwarding ratio stems from the local structural heterogeneity of low-speed agents.
FPGA implementation of a biological neural network based on the Hodgkin-Huxley neuron model
Yaghini Bonabi, Safa; Asgharian, Hassan; Safari, Saeed; Nili Ahmadabadi, Majid
2014-01-01
A set of techniques for efficient implementation of a Hodgkin-Huxley-based (H-H) model of a neural network on an FPGA (Field Programmable Gate Array) is presented. The central implementation challenge is the complexity of the H-H model, which puts limits on the network size and on the execution speed. However, the basics of the original model cannot be compromised when the effect of synaptic specifications on network behavior is the subject of study. To solve the problem, we used computational techniques such as the CORDIC (Coordinate Rotation Digital Computer) algorithm and step-by-step integration in the implementation of the arithmetic circuits. In addition, we employed techniques such as resource sharing to preserve the details of the model and increase the network size while keeping the network execution speed close to real time and maintaining high precision. An implementation of a two-mini-column network with 120/30 excitatory/inhibitory neurons is provided to investigate the characteristics of our method in practice. The implementation techniques provide an opportunity to construct large FPGA-based network models to investigate the effect of different neurophysiological mechanisms, like voltage-gated channels and synaptic activities, on the behavior of a neural network in an appropriate execution time. In addition to the inherent properties of FPGAs, like parallelism and re-configurability, our approach makes the FPGA-based system a proper candidate for studies on neural control of cognitive robots and systems as well. PMID:25484854
Portable parallel stochastic optimization for the design of aeropropulsion components
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Rhodes, G. S.
1994-01-01
This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as of portable, parallel programming environments. The second effort was to implement the MSO methodology for a problem using the portable parallel programming system, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be applied effectively to large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed Civil Transport and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Digital intermediate frequency QAM modulator using parallel processing
Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]
2008-05-27
The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
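A small numpy sketch of the look-up-table idea: every product of an I/Q level with a sampled carrier is precomputed once, so the live datapath is table indexing and addition with no multipliers. The 16-QAM mapping, fs/4 carrier, and symbol length are illustrative assumptions, not the patent's parameters.

```python
import numpy as np

# 16-QAM digital IF modulator in the LUT spirit of the patent: all
# carrier multiplications are precomputed into tables, so the live
# datapath is pure table lookup and addition (multiplier-free).
SAMPLES_PER_SYM = 8                      # 2 carrier cycles at fs/4
levels = np.array([-3.0, -1.0, 1.0, 3.0]) / 3.0
n = np.arange(SAMPLES_PER_SYM)
carrier_i = np.cos(2 * np.pi * n / 4)    # IF carrier at fs/4
carrier_q = np.sin(2 * np.pi * n / 4)

# LUT[level_index, phase] = level * carrier sample, computed once.
lut_i = levels[:, None] * carrier_i[None, :]
lut_q = levels[:, None] * carrier_q[None, :]

def modulate(symbols):
    """symbols: array of 4-bit ints; high 2 bits -> I, low 2 bits -> Q.
    Returns modulated IF samples using lookups and one subtraction,
    forming I*cos(wn) - Q*sin(wn) per sample."""
    i_idx, q_idx = symbols >> 2, symbols & 3
    return (lut_i[i_idx] - lut_q[q_idx]).ravel()

out = modulate(np.array([0b0000, 0b1111, 0b0110]))
print(out.round(3))
```

Because each symbol's samples come from independent table rows, multiple symbols can be looked up concurrently, which is the parallel-processing speed advantage the abstract describes.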
High speed parallel spectral-domain OCT using spectrally encoded line-field illumination
NASA Astrophysics Data System (ADS)
Lee, Kye-Sung; Hur, Hwan; Bae, Ji Yong; Kim, I. Jong; Kim, Dong Uk; Nam, Ki-Hwan; Kim, Geon-Hee; Chang, Ki Soo
2018-01-01
We report parallel spectral-domain optical coherence tomography (OCT) at 500 000 A-scan/s. This is the highest-speed spectral-domain (SD) OCT system using a single line camera. Spectrally encoded line-field scanning is proposed to increase the imaging speed in SD-OCT effectively, and the tradeoff between speed, depth range, and sensitivity is demonstrated. We show that three imaging modes of 125k, 250k, and 500k A-scan/s can be simply switched according to the sample to be imaged considering the depth range and sensitivity. To demonstrate the biological imaging performance of the high-speed imaging modes of the spectrally encoded line-field OCT system, human skin and a whole leaf were imaged at the speed of 250k and 500k A-scan/s, respectively. In addition, there is no sensitivity dependence in the B-scan direction, which is implicit in line-field parallel OCT using line focusing of a Gaussian beam with a cylindrical lens.
NASA Astrophysics Data System (ADS)
Wan, Tat C.; Kabuka, Mansur R.
1994-05-01
With the tremendous growth in imaging applications and the development of filmless radiology, compression techniques that can achieve high compression ratios with user-specified distortion rates have become necessary. Boundaries and edges in tissue structures are vital for the detection of lesions and tumors, which in turn requires the preservation of edges in the image. The proposed edge-preserving image compressor (EPIC) combines lossless compression of edges with neural-network compression techniques based on dynamic associative neural networks (DANN) to provide high compression ratios with user-specified distortion rates in an adaptive compression system well-suited to parallel implementations. Improvements to DANN-based training through the use of a variance classifier for controlling a bank of neural networks speed convergence and allow the use of higher compression ratios for 'simple' patterns. The adaptation and generalization capabilities inherent in EPIC also facilitate progressive transmission of images through varying the number of quantization levels used to represent compressed patterns. Average compression ratios of 7.51:1 with an average mean squared error of 0.0147 were achieved.
64 x 64 thresholding photodetector array for optical pattern recognition
NASA Astrophysics Data System (ADS)
Langenbacher, Harry; Chao, Tien-Hsin; Shaw, Timothy; Yu, Jeffrey W.
1993-10-01
A high performance 32 X 32 peak detector array is introduced. This detector consists of a 32 X 32 array of thresholding photo-transistor cells, manufactured with a standard MOSIS digital 2-micron CMOS process. A built-in thresholding function that is able to perform 1024 thresholding operations in parallel strongly distinguishes this chip from available CCD detectors. This high-speed detector offers response times from one to 10 milliseconds, much faster than commercially available CCD detectors operating at a TV frame rate. The parallel multiple-peak thresholding detection capability makes it particularly suitable for optical correlators and optoelectronically implemented neural networks. The principle of operation, circuit design and performance characteristics are described. An experimental demonstration of correlation peak detection is also provided. Recently, we have also designed and built an advanced version, a 64 X 64 thresholding photodetector array chip. Experimental investigation of using this chip for pattern recognition is ongoing.
An FPGA-based High Speed Parallel Signal Processing System for Adaptive Optics Testbed
NASA Astrophysics Data System (ADS)
Kim, H.; Choi, Y.; Yang, Y.
In this paper a state-of-the-art FPGA (Field Programmable Gate Array) based high-speed parallel signal processing system (SPS) for an adaptive optics (AO) testbed with 1 kHz wavefront error (WFE) correction frequency is reported. The AO system consists of a Shack-Hartmann sensor (SHS) and a deformable mirror (DM), a tip-tilt sensor (TTS), a tip-tilt mirror (TTM) and an FPGA-based high performance SPS to correct wavefront aberrations. The SHS is composed of 400 subapertures and the DM of 277 actuators in a Fried geometry, requiring an SPS with high-speed parallel computing capability. In this study, the target WFE correction speed is 1 kHz; this requires massive parallel computing capabilities as well as strict hard real-time constraints on measurements from sensors, matrix computation latency for correction algorithms, and output of control signals for actuators. In order to meet these requirements, an FPGA-based real-time SPS with parallel computing capabilities is proposed. In particular, the SPS is made up of a National Instruments (NI) real-time computer and five FPGA boards based on the state-of-the-art Xilinx Kintex-7 FPGA. Programming is done in NI's LabVIEW environment, providing flexibility when applying different algorithms for WFE correction. It also provides a faster programming and debugging environment compared to conventional ones. One of the five FPGAs is assigned to measuring the TTS and calculating control signals for the TTM, while the remaining four are used to receive the SHS signal and calculate slopes for each subaperture and correction signals for the DM. With the parallel processing capabilities of the SPS, an overall closed-loop WFE correction speed of 1 kHz has been achieved. System requirements, architecture and implementation issues are described; furthermore, experimental results are also given.
Ethernet-based test stand for a CAN network
NASA Astrophysics Data System (ADS)
Ziebinski, Adam; Cupek, Rafal; Drewniak, Marek
2017-11-01
This paper presents a test stand for the CAN-based systems that are used in automotive systems. The authors propose applying an Ethernet-based test system that supports the virtualisation of a CAN network. The proposed solution has many advantages compared to classical test beds that are based on dedicated CAN-PC interfaces: it allows the physical constraints associated with the number of interfaces that can be simultaneously connected to a tested system to be avoided, which enables the test time for parallel tests to be shortened; the high speed of Ethernet transmission allows for more frequent sampling of the messages that are transmitted by a CAN network (as the authors show in the experiment results section) and the cost of the proposed solution is much lower than the traditional lab-based dedicated CAN interfaces for PCs.
Evaluation of the power consumption of a high-speed parallel robot
NASA Astrophysics Data System (ADS)
Han, Gang; Xie, Fugui; Liu, Xin-Jun
2018-06-01
An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogenous and possess obvious physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Further, a low-power-consumption workspace is selected for the robot.
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives
NASA Astrophysics Data System (ADS)
Usha, S.; Subramani, C.
2018-04-01
Generally, induction motors are highly non-linear and have complex time-varying dynamics. This makes the speed control of an induction motor a challenging issue in industry. However, due to recent advances in power electronic devices and intelligent controllers, speed control of induction motors can be achieved while accounting for their non-linear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions; hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and a fuzzy logic controller. In this paper the speed responses of the induction motor rated 1 HP, 1440 rpm, 50 Hz with these controllers are compared in terms of time-domain specifications. A stability analysis of the system is also performed under low speed using the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.
Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar
2014-01-01
Principal component analysis or PCA has been traditionally used as one of the feature extraction techniques in face recognition systems, yielding high accuracy when requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of the stages' complexity. To improve the computational time, a novel parallel architecture was employed to utilize the benefits of parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations: a so-called Parallel Expectation-Maximization PCA (PEM-PCA) architecture. Compared to traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of over nine and three times over PCA and parallel PCA, respectively.
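A compact numpy sketch of the Expectation-Maximization route to PCA (in the style of Roweis' EM-PCA, which the described stage resembles): the principal subspace is found without forming the covariance matrix or running a full eigendecomposition, and both steps are matrix products that parallelize naturally. The data shapes and iteration count are illustrative.

```python
import numpy as np

def em_pca(X, k, n_iters=200, seed=0):
    """EM algorithm for PCA: finds a k-dimensional principal subspace
    without forming the covariance matrix or running a full
    eigendecomposition; both steps are parallel-friendly mat-muls."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)         # centre, shape (d, n)
    W = rng.normal(size=(X.shape[0], k))          # random subspace guess
    for _ in range(n_iters):
        Z = np.linalg.solve(W.T @ W, W.T @ X)     # E-step: latent coords
        W = X @ Z.T @ np.linalg.inv(Z @ Z.T)      # M-step: new basis
    Q, _ = np.linalg.qr(W)                        # orthonormalise basis
    return Q

# Check against a direct SVD on random low-rank data.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 500))
Q = em_pca(X, 2)
U = np.linalg.svd(X - X.mean(axis=1, keepdims=True))[0]
print(np.linalg.svd(Q.T @ U[:, :2])[1].round(4))  # ~[1, 1]: same subspace
```

Since the E- and M-steps are dense matrix products on independent columns, they split cleanly across processors, which is the property the PEM-PCA architecture exploits.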
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
The increasing complexity of industrial products and manufacturing processes has challenged conventional statistics-based quality management approaches under dynamic production conditions. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on the Hadoop distributed architecture and the MapReduce parallel computing model, the large volume and variety of quality-related data generated during the manufacturing process can be handled. Artificial intelligence algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network to deal with dynamic and uncertain problems and the parallel computing power of MapReduce, Bayesian networks of the factors impacting quality are built based on prior probability distributions and modified with posterior probability distributions. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost in direct proportion to the increase in computing nodes. It is also proved that the proposed model is feasible for locating and reasoning about root causes, forecasting manufacturing outcomes, and intelligent decision-making for precision problem solving. The integration of big data analytics and the BN method offers a whole new perspective on manufacturing quality control.
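A pure-Python stand-in for the described MapReduce flow, under assumed details: mappers emit (child, parent-configuration, value) counts from their data split, and the reducer normalises the aggregated counts into conditional-probability-table entries. The toy network structure and column names are hypothetical.

```python
from collections import Counter
from itertools import chain

# Stand-in for the Hadoop flow: mappers count (node, parents, value)
# occurrences on their data split; the reducer normalises counts into
# conditional probability table (CPT) entries for the Bayesian network.
STRUCTURE = {"quality": ("temp", "speed")}     # child -> parent columns

def mapper(rows):
    """One map task: emit ((child, parent_vals, child_val), 1) pairs."""
    for row in rows:
        for child, parents in STRUCTURE.items():
            key = (child, tuple(row[p] for p in parents), row[child])
            yield key, 1

def reducer(pairs):
    """Aggregate counts, then normalise to conditional probabilities."""
    counts = Counter()
    for key, c in pairs:
        counts[key] += c
    totals = Counter()
    for (child, pv, _), c in counts.items():
        totals[(child, pv)] += c
    return {k: c / totals[(k[0], k[1])] for k, c in counts.items()}

data = [{"temp": "hi", "speed": "fast", "quality": "bad"},
        {"temp": "hi", "speed": "fast", "quality": "bad"},
        {"temp": "hi", "speed": "fast", "quality": "good"},
        {"temp": "lo", "speed": "slow", "quality": "good"}]
splits = [data[:2], data[2:]]                  # two "map" partitions
cpt = reducer(chain.from_iterable(mapper(s) for s in splits))
print(cpt[("quality", ("hi", "fast"), "bad")])  # 2/3
```

Counting is associative, so map outputs can be combined in any order across nodes; that is what lets the CPT learning scale with the number of computing nodes, as in the case study.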
Trend on High-speed Power Line Communication Technology
NASA Astrophysics Data System (ADS)
Ogawa, Osamu
High-speed power line communication (PLC) is a useful technology for building communication networks easily, because no new infrastructure needs to be constructed. In Europe and America, PLC has been used for broadband networks since the beginning of the 21st century. In Japan, high-speed PLC was deregulated for indoor use only in 2006. Since then it has been widely used for home area networks, LANs in hotels and school buildings, and so on. Recently, PLC has also attracted great interest as a communication technology for smart grid networks. In this paper, the author surveys high-speed PLC technology and its current status.
Parallel pulse processing and data acquisition for high speed, low error flow cytometry
van den Engh, Gerrit J.; Stokdijk, Willem
1992-01-01
A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and low error rate.
Global tree network for computing structures enabling global processing operations
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.
2010-01-19
A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in an asynchronous or synchronized manner, and is physically and logically partitionable.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic
Time-domain simulations are heavily used in today's planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies, in order to comply with industry standards. Because of the increased modeling complexity, it is several times slower than real time for state-of-the-art commercial packages to complete a dynamic simulation of a large-scale model. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for the network solution, and parallelizing data outputs when interacting with the APIs of the commercial package TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
Binary tree eigen solver in finite element analysis
NASA Technical Reports Server (NTRS)
Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.
1993-01-01
This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, in which the parallel implementation of an associative operation on an arbitrary set of N elements takes on the order of O(log2 N) steps, compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM, a high-level language developed with the transputer to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
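A short numpy sketch of recursive doubling as an inclusive scan (Hillis-Steele form): each of the ceil(log2 N) steps combines every element with its 2^k-distant neighbour in one vectorized operation, which is the parallel pattern the transputer tree exploits. The array stands in for the N processors.

```python
import numpy as np

def recursive_doubling_scan(a, op=np.add):
    """Inclusive scan of an associative op in ceil(log2 N) steps.
    Each step combines all elements with their 2^k-distant neighbour
    at once: one 'parallel' step per doubling of the stride."""
    a = np.asarray(a).copy()
    shift = 1
    while shift < len(a):
        a[shift:] = op(a[shift:], a[:-shift])   # one parallel step
        shift *= 2
    return a

print(recursive_doubling_scan(np.arange(1, 9)))           # prefix sums
print(recursive_doubling_scan([3, 1, 4, 1, 5, 9, 2, 6],
                              np.maximum))                # running max
```

Any associative operation works, which is why the same 3-step schedule on 8 elements yields both the prefix sums and the running maxima above; for N = 8 this is 3 steps instead of 7 sequential ones.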
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
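A minimal numpy sketch of the vectorization-over-channels strategy: a bank of per-channel first-order lowpass filters (a stand-in for the gammatone stages of real cochlear models) in which each sample tick updates all channels with a single vector operation, so the interpreted-language overhead is paid once per sample rather than once per channel. The filter form and channel spacing are illustrative assumptions.

```python
import numpy as np

def filterbank(sound, fs, cutoff_freqs):
    """Bank of first-order lowpass filters, one per channel, updated
    for all channels at once per sample; real cochlear models would
    use gammatone cascades, but the update pattern is the same."""
    # Per-channel smoothing coefficient from each cutoff frequency.
    alpha = np.exp(-2 * np.pi * cutoff_freqs / fs)   # shape (n_ch,)
    state = np.zeros_like(alpha)
    out = np.empty((len(sound), len(alpha)))
    for n, x in enumerate(sound):       # time stays sequential...
        state = alpha * state + (1 - alpha) * x   # ...channels do not
        out[n] = state
    return out

fs = 8000
t = np.arange(0, 0.05, 1 / fs)
sound = np.sin(2 * np.pi * 440 * t)
cf = np.logspace(np.log10(100), np.log10(2000), 8)   # 8 channels
y = filterbank(sound, fs, cf)
print(y.shape)          # (400, 8): all channels computed in lockstep
```

Scaling the channel axis from 8 to 3000 barely changes the per-sample cost in the interpreted loop, because the inner update is a single vectorized operation; the same layout moves directly onto a GPU, as the abstract notes.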
Speed challenge: a case for hardware implementation in soft-computing
NASA Technical Reports Server (NTRS)
Daud, T.; Stoica, A.; Duong, T.; Keymeulen, D.; Zebulum, R.; Thomas, T.; Thakoor, A.
2000-01-01
For over a decade, JPL has been actively involved in soft computing research on theory, architecture, applications, and electronics hardware. The driving force in all our research activities, in addition to the potential enabling technology promise, has been the creation of a niche that imparts an orders-of-magnitude speed advantage through implementation in parallel processing hardware, with algorithms made especially suitable for hardware implementation. We review our work on neural networks, fuzzy logic, and evolvable hardware, with selected application examples requiring real-time response capabilities.
Networked high-speed auroral observations combined with radar measurements for multi-scale insights
NASA Astrophysics Data System (ADS)
Hirsch, M.; Semeter, J. L.
2015-12-01
Networks of ground-based instruments have been established to study the terrestrial aurora and to analyze the particle precipitation characteristics driving it. Additional funding is pouring into future ground-based auroral observation networks consisting of combinations of tossable, portable, and fixed-installation ground-based legacy equipment. Our approach to this problem using the High Speed Tomography (HiST) system combines tightly-synchronized filtered auroral optical observations capturing temporal features of order 10 ms with supporting measurements from incoherent scatter radar (ISR). ISR provides a broader spatial context up to order 100 km laterally on one-minute time scales, while our camera field of view (FOV) is chosen to be of order 10 km at auroral altitudes in order to capture 100 m scale lateral auroral features. The coarse-scale ISR measurements and the fine-scale HiST optical observations may be coupled through a physical model using linear basis functions to estimate important ionospheric quantities such as electron number density in 3-D (time, perpendicular and parallel to the geomagnetic field). Field measurements and analysis using HiST and PFISR are presented from experiments conducted at the Poker Flat Research Range in central Alaska. Other multiscale configuration candidates include supplementing networks of all-sky cameras such as THEMIS with co-locations of HiST-like instruments to fuse wide-FOV measurements with the fine-scale HiST precipitation characteristic estimates. Candidate models for this coupling include GLOW and TRANSCAR. Future extensions of this work may include incorporating line-of-sight total electron content estimates from ground-based networks of GPS receivers in a sensor fusion problem.
XUNET experimental high-speed network testbed CRADA 1136, DOE TTI No. 92-MULT-020-B2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palmer, R.E.
1996-04-01
XUNET is a research program with AT&T and other partners to study high-speed wide area communication between local area networks over a backbone using Asynchronous Transfer Mode (ATM) switches. Important goals of the project are to develop software techniques for network control and management, and applications for high-speed networks. The project entails building a testbed between member sites to explore performance issues for mixed network traffic such as congestion control, multimedia communications protocols, segmentation and reassembly of ATM cells, and overall data throughput rates.
Parallel pulse processing and data acquisition for high speed, low error flow cytometry
Engh, G.J. van den; Stokdijk, W.
1992-09-22
A digitally synchronized parallel pulse processing and data acquisition system for a flow cytometer has multiple parallel input channels with independent pulse digitization and FIFO storage buffer. A trigger circuit controls the pulse digitization on all channels. After an event has been stored in each FIFO, a bus controller moves the oldest entry from each FIFO buffer onto a common data bus. The trigger circuit generates an ID number for each FIFO entry, which is checked by an error detection circuit. The system has high speed and low error rate. 17 figs.
Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.
Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar
2017-03-01
This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using the maximum likelihood approach. D-Phylo exploits the search capability of k-means while avoiding its main limitation of getting stuck at local optima. The authors have tested the behaviour of D-Phylo on an Amazon Linux Amazon Machine Image (Hardware Virtual Machine) i2.4xlarge instance (six central processing units, 122 GiB memory, 8 × 800 solid-state drive Elastic Block Store volumes, high network performance) with up to 15 processors, for several real-life datasets. Distributing the clusters evenly across all the processors provides the capacity to achieve a near-linear speed-up in the case of a large number of processors.
High performance computing and communications: Advancing the frontiers of information technology
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
1997-12-31
This report, which supplements the President's Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental in the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R and D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science. This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.
Networking via wireless bridge produces greater speed and flexibility, lowers cost.
1998-10-01
Wireless computer networking. Computer connectivity is essential in today's high-tech health care industry. But telephone lines aren't fast enough, and high-speed connections like T-1 lines are costly. Read about an Ohio community hospital that installed a wireless network "bridge" to connect buildings that are miles apart, creating a reliable high-speed link that costs one-tenth of a T-1 line.
Efficiency of parallel direct optimization
NASA Technical Reports Server (NTRS)
Janies, D. A.; Wheeler, W. C.
2001-01-01
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.
High-speed parallel implementation of a modified PBR algorithm on DSP-based EH topology
NASA Astrophysics Data System (ADS)
Rajan, K.; Patnaik, L. M.; Ramakrishna, J.
1997-08-01
Algebraic Reconstruction Technique (ART) is an age-old method used for solving the problem of three-dimensional (3-D) reconstruction from projections in electron microscopy and radiology. In medical applications, direct 3-D reconstruction is at the forefront of investigation. The simultaneous iterative reconstruction technique (SIRT) is an ART-type algorithm with the potential of generating in a few iterations tomographic images of a quality comparable to that of convolution backprojection (CBP) methods. Pixel-based reconstruction (PBR) is similar to SIRT reconstruction, and it has been shown that PBR algorithms give better quality pictures than those produced by SIRT algorithms. In this work, we propose a few modifications to the PBR algorithms; the modified algorithms are shown to give better quality pictures than the original PBR algorithms. The PBR algorithm and the modified PBR algorithms are highly compute intensive. Not many attempts have been made to reconstruct objects in the true 3-D sense because of the high computational overhead. In this study, we have developed parallel two-dimensional (2-D) and 3-D reconstruction algorithms based on modified PBR. We attempt to solve the two problems encountered by the PBR and modified PBR algorithms, i.e., the long computational time and the large memory requirements, by parallelizing the algorithm on a multiprocessor system. We investigate the possible task and data partitioning schemes by exploiting the potential parallelism in the PBR algorithm subject to minimizing the memory requirement. We have implemented an extended hypercube (EH) architecture for the high-speed execution of the 3-D reconstruction algorithm using commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs) and dual-port random access memories (DPR) as channels between the PEs. We discuss and compare the performance of the PBR algorithm on an IBM 6000 RISC workstation, on a Silicon Graphics Indigo 2 workstation, and on an EH system. The results show that an EH(3,1) using DSP chips as PEs executes the modified PBR algorithm about 100 times faster than an IBM 6000 RISC workstation. We have also executed the algorithms on a 4-node IBM SP2 parallel computer; the execution time on an EH(3,1) is better than that of the 4-node IBM SP2 system. The speed-up of an EH(3,1) system with eight PEs and one network controller is approximately 7.85.
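For orientation, the SIRT family that PBR builds on updates every pixel from the normalized back-projection of all ray residuals at once. The sketch below is generic SIRT in NumPy, not the authors' modified PBR (whose specific changes the abstract does not detail); `A` is the system matrix and `b` the measured projections.

```python
import numpy as np

def sirt(A, b, n_iter=50, lam=1.0):
    # Classic SIRT: every pixel is updated from all rays simultaneously.
    # Row sums normalize each ray residual; column sums normalize the
    # back-projection onto each pixel.
    row = A.sum(axis=1); row[row == 0] = 1.0
    col = A.sum(axis=0); col[col == 0] = 1.0
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x += lam * (A.T @ ((b - A @ x) / row)) / col
    return x

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])  # toy rays
x_true = np.array([1.0, 2.0, 3.0])
print(sirt(A, A @ x_true, n_iter=500))   # converges toward x_true
```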
DOE Office of Scientific and Technical Information (OSTI.GOV)
YU, DANTONG; Jin, Shudong
2014-03-01
Data-intensive applications, including high energy and nuclear physics, astrophysics, climate modeling, nano-scale materials science, genomics, and finance, are expected to generate exabytes of data over the coming years, which must be transferred, visualized, and analyzed by geographically distributed teams of users. High-performance network capabilities must be available to these users at the application level in a transparent, virtualized manner, and the application users must be able to move large datasets between local and remote locations and their home institutions across network environments. To address these challenges, the main goal of our project is to design and evaluate high-performance data transfer software to support various data-intensive applications. First, we designed middleware that provides access to Remote Direct Memory Access (RDMA) functionality. This middleware integrates network access, memory management, and multitasking in its core design. We address a number of issues related to its efficient implementation, for instance explicit buffer management and memory registration, and parallelization of RDMA operations, which are vital to delivering the benefit of RDMA to applications. Built on top of this middleware, an implementation and experimental evaluation of the RDMA-based FTP software, RFTP, is described. Our team implemented this application to exploit the full capabilities of advanced RDMA mechanisms for ultra-high-speed bulk data transfer on the Energy Sciences Network (ESnet). Second, we designed our data transfer software to optimize TCP/IP-based transfer performance so that RFTP remains fully compatible with today's Internet; our kernel optimization techniques, using the Linux system calls sendfile and splice, reduce data-copy cost. In this report, we summarize the technical challenges of our project, the primary software design methods, the major project milestones achieved, and the testbed evaluation work and demonstrations carried out during the project's lifetime.
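On the TCP side, the kernel primitives mentioned in the report can be illustrated with `os.sendfile`, which moves file bytes to a socket without copying through user space. This is a minimal sketch of the zero-copy idea only, not RFTP source code.

```python
import os
import socket

def send_file_zero_copy(path, host, port):
    # os.sendfile hands the copy to the kernel (Linux), so file bytes
    # reach the socket without passing through user-space buffers.
    with socket.create_connection((host, port)) as sock, open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, size - offset)
            if sent == 0:
                break
            offset += sent
```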
Scientific Visualization in High Speed Network Environments
NASA Technical Reports Server (NTRS)
Vaziri, Arsi; Kutler, Paul (Technical Monitor)
1997-01-01
In several cases, new visualization techniques have vastly increased the researcher's ability to analyze and comprehend data. Similarly, the role of networks in providing an efficient supercomputing environment has become more critical and continues to grow at a faster rate than the increase in the processing capabilities of supercomputers. A close relationship between scientific visualization and high-speed networks is identified as an important link in supporting efficient supercomputing. The two technologies are driven by the increasing complexity and volume of supercomputer data. The interaction of scientific visualization and high-speed networks in a Computational Fluid Dynamics simulation/visualization environment is described. Current capabilities supported by high-speed networks, supercomputers, and high-performance graphics workstations at the Numerical Aerodynamic Simulation (NAS) Facility at NASA Ames Research Center are described. Applied research toward providing a supercomputer visualization environment that supports future computational requirements is summarized.
A high-speed linear algebra library with automatic parallelism
NASA Technical Reports Server (NTRS)
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
NASA Astrophysics Data System (ADS)
Poya, Roman; Gil, Antonio J.; Ortigosa, Rogelio
2017-07-01
The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are, first, the capability to mathematically transform complex chains of operations into simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity, and, second, the ability to perform a compile-time depth-first or breadth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electro-mechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.
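The FLOP-minimising contraction-order search has a convenient analogue in NumPy: `np.einsum_path` explores contraction orderings for a small tensor network and reports naive versus optimised FLOP counts. The sketch below is only an analogy to the framework's compile-time search, which is performed in C++ with expression templates.

```python
import numpy as np

A = np.random.rand(8, 40)
B = np.random.rand(40, 50)
C = np.random.rand(50, 4)

# einsum_path searches over contraction orderings ('optimal' is exhaustive)
# and reports naive versus optimized FLOP counts for the tensor network.
path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
result = np.einsum('ij,jk,kl->il', A, B, C, optimize=path)
print(report)
```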
A superconducting direct-current limiter with a power of up to 8 MVA
NASA Astrophysics Data System (ADS)
Fisher, L. M.; Alferov, D. F.; Akhmetgareev, M. R.; Budovskii, A. I.; Evsin, D. V.; Voloshin, I. F.; Kalinov, A. V.
2016-12-01
A resistive switching superconducting fault current limiter (SFCL) for DC networks with a nominal voltage of 3.5 kV and a nominal current of 2 kA was developed, produced, and tested. The SFCL has two main units—an assembly of superconducting modules and a high-speed vacuum circuit breaker. The assembly of superconducting modules consists of nine (3 × 3) parallel-series connected modules. Each module contains four parallel-connected 2G high-temperature superconducting (HTS) tapes. The results of SFCL tests in the short-circuit emulation mode with a maximum current rise rate of 1300 A/ms are presented. The SFCL is capable of limiting the current at a level of 7 kA and break it 8 ms after the current-limiting mode begins. The average temperature of HTS tapes during the current-limiting mode increases to 210 K. After the current is interrupted, the superconductivity recovery time does not exceed 1 s.
Huang, Yongyang; Badar, Mudabbir; Nitkowski, Arthur; Weinroth, Aaron; Tansu, Nelson; Zhou, Chao
2017-01-01
Space-division multiplexing optical coherence tomography (SDM-OCT) is a recently developed parallel OCT imaging method that achieves a multi-fold speed improvement. However, the assembly of fiber-optic components used in the first prototype system was labor-intensive and susceptible to errors. Here, we demonstrate a high-speed SDM-OCT system using an integrated photonic chip that can be reliably manufactured with high precision and low per-unit cost. A three-layer cascade of 1 × 2 splitters was integrated in the photonic chip to split the incident light into 8 parallel imaging channels with ~3.7 mm optical delay in air between each channel. High-speed imaging (~1 s/volume) of porcine eyes ex vivo and wide-field imaging (~18.0 × 14.3 mm2) of human fingers in vivo were demonstrated with the chip-based SDM-OCT system. PMID:28856055
Graphical processors for HEP trigger systems
NASA Astrophysics Data System (ADS)
Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cotta Ramusino, A.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.
2017-02-01
General-purpose computing on GPUs is emerging as a new paradigm in several fields of science, although so far applications have been tailored to employ GPUs as accelerators in offline computations. With the steady decrease of GPU latencies and the increase in link and memory throughputs, the time is ripe for real-time applications using GPUs in high-energy physics data acquisition and trigger systems. We discuss the use of online parallel computing on GPUs for synchronous low-level trigger systems, focusing on tests performed on the trigger of the CERN NA62 experiment. The latencies of all components need analysing, networking being the most critical. To keep it under control, we envisioned NaNet, an FPGA-based PCIe Network Interface Card (NIC) enabling GPUDirect connection. Moreover, we discuss how specific trigger algorithms can be parallelised and thus benefit from a GPU implementation in terms of increased execution speed. Such improvements are particularly relevant for the foreseen LHC luminosity upgrade, where highly selective algorithms will be crucial to maintain sustainable trigger rates with very high pileup.
Improving Data Transfer Throughput with Direct Search Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaprakash, Prasanna; Morozov, Vitali; Kettimuthu, Rajkumar
2016-01-01
Improving data transfer throughput over high-speed long-distance networks has become increasingly difficult. Numerous factors such as nondeterministic congestion, dynamics of the transfer protocol, and multiuser and multitask source and destination endpoints, as well as interactions among these factors, contribute to this difficulty. A promising approach to improving throughput consists in using parallel streams at the application layer. We formulate and solve the problem of choosing the number of such streams from a mathematical optimization perspective. We propose the use of direct search methods, a class of easy-to-implement and light-weight mathematical optimization algorithms, to improve the performance of data transfers by dynamically adapting the number of parallel streams in a manner that does not require domain expertise, instrumentation, analytical models, or historic data. We apply our method to transfers performed with the GridFTP protocol, and illustrate the effectiveness of the proposed algorithm when used within Globus, a state-of-the-art data transfer tool, on production WAN links and servers. We show that when compared to user default settings our direct search methods can achieve up to 10x performance improvement under certain conditions. We also show that our method can overcome performance degradation due to external compute and network load on source end points, a common scenario at high performance computing facilities.
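A direct search over the stream count can be sketched in a few lines: probe the neighbours of the current best value and move uphill on measured throughput. `measure` below is a hypothetical probe (e.g., timing a short transfer); the authors' actual algorithms and GridFTP integration are more elaborate.

```python
def tune_streams(measure, lo=1, hi=32):
    # Compass-style direct search over the integer stream count: probe the
    # neighbours of the current best value and move uphill on throughput.
    n = lo
    best = measure(n)
    improved = True
    while improved:
        improved = False
        for cand in (n - 1, n + 1):
            if lo <= cand <= hi:
                t = measure(cand)
                if t > best:
                    n, best, improved = cand, t, True
    return n, best

# 'measure' would time a short transfer and return MB/s; here a toy
# throughput curve that peaks at 8 streams:
n, mbps = tune_streams(lambda k: 100 - (k - 8) ** 2)
```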
Zhou, Xian; Chen, Xue
2011-05-09
Digital coherent receivers combine coherent detection with digital signal processing (DSP) to compensate for transmission impairments, and are therefore a promising candidate for future high-speed optical transmission systems. However, the maximum symbol rate supported by such real-time receivers is limited by the processing rate of the hardware. To cope with this difficulty, parallel processing algorithms are imperative. In this paper, we propose a novel parallel digital timing recovery loop (PDTRL) based on our previous work. Furthermore, to increase the dynamic dispersion tolerance range of receivers, we embed a parallel adaptive equalizer in the PDTRL. This parallel joint scheme (PJS) can be used to complete synchronization, equalization and polarization de-multiplexing simultaneously. Finally, we demonstrate that the PDTRL and PJS allow the hardware to process a 112 Gbit/s POLMUX-DQPSK signal at processing rates in the hundreds-of-MHz range. © 2011 Optical Society of America
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
High-speed real-time animated displays on the ADAGE (trademark) RDS 3000 raster graphics system
NASA Technical Reports Server (NTRS)
Kahlbaum, William M., Jr.; Ownbey, Katrina L.
1989-01-01
Techniques which may be used to increase the animation update rate of real-time computer raster graphic displays are discussed. They were developed on the ADAGE RDS 3000 graphic system in support of the Advanced Concepts Simulator at the NASA Langley Research Center. These techniques involve the use of a special purpose parallel processor, for high-speed character generation. The description of the parallel processor includes the Barrel Shifter which is part of the hardware and is the key to the high-speed character rendition. The final result of this total effort was a fourfold increase in the update rate of an existing primary flight display from 4 to 16 frames per second.
Embedded parallel processing based ground control systems for small satellite telemetry
NASA Technical Reports Server (NTRS)
Forman, Michael L.; Hazra, Tushar K.; Troendly, Gregory M.; Nickum, William G.
1994-01-01
The use of networked terminals that employ embedded processing techniques results in totally integrated, flexible, high-speed, reliable, and scalable systems suitable for telemetry and data processing applications such as mission operations centers (MOC). The synergy of these terminals, coupled with the capability of each terminal to receive incoming data, allows any defined display to be viewed on any terminal from the start of data acquisition. There is no single point of failure (other than the network input), such as exists in configurations where all input data pass through a single front-end processor and then to a serial string of workstations. Missions dedicated to NASA's ozone measurements program utilize the methodologies discussed here, resulting in a multimission configuration of low-cost, scalable hardware and software that can be run by one flight operations team with low risk.
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to an architecture-dependent program that has to be re-implemented when the hardware changes. It is therefore highly desirable to perform the parallelization automatically. For this we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context, but follow different threads of execution and work on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs while taking into account the available hardware. Tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing by automatic parallelization of image analysis tasks.
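A toy version of data-parallel image threading, assuming a NumPy image split into row tiles that are analyzed concurrently; the per-tile operator and the fixed split are placeholders, whereas the paper's agent-based system chooses the parallelization automatically.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def analyze_tile(tile):
    # Placeholder per-tile operator: count pixels above the tile mean.
    return int((tile > tile.mean()).sum())

def parallel_analyze(image, n_tiles=4, workers=4):
    # Split the image into row bands and analyze them concurrently;
    # all tasks share the process context, as in the thread concept.
    tiles = np.array_split(image, n_tiles, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_tile, tiles))

results = parallel_analyze(np.random.rand(1024, 1024))
```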
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
The Computing and Communications (C) Division is responsible for the Laboratory's Integrated Computing Network (ICN) as well as Laboratory-wide communications. Our computing network, used by 8,000 people distributed throughout the nation, constitutes one of the most powerful scientific computing facilities in the world. In addition to the stable production environment of the ICN, we have taken a leadership role in high-performance computing and have established the Advanced Computing Laboratory (ACL), the site of research on experimental, massively parallel computers; high-speed communication networks; distributed computing; and a broad variety of advanced applications. The computational resources available in the ACL are of the type needed to solve problems critical to national needs, the so-called "Grand Challenge" problems. The purpose of this publication is to inform our clients of our strategic and operating plans in these important areas. We review major accomplishments since late 1990 and describe our strategic planning goals and specific projects that will guide our operations over the next few years. Our mission statement, planning considerations, and management policies and practices are also included.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many subdomains called blocks and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computation. In environments with computers of different architectures, operating systems, CPU speeds, memory sizes, loads, and network speeds, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of changes in the computing environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application are discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results are presented on running a NASA-based code, ADPAC, to demonstrate the developed tools for dynamic load balancing.
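The block-distribution step can be illustrated with the classical longest-processing-time greedy heuristic: repeatedly hand the costliest remaining block to the least-loaded processor. This is a minimal static sketch with invented inputs, whereas the paper's tools rebalance dynamically as the environment changes.

```python
import heapq

def balance(blocks, n_procs):
    # Longest-processing-time heuristic: assign the costliest remaining
    # block to the currently least-loaded processor.
    heap = [(0.0, p) for p in range(n_procs)]   # (load, processor id)
    heapq.heapify(heap)
    assignment = {}
    for block_id, cost in sorted(blocks, key=lambda b: -b[1]):
        load, p = heapq.heappop(heap)
        assignment[block_id] = p
        heapq.heappush(heap, (load + cost, p))
    return assignment

# e.g. ten blocks with made-up per-block costs, three processors:
print(balance(list(enumerate([9, 7, 6, 5, 5, 4, 3, 2, 2, 1])), 3))
```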
Link and Network Layers Design for Ultra-High-Speed Terahertz-Band Communications Networks
2017-01-01
Final technical report, State University of New York (SUNY) at Buffalo, January 2017; reporting period February 2015 - September 2016. The report identifies optimal parameter values for the link-layer design with respect to throughput, and validates and tests the proposed scheme against experimentally obtained data.
Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
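The core trick, vectorizing the per-sample state update over all frequency channels, can be shown with a one-pole low-pass filter bank in NumPy. This sketch is far simpler than Brian Hears' actual cochlear filters (e.g., gammatones), but the channel-vectorized loop structure is the same idea.

```python
import numpy as np

def lowpass_bank(sound, cutoffs_hz, fs=44100.0):
    # One-pole low-pass filter bank, vectorized over channels: the filter
    # state is a vector with one entry per channel, so each time step
    # updates every channel at once instead of looping over channels.
    cutoffs = np.asarray(cutoffs_hz, dtype=float)
    alphas = np.exp(-2.0 * np.pi * cutoffs / fs)   # one coefficient per channel
    out = np.empty((len(sound), len(cutoffs)))
    state = np.zeros(len(cutoffs))
    for n, x in enumerate(sound):
        state = alphas * state + (1.0 - alphas) * x
        out[n] = state
    return out

# e.g. 3000 channels spanning the audible range, as in the cochlea:
channels = np.geomspace(20.0, 20000.0, 3000)
response = lowpass_bank(np.random.randn(4410), channels, fs=44100.0)
```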
Embedded controller for GEM detector readout system
NASA Astrophysics Data System (ADS)
Zabołotny, Wojciech M.; Byszuk, Adrian; Chernyshova, Maryna; Cieszewski, Radosław; Czarski, Tomasz; Dominik, Wojciech; Jakubowska, Katarzyna L.; Kasprowicz, Grzegorz; Poźniak, Krzysztof; Rzadkiewicz, Jacek; Scholz, Marek
2013-10-01
This paper describes the embedded controller used for the multichannel readout system for the GEM detector. The controller is based on the embedded Mini ITX mainboard, running the GNU/Linux operating system. The controller offers two interfaces to communicate with the FPGA based readout system. FPGA configuration and diagnostics is controlled via low speed USB based interface, while high-speed setup of the readout parameters and reception of the measured data is handled by the PCI Express (PCIe) interface. Hardware access is synchronized by the dedicated server written in C. Multiple clients may connect to this server via TCP/IP network, and different priority is assigned to individual clients. Specialized protocols have been implemented both for low level access on register level and for high level access with transfer of structured data with "msgpack" protocol. High level functionalities have been split between multiple TCP/IP servers for parallel operation. Status of the system may be checked, and basic maintenance may be performed via web interface, while the expert access is possible via SSH server. System was designed with reliability and flexibility in mind.
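A minimal sketch of the structured-data access path, assuming the `msgpack` Python package: a TCP handler feeds received bytes into an incremental unpacker and answers each decoded request. The handler body and reply format are placeholders; the real server is written in C and enforces per-client priorities.

```python
import msgpack
import socketserver

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        unpacker = msgpack.Unpacker(raw=False)  # incremental stream decoder
        while True:
            chunk = self.request.recv(4096)
            if not chunk:
                break
            unpacker.feed(chunk)
            for request in unpacker:
                reply = {"status": "ok", "echo": request}  # placeholder logic
                self.request.sendall(msgpack.packb(reply))

# socketserver.ThreadingTCPServer(("", 9000), Handler).serve_forever()
```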
NASA Astrophysics Data System (ADS)
Harmon, Frederick G.
2005-11-01
Parallel hybrid-electric propulsion systems would be beneficial for small unmanned aerial vehicles (UAVs) used for military, homeland security, and disaster-monitoring missions. The benefits, due to the hybrid and electric-only modes, include increased time-on-station and greater range compared to electric-powered UAVs, and stealth modes not available with gasoline-powered UAVs. This dissertation contributes to the research fields of small unmanned aerial vehicles, hybrid-electric propulsion system control, and intelligent control. A conceptual design of a small UAV with a parallel hybrid-electric propulsion system is provided. The UAV is intended for intelligence, surveillance, and reconnaissance (ISR) missions. The conceptual design reveals the trade-offs that must be considered to take advantage of the hybrid-electric propulsion system. The resulting hybrid-electric propulsion system is a two-point design that includes an engine primarily sized for cruise speed and an electric motor and battery pack primarily sized for a slower endurance speed. The electric motor provides additional power for take-off, climbing, and acceleration and also serves as a generator during charge-sustaining operation or regeneration. The intelligent control of the hybrid-electric propulsion system is based on an instantaneous optimization algorithm that generates a hyper-plane from the nonlinear efficiency maps for the internal combustion engine, electric motor, and lithium-ion battery pack. The hyper-plane incorporates charge-depletion and charge-sustaining strategies. The optimization algorithm is flexible and allows the operator/user to assign relative importance between the use of gasoline, electricity, and recharging depending on the intended mission. A MATLAB/Simulink model was developed to test the control algorithms. The Cerebellar Model Arithmetic Computer (CMAC) associative memory neural network is applied to the control of the UAV's parallel hybrid-electric propulsion system. The CMAC neural network approximates the hyper-plane generated from the instantaneous optimization algorithm and produces torque commands for the internal combustion engine and electric motor. The CMAC neural network controller reduces the required memory by two orders of magnitude compared to a large look-up table, and also avoids the need to compute a hyper-plane or complex logic at every time step.
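For readers unfamiliar with CMAC, the sketch below is a minimal two-input CMAC with offset tilings and an LMS update. The inputs (say, normalized power demand and battery state of charge), the target surface, and all sizes are illustrative, not the dissertation's controller.

```python
import numpy as np

class CMAC:
    # Minimal CMAC: several offset tilings over a 2-D input; the output is
    # the sum of one weight per tiling, trained with an LMS update.
    def __init__(self, n_tilings=8, tiles=16, lo=0.0, hi=1.0, lr=0.2):
        self.n_tilings, self.tiles, self.lo, self.hi, self.lr = n_tilings, tiles, lo, hi, lr
        self.w = np.zeros((n_tilings, tiles, tiles))

    def _cell(self, x, t):
        span = self.hi - self.lo
        offset = t * span / (self.tiles * self.n_tilings)   # per-tiling shift
        idx = np.floor((np.asarray(x) - self.lo + offset) / span * self.tiles)
        return tuple(np.clip(idx, 0, self.tiles - 1).astype(int))

    def predict(self, x):
        return sum(self.w[t][self._cell(x, t)] for t in range(self.n_tilings))

    def train(self, x, target):
        err = target - self.predict(x)
        for t in range(self.n_tilings):
            self.w[(t,) + self._cell(x, t)] += self.lr * err / self.n_tilings

# Learn a toy torque-split surface over (power demand, state of charge):
net = CMAC()
for _ in range(20000):
    x = np.random.rand(2)
    net.train(x, target=0.5 * x[0] + 0.25 * x[1])
```

Because each prediction touches only one weight per tiling, the lookup cost is constant regardless of input resolution, which is the memory and speed advantage the abstract attributes to CMAC over a dense look-up table.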
NASA Astrophysics Data System (ADS)
Koyama, Tomonori; Kaiho, Katsuyuki; Yamaguchi, Iwao; Yanabu, Satoru
Using a high-temperature superconductor, we constructed and tested a model superconducting fault current limiter (SFCL). The superconductor and a vacuum interrupter serving as the commutation switch were connected in parallel through a bypass coil. When a fault current flows in this equipment, the superconductor quenches and the current transfers to the parallel coil because of the voltage drop across the superconductor. The large current in the parallel coil actuates the magnetic repulsion mechanism of the vacuum interrupter, and the current through the superconductor is interrupted. With this arrangement, the current flow time in the superconductor is easily minimized, while the fault current is also limited by the large reactance of the parallel coil. This system has many merits, which led us to adopt the electromagnetic repulsion switch. The electrical power system requires high-speed reclosing after a fault current is interrupted, so the SFCL must return to the superconducting state before reclosing; however, the superconductor heats during the quench, and recovering the superconducting state takes time. Recovery time is therefore the central concern. In this paper, we study the recovery time of the superconductor and propose an electromagnetic repulsion switch with a reclosing system.
Recent advancements towards green optical networks
NASA Astrophysics Data System (ADS)
Davidson, Alan; Glesk, Ivan; Buis, Adrianus; Wang, Junjia; Chen, Lawrence
2014-12-01
Recent years have seen a rapid growth in demand for ultra-high-speed data transmission, with end users expecting fast, high-bandwidth network access. With this rapid growth in demand, data centres are under pressure to provide ever-increasing data rates through their networks and at the same time improve the quality of data handling in terms of reduced latency, increased scalability and improved channel speed for users. However, as data rates increase, present systems based on well-established CMOS technology are becoming increasingly difficult to scale, and consequently data networks are struggling to satisfy current demand. In this paper the interrelated issues of electronic scalability, power consumption, limited copper-interconnect bandwidth and the limited speed of CMOS electronics are explored alongside the tremendous bandwidth potential of optical-fibre-based photonic networks. Some applications of photonics to help alleviate the speed and latency issues in data networks are discussed.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers
NASA Technical Reports Server (NTRS)
Guruswamy, Guru; VanDalsem, William (Technical Monitor)
1994-01-01
Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low-fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low-fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as the High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex-induced aeroelastic oscillations, whereas AST can experience structural oscillations associated with transonic buffet. Both aircraft may experience a dip in the flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high-fidelity equations such as the Navier-Stokes for fluids and finite elements for structures are needed. Computations using these high-fidelity equations require large computational resources, both in memory and speed. Conventional supercomputers have reached their limitations in both memory and speed; as a result, parallel computers have evolved to overcome these limitations. This paper addresses the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers, including the special techniques needed to take advantage of the architecture of new parallel machines. Results are illustrated from computations made on the iPSC/860 and IBM SP2 by using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high-resolution finite-element structural equations.
Efficient spiking neural network model of pattern motion selectivity in visual cortex.
Beyeler, Michael; Richert, Micah; Dutt, Nikil D; Krichmar, Jeffrey L
2014-07-01
Simulating large-scale models of biological motion perception is challenging, due to the required memory to store the network structure and the computational power needed to quickly solve the neuronal dynamics. A low-cost yet high-performance approach to simulating large-scale neural network models in real-time is to leverage the parallel processing capability of graphics processing units (GPUs). Based on this approach, we present a two-stage model of visual area MT that we believe to be the first large-scale spiking network to demonstrate pattern direction selectivity. In this model, component-direction-selective (CDS) cells in MT linearly combine inputs from V1 cells that have spatiotemporal receptive fields according to the motion energy model of Simoncelli and Heeger. Pattern-direction-selective (PDS) cells in MT are constructed by pooling over MT CDS cells with a wide range of preferred directions. Responses of our model neurons are comparable to electrophysiological results for grating and plaid stimuli as well as speed tuning. The behavioral response of the network in a motion discrimination task is in agreement with psychophysical data. Moreover, our implementation outperforms a previous implementation of the motion energy model by orders of magnitude in terms of computational speed and memory usage. The full network, which comprises 153,216 neurons and approximately 40 million synapses, processes 20 frames per second of a 40 × 40 input video in real-time using a single off-the-shelf GPU. To promote the use of this algorithm among neuroscientists and computer vision researchers, the source code for the simulator, the network, and analysis scripts are publicly available.
Reliability Analysis and Modeling of ZigBee Networks
NASA Astrophysics Data System (ADS)
Lin, Cheng-Min
The architecture of ZigBee networks focuses on developing low-cost, low-speed ubiquitous communication between devices. The ZigBee technique is based on IEEE 802.15.4, which specifies the physical layer and medium access control (MAC) for a low-rate wireless personal area network (LR-WPAN). Currently, numerous wireless sensor networks have adopted the ZigBee open standard to develop various services that promote improved communication quality in our daily lives. The problem of system and network reliability in providing stable services has become more important, because these services stop if the system and network reliability is unstable. The ZigBee standard defines three kinds of networks: star, tree and mesh. This paper models the ZigBee protocol stack from the physical layer to the application layer and analyzes each layer's reliability and mean time to failure (MTTF). Channel resource usage, device role, network topology and application objects are used to evaluate reliability in the physical, medium access control, network, and application layers, respectively. In star or tree networks, a series system and the reliability block diagram (RBD) technique can be used to solve the reliability problem. For mesh networks, whose complexity is higher than that of the others, a division technique is applied: the mesh network is decomposed into several non-reducible series systems and edge-parallel systems, so its reliability is easily solved using series-parallel systems through the proposed scheme. The numerical results demonstrate that mesh-network reliability increases as the number of edges in the parallel systems increases, while reliability drops quickly as the numbers of edges and nodes increase for all three networks. Greater resource usage is another factor that decreases reliability; overall, network complexity, heavier resource usage and complex object relationships all lower network reliability.
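The series-parallel arithmetic underlying the RBD analysis is compact enough to sketch directly. The component reliabilities below are made-up numbers; the paper's division technique reduces a mesh to exactly such series and parallel groups.

```python
import numpy as np

def series(rel):
    # All components must work: R = prod(R_i)
    return float(np.prod(rel))

def parallel(rel):
    # At least one component must work: R = 1 - prod(1 - R_i)
    return 1.0 - float(np.prod(1.0 - np.asarray(rel)))

# A decomposed mesh: two parallel edge groups connected in series
# (component reliabilities are illustrative only).
r = series([parallel([0.9, 0.9]), parallel([0.8, 0.8, 0.8])])
print(f"network reliability = {r:.4f}")
```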
Genetic Parallel Programming: design and implementation.
Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong
2006-01-01
This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.
A high-speed network for cardiac image review.
Elion, J L; Petrocelli, R R
1994-01-01
A high-speed fiber-based network for the transmission and display of digitized full-motion cardiac images has been developed. Based on Asynchronous Transfer Mode (ATM), the network is scalable, meaning that the same software and hardware are used for a small local area network or for a large multi-institutional network. The system can handle uncompressed digital angiographic images, considered to be at the "high end" of the bandwidth requirements. Along with the networking, a general-purpose multi-modality review station has been implemented without specialized hardware. This station can store a full injection sequence in "loop RAM" in a 512 x 512 format, then interpolate to 1024 x 1024 while displaying at 30 frames per second. The network and review stations connect to a central file server that uses a virtual file system to make a large high-speed RAID storage disk and associated off-line storage tapes and cartridges all appear as a single large file system to the software. In addition to supporting archival storage and review, the system can also digitize live video using high-speed Direct Memory Access (DMA) from the frame grabber to present uncompressed data to the network. Fully functional prototypes have provided the proof of concept, with full deployment in the institution planned as the next stage.
Multi-petascale highly efficient parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five-dimensional torus network that maximizes the throughput of packet communications between nodes and minimizes latency. The network implements a collective network and a global asynchronous network that provides global barrier and notification functions. The node design integrates a list-based prefetcher. The memory system implements transactional memory, thread-level speculation, and a multiversioning cache that improves the soft error rate while supporting DMA functionality, allowing for parallel-processing message passing.
Shevtsova, Natalia A; Talpalar, Adolfo E; Markin, Sergey N; Harris-Warrick, Ronald M; Kiehn, Ole; Rybak, Ilya A
2015-01-01
Different locomotor gaits in mammals, such as walking or galloping, are produced by coordinated activity in neuronal circuits in the spinal cord. Coordination of neuronal activity between left and right sides of the cord is provided by commissural interneurons (CINs), whose axons cross the midline. In this study, we construct and analyse two computational models of spinal locomotor circuits consisting of left and right rhythm generators interacting bilaterally via several neuronal pathways mediated by different CINs. The CIN populations incorporated in the models include the genetically identified inhibitory (V0D) and excitatory (V0V) subtypes of V0 CINs and excitatory V3 CINs. The model also includes the ipsilaterally projecting excitatory V2a interneurons mediating excitatory drive to the V0V CINs. The proposed network architectures and CIN connectivity allow the models to closely reproduce and suggest mechanistic explanations for several experimental observations. These phenomena include: different speed-dependent contributions of V0D and V0V CINs and V2a interneurons to left–right alternation of neural activity; switching gaits between the left–right alternating walking-like activity and the left–right synchronous hopping-like pattern in mutants lacking specific neuron classes; and speed-dependent asymmetric changes of flexor and extensor phase durations. The models provide insights into the architecture of the spinal network and the organization of parallel inhibitory and excitatory CIN pathways, and suggest explanations for how these pathways maintain alternating and synchronous gaits at different locomotor speeds. The models propose testable predictions about the neural organization and operation of mammalian locomotor circuits. Key points: (1) Coordination of neuronal activity between left and right sides of the mammalian spinal cord is provided by several sets of commissural interneurons (CINs) whose axons cross the midline. (2) Genetically identified inhibitory V0D and excitatory V0V CINs and ipsilaterally projecting excitatory V2a interneurons were shown to secure left–right alternation at different locomotor speeds. (3) We have developed computational models of neuronal circuits in the spinal cord that include left and right rhythm-generating centres interacting bilaterally via three parallel pathways mediated by V0D, V2a–V0V and V3 neuron populations. (4) The models reproduce the experimentally observed speed-dependent left–right coordination in normal mice and the changes in coordination seen in mutants lacking specific neuron classes. (5) The models propose an explanation for several experimental results and provide insights into the organization of the spinal locomotor network and parallel CIN pathways involved in gait control at different locomotor speeds. PMID:25820677
Cost-effective data storage/archival subsystem for functional PACS
NASA Astrophysics Data System (ADS)
Chen, Y. P.; Kim, Yongmin
1993-09-01
Not the least of the requirements of a workable PACS is the ability to store and archive vast amounts of information. A medium-size hospital will generate between 1 and 2 TBytes of data annually on a fully functional PACS. A high-speed image transmission network coupled with a comparably high-speed central data storage unit can make local memory and magnetic disks in the PACS workstations less critical and, in an extreme case, unnecessary. Under these circumstances, the capacity and performance of the central data storage subsystem and database are critical in determining the response time at the workstations, thus significantly affecting clinical acceptability. The central data storage subsystem not only needs to provide sufficient capacity to store about ten days worth of images (five days worth of new studies and, on average, about one comparison study for each new study), but must also supply images to the requesting workstation in a timely fashion. The database must respond quickly to users' requests for images. This paper analyzes the advantages and disadvantages of multiple parallel-transfer disks versus RAID disks for the short-term central data storage subsystem, as well as an optical disk jukebox versus a digital tape subsystem for long-term archive. Furthermore, an example high-performance storage subsystem that integrates RAID disks and a high-speed digital tape subsystem as a cost-effective PACS data storage/archival unit is presented.
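A back-of-the-envelope sizing of the short-term store, using only the figures quoted in the abstract (up to 2 TB per year, about ten days of images online):

```python
TB = 1e12
annual_bytes = 2 * TB             # upper bound quoted in the abstract
daily_bytes = annual_bytes / 365  # ≈ 5.5 GB of image data per day
short_term = 10 * daily_bytes     # "about ten days worth of images"
print(f"daily ≈ {daily_bytes / 1e9:.1f} GB; short-term store ≈ {short_term / 1e9:.0f} GB")
```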
Fast Face-Recognition Optical Parallel Correlator Using High Accuracy Correlation Filter
NASA Astrophysics Data System (ADS)
Watanabe, Eriko; Kodate, Kashiko
2005-11-01
We designed and fabricated a fully automatic fast face-recognition optical parallel correlator [E. Watanabe and K. Kodate: Appl. Opt. 44 (2005) 5666] based on the VanderLugt principle. The implementation of an as-yet-unattained ultra-high-speed system was aided by reconfiguring the system to make it suitable for easier parallel processing, as well as by composing a higher-accuracy correlation filter and a high-speed ferroelectric liquid crystal spatial light modulator (FLC-SLM). In trial experiments with this system (dubbed FARCO), we achieved remarkably low error rates of 1.3% for the false match rate (FMR) and 2.6% for the false non-match rate (FNMR). Given these results, the aim of this paper is to examine methods of designing correlation filters and arranging database image arrays for even faster parallel correlation, addressing the issues of calculation technique, quantization bit rate, pixel size and shift from the optical axis. The correlation filter has proved its excellent performance, with higher precision than classical correlation and the joint transform correlator (JTC). Moreover, the arrangement of multi-object reference images leads to 10-channel correlation signals as sharply marked as those of a single channel. This experimental result demonstrates great potential for achieving a processing speed of 10,000 faces/s.
A programmable computational image sensor for high-speed vision
NASA Astrophysics Data System (ADS)
Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian
2013-08-01
In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PE array is responsible for transferring, storing and processing raw image data in SIMD fashion with its own programming language. The RPs are a one-dimensional array of simplified RISC cores that can carry out complex arithmetic and logic operations. The PE array and RP array can complete a great amount of computation within a few instruction cycles and therefore satisfy the low- and middle-level high-speed image processing requirements. The RISC core controls the whole system operation and performs some high-level image processing algorithms. We utilize a simplified AHB bus as the system bus to connect our major components. A programming language and corresponding tool chain for this computational image sensor are also developed.
NASA Astrophysics Data System (ADS)
Zaripov, D. I.; Renfu, Li
2018-05-01
The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera involves big-data processing and is often time consuming. In order to speed up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique uses projections of the interrogation window instead of its two-dimensional field of luminous intensity. This simplification accelerates ZNCC computation by up to 28.8 times compared with directly calculated ZNCC, depending on the size of the interrogation window and the region of interest. The results of three synthetic test cases, a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended for initial velocity field calculation, with further correction using more accurate techniques.
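To make the projection idea concrete, the sketch below contrasts direct ZNCC on an N x N window with ZNCC computed on the window's row and column sums. This is one plausible reading of the procedure described above, not the authors' implementation; the point is that collapsing each window to two length-N vectors cuts the per-correlation cost from O(N^2) to O(N).

import numpy as np

def zncc(a, b):
    # zero-normalized cross-correlation of two equally sized arrays
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())

def zncc_projected(a, b):
    # correlate 1-D row/column projections instead of the full 2-D window
    pa = np.concatenate([a.sum(axis=0), a.sum(axis=1)])
    pb = np.concatenate([b.sum(axis=0), b.sum(axis=1)])
    return zncc(pa, pb)

rng = np.random.default_rng(1)
window = rng.random((32, 32))
shifted = np.roll(window, 1, axis=1)
print(zncc(window, shifted), zncc_projected(window, shifted))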
The ASCI Network for SC 2000: Gigabyte Per Second Networking
DOE Office of Scientific and Technical Information (OSTI.GOV)
PRATT, THOMAS J.; NAEGLE, JOHN H.; MARTINEZ JR., LUIS G.
2001-11-01
This document highlights the DISCOM Distance Computing and Communication team's activities at the 2000 Supercomputing conference in Dallas, Texas. This conference is sponsored by the IEEE and ACM. Sandia's participation in the conference has now spanned a decade; for the last five years Sandia National Laboratories, Los Alamos National Laboratory and Lawrence Livermore National Laboratory have come together at the conference under the DOE's ASCI (Accelerated Strategic Computing Initiative) program rubric to demonstrate ASCI's emerging capabilities in computational science and their combined expertise in high-performance computer science and communication networking developments within the program. At SC 2000, DISCOM2 used this forum to demonstrate its emerging infrastructure: a pre-standard implementation of 10 Gigabit Ethernet, the first gigabyte-per-second IP network data transfer application, and VPN technology that enabled a remote Distributed Resource Management tools demonstration. Additionally, a national OC48 POS network was constructed to support applications running between the show floor and home facilities. This network created the opportunity to test PSE's Parallel File Transfer Protocol (PFTP) across a network with speeds and distances similar to those of the then-proposed DISCOM WAN. We also supported the production networking needs of the convention exhibit floor. SCinet at SC2000 showcased wireless networking, and the networking team had the opportunity to explore this emerging technology while on the booth. This paper documents these accomplishments, discusses the details of their implementation, and describes how these demonstrations support DISCOM's overall strategies in high-performance computing and networking.
ESA's satellite communications programme
NASA Astrophysics Data System (ADS)
Bartholome, P.
1985-02-01
The developmental history, current status, and future plans of the ESA satellite-communications programs are discussed in a general survey and illustrated with network diagrams and maps. Consideration is given to the parallel development of national and European direct-broadcast systems and telecommunications networks, the position of the European space and electronics industries in the growing world market, the impact of technological improvements (both in satellite systems and in ground-based networks), and the technological and commercial advantages of integrated space-terrestrial networks. The needs for a European definition of the precise national and international roles of satellite communications, for maximum speed in implementing such decisions (before the technology becomes obsolete), and for increased cooperation and standardization to assure European equipment manufacturers a reasonable share of the market are stressed.
NASA Astrophysics Data System (ADS)
Timchenko, Leonid; Yarovyi, Andrii; Kokriatskaya, Nataliya; Nakonechna, Svitlana; Abramenko, Ludmila; Ławicki, Tomasz; Popiel, Piotr; Yesmakhanova, Laura
2016-09-01
The paper presents a method of parallel-hierarchical transformations for rapid recognition of dynamic images using GPU technology. The direct parallel-hierarchical transformations are based on a clustered CPU- and GPU-oriented hardware platform. Mathematical models for training the parallel-hierarchical (PH) network for the transformation are developed, as well as a training method of the PH network for recognition of dynamic images. This research is most relevant for problems that involve organizing high-performance computation over very large arrays of information designed to implement multi-stage sensing and processing as well as compaction and recognition of data in informational structures and computer devices. The method has such advantages as high performance through the use of recent advances in parallelization, the ability to work with images of very large dimensions, ease of scaling when the number of nodes in the cluster changes, and automatic scanning of the local network to detect compute nodes.
MrGrid: A Portable Grid Based Molecular Replacement Pipeline
Reboul, Cyril F.; Androulakis, Steve G.; Phan, Jennifer M. N.; Whisstock, James C.; Goscinski, Wojtek J.; Abramson, David; Buckle, Ashley M.
2010-01-01
Background The crystallographic determination of protein structures can be computationally demanding, and for difficult cases can benefit from user-friendly interfaces to high-performance computing resources. Molecular replacement (MR) is a popular protein crystallographic technique that exploits the structural similarity between proteins that share some sequence similarity. But the need to trial permutations of search models, space group symmetries and other parameters makes MR time- and labour-intensive. However, MR calculations are embarrassingly parallel and thus ideally suited to distributed computing. In order to address this problem we have developed MrGrid, web-based software that allows multiple MR calculations to be executed across a grid of networked computers, allowing high-throughput MR. Methodology/Principal Findings MrGrid is a portable web-based application written in Java/JSP and Ruby, taking advantage of Apple Xgrid technology. Designed to interface with a user-defined Xgrid resource, the package manages the distribution of multiple MR runs to the available nodes on the Xgrid. We evaluated MrGrid using 10 different protein test cases on a network of 13 computers, and achieved an average speed-up factor of 5.69. Conclusions MrGrid enables the user to retrieve and manage the results of tens to hundreds of MR calculations quickly and via a single web interface, as well as broadening the range of strategies that can be attempted. This high-throughput approach allows parameter sweeps to be performed in parallel, improving the chances of MR success. PMID:20386612
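Xgrid itself is Apple-specific (and has since been discontinued), but the farm-out pattern MrGrid exploits is generic: every combination of search model and space group is an independent job. The sketch below shows that pattern with Python's standard library; run_mr_trial is a hypothetical stand-in for launching a real MR program and parsing its score.

from concurrent.futures import ProcessPoolExecutor
from itertools import product
import zlib

def run_mr_trial(model, space_group):
    # stand-in for one molecular replacement run; the fake score is a
    # deterministic hash so the example runs anywhere
    score = zlib.crc32(f"{model}:{space_group}".encode()) % 100
    return model, space_group, score

models = ["model_A", "model_B", "model_C"]   # hypothetical search models
space_groups = ["P212121", "C2", "P1"]       # candidate symmetries

if __name__ == "__main__":
    trials = list(product(models, space_groups))
    # each (model, space group) permutation is independent, so the trials
    # can be farmed out to as many workers as the grid provides
    with ProcessPoolExecutor() as pool:
        results = pool.map(run_mr_trial, *zip(*trials))
        best = max(results, key=lambda r: r[2])
    print("best trial:", best)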
Sensor-scheduling simulation of disparate sensors for Space Situational Awareness
NASA Astrophysics Data System (ADS)
Hobson, T.; Clarkson, I.
2011-09-01
The art and science of space situational awareness (SSA) has been practised and developed from the time of Sputnik. However, recent developments, such as the accelerating pace of satellite launches, the proliferation of launch-capable agencies, both commercial and sovereign, and recent well-publicised collisions involving man-made space objects, have further magnified the importance of timely and accurate SSA. The United States Strategic Command (USSTRATCOM) operates the Space Surveillance Network (SSN), a global network of sensors tasked with maintaining SSA. The rapidly increasing number of resident space objects will require commensurate improvements in the SSN. Sensors are scarce resources that must be scheduled judiciously to obtain measurements of maximum utility. Improvements in sensor scheduling and fusion can serve to reduce the number of additional sensors that may be required. Recently, Hill et al. [1] have proposed and developed a simulation environment named TASMAN (Tasking Autonomous Sensors in a Multiple Application Network) to enable testing of alternative scheduling strategies within a simulated multi-sensor, multi-target environment. TASMAN simulates a high-fidelity, hardware-in-the-loop system by running multiple machines with different roles in parallel. At present, TASMAN is limited to simulations involving electro-optic sensors. Its high fidelity is at once a feature and a limitation, since supercomputing is required to run simulations of appreciable scale. In this paper, we describe an alternative, modular and scalable SSA simulation system that can extend the work of Hill et al. with reduced complexity, albeit also with reduced fidelity. The tool has been developed in MATLAB and therefore can be run on a very wide range of computing platforms. It can also make use of MATLAB’s parallel processing capabilities to obtain considerable speed-up. The speed and flexibility so obtained can be used to quickly test scheduling algorithms even with a relatively large number of space objects. We further describe an application of the tool by exploring how the relative mixture of electro-optical and radar sensors can impact the scheduling, fusion and achievable accuracy of an SSA system. By varying the mixture of sensor types, we are able to characterise the main advantages and disadvantages of each configuration.
Cedar Project---Original goals and progress to date
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cybenko, G.; Kuck, D.; Padua, D.
1990-11-28
This work encompasses a broad attack on high speed parallel processing. Hardware, software, applications development, and performance evaluation and visualization as well as research topics are proposed. Our goal is to develop practical parallel processing for the 1990's.
Flexible All-Digital Receiver for Bandwidth Efficient Modulations
NASA Technical Reports Server (NTRS)
Gray, Andrew; Srinivasan, Meera; Simon, Marvin; Yan, Tsun-Yee
2000-01-01
An all-digital high data rate parallel receiver architecture developed jointly by Goddard Space Flight Center and the Jet Propulsion Laboratory is presented. This receiver utilizes only a small number of high speed components along with a majority of lower speed components operating in a parallel frequency domain structure implementable in CMOS, and can currently process up to 600 Mbps with standard QPSK modulation. Performance results for this receiver for bandwidth efficient QPSK modulation schemes such as square-root raised cosine pulse shaped QPSK and Feher's patented QPSK are presented, demonstrating the flexibility of the receiver architecture.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. House Committee on Science, Space and Technology.
This document contains the transcript of three hearings on the High Speed Performance Computing and High Speed Networking Applications Act of 1993 (H.R. 1757). The hearings were designed to obtain specific suggestions for improvements to the legislation and alternative or additional application areas that should be pursued. Testimony and prepared…
Chen, Liang; Xue, Wei; Tokuda, Naoyuki
2010-08-01
In many pattern classification/recognition applications of artificial neural networks, an object to be classified is represented by a fixed-size 2-dimensional array of uniform type, which corresponds to the cells of a 2-dimensional grid of the same size. A general neural network structure, called an undistricted neural network, which takes all the elements in the array as inputs, could be used for problems such as these. However, a districted neural network can be used to reduce the training complexity. A districted neural network usually consists of two levels of sub-neural networks. Each of the lower-level neural networks, called a regional sub-neural network, takes the elements in one region of the array as its inputs and is expected to output a temporary class label, called an individual opinion, based on that partial view of the entire array. The higher-level neural network, called an assembling sub-neural network, uses the outputs (opinions) of the regional sub-neural networks as inputs, and by consensus derives the label decision for the object. Each of the sub-neural networks can be trained separately, and thus the training is less expensive. The regional sub-neural networks can be trained and run in parallel and independently, so high speed can be achieved. Using a simple model, we prove theoretically in this paper that a districted neural network is more stable than an undistricted neural network in noisy environments. We conjecture that the result is valid for all neural networks. This theory is verified by experiments involving gender classification and human face recognition. We conclude that a districted neural network is highly recommended for neural network applications in recognition or classification of 2-dimensional array patterns in highly noisy environments. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
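The two-level structure described above is straightforward to sketch in code. The toy below, which assumes scikit-learn as a stand-in learner and an arbitrary synthetic task, trains one regional sub-network per quadrant of a 16 x 16 array and an assembling sub-network over their opinions; none of the sizes or hyperparameters come from the paper.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((400, 16, 16))    # toy 2-D array patterns
y = (X[:, :8, :].mean(axis=(1, 2)) > X[:, 8:, :].mean(axis=(1, 2))).astype(int)

def regions(arr):
    # split each 16x16 array into four 8x8 quadrant regions
    return [arr[:, :8, :8], arr[:, :8, 8:], arr[:, 8:, :8], arr[:, 8:, 8:]]

# lower level: one regional sub-network per quadrant, trained independently
# (and therefore trainable in parallel)
opinions = []
for r in regions(X):
    net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    net.fit(r.reshape(len(X), -1), y)
    opinions.append(net.predict_proba(r.reshape(len(X), -1))[:, 1])

# higher level: the assembling sub-network derives a consensus label
# from the four regional opinions
assembler = MLPClassifier(hidden_layer_sizes=(4,), max_iter=500, random_state=0)
assembler.fit(np.column_stack(opinions), y)
print("consensus accuracy:", assembler.score(np.column_stack(opinions), y))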
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.
Ly, Daniel Le; Chow, Paul
2010-11-01
Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
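As a point of reference for the connection-updates-per-second figure, the NumPy sketch below spells out what one such update is: a CD-1 (contrastive divergence) weight update for a small RBM, with biases omitted for brevity. It illustrates the arithmetic the hardware engines accelerate, not the paper's FPGA design.

import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 16, 8, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    # one contrastive-divergence step; each visible-hidden weight change
    # is one of the "connection updates" counted above
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(n_hidden) < ph0).astype(float)   # sample hidden units
    v1 = sigmoid(W @ h0)                              # mean-field reconstruction
    ph1 = sigmoid(v1 @ W)
    return lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

v = (rng.random(n_visible) < 0.5).astype(float)
for _ in range(100):
    W += cd1_update(v)
print(W.shape, float(W.mean()))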
Self-calibrated correlation imaging with k-space variant correlation functions.
Li, Yu; Edalati, Masoud; Du, Xingfu; Wang, Hui; Cao, Jie J
2018-03-01
Correlation imaging is a previously developed high-speed MRI framework that converts parallel imaging reconstruction into the estimation of correlation functions. The presented work aims to demonstrate that this framework can provide a speed gain over parallel imaging by estimating k-space variant correlation functions. Because of Fourier encoding with gradients, outer k-space data contain higher spatial-frequency image components arising primarily from tissue boundaries. As a result of tissue-boundary sparsity in the human anatomy, neighboring k-space data correlation varies from the central to the outer k-space. By estimating k-space variant correlation functions with an iterative self-calibration method, correlation imaging can benefit from neighboring k-space data correlation associated with both coil sensitivity encoding and tissue-boundary sparsity, thereby providing a speed gain over parallel imaging that relies only on coil sensitivity encoding. This new approach is investigated in brain imaging and free-breathing neonatal cardiac imaging. Correlation imaging performs better than existing parallel imaging techniques in simulated brain imaging acceleration experiments. The higher speed enables real-time data acquisition for neonatal cardiac imaging in which physiological motion is fast and non-periodic. With k-space variant correlation functions, correlation imaging gives a higher speed than parallel imaging and offers the potential to image physiological motion in real-time. Magn Reson Med 79:1483-1494, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
A Control Simulation Method of High-Speed Trains on Railway Network with Irregular Influence
NASA Astrophysics Data System (ADS)
Yang, Li-Xing; Li, Xiang; Li, Ke-Ping
2011-09-01
Based on the discrete-time method, an effective movement control model is designed for a group of high-speed trains on a rail network. The purpose of the model is to investigate the specific traffic characteristics of high-speed trains under the interruption of stochastic irregular events. In the model, the high-speed rail traffic system is assumed to be equipped with the moving-block signalling system to guarantee maximum traversing capacity of the railway. To keep trains' movements safe, several operational strategies are proposed to control the movements of trains in the model, including traction operation, braking operation, and entering-station operation. The numerical simulations show that the designed model can describe the movements of high-speed trains on the rail network well. The research results can provide useful information not only for investigating the propagation features of delays under irregular disturbances but also for rerouting and rescheduling trains on the rail network.
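In its simplest form, the control logic above reduces to a per-step acceleration choice driven by the gap to the preceding train. The following discrete-time sketch of train following under moving-block signalling illustrates that loop; all numerical values (time step, safety margin, acceleration limits, top speed) are assumptions for illustration, not parameters from the paper.

import numpy as np

DT, SAFE_GAP = 1.0, 500.0                      # s, m (assumed)
A_TRACTION, A_BRAKE, V_MAX = 0.5, -1.0, 97.0   # m/s^2, m/s (~350 km/h)

pos = np.array([3000.0, 1500.0, 0.0])          # three trains, leader first
vel = np.array([80.0, 80.0, 80.0])

for step in range(600):
    for i in range(len(pos)):
        gap = pos[i - 1] - pos[i] if i > 0 else np.inf
        # moving block: brake if the braking distance plus the safety
        # margin would exceed the gap to the preceding train
        braking_dist = vel[i] ** 2 / (2 * abs(A_BRAKE))
        if gap < braking_dist + SAFE_GAP:
            a = A_BRAKE          # braking operation
        elif vel[i] < V_MAX:
            a = A_TRACTION       # traction operation
        else:
            a = 0.0              # cruise
        vel[i] = max(0.0, min(V_MAX, vel[i] + a * DT))
        pos[i] += vel[i] * DT

print(np.round(pos), np.round(vel))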
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids
NASA Technical Reports Server (NTRS)
Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
Hierarchy Bayesian model based services awareness of high-speed optical access networks
NASA Astrophysics Data System (ADS)
Bai, Hui-feng
2018-03-01
As the speed of optical access networks soars and services multiply, the service-supporting ability of optical access networks suffers greatly from the shortage of service awareness. Aiming to solve this problem, a hierarchical-Bayesian-model-based service awareness mechanism is proposed for high-speed optical access networks. This approach builds a hierarchical Bayesian model according to the structure of typical optical access networks. Moreover, the proposed scheme is able to conduct simple service awareness in each optical network unit (ONU) and to perform complex service awareness from the whole-system view in the optical line terminal (OLT). Simulation results show that the proposed scheme achieves better quality of service (QoS) in terms of packet loss rate and time delay.
Double lead spiral platen parallel jaw end effector
NASA Technical Reports Server (NTRS)
Beals, David C.
1989-01-01
The double lead spiral platen parallel jaw end effector is an extremely powerful, compact, and highly controllable end effector that represents a significant improvement in gripping force and efficiency over the LaRC Puma (LP) end effector. The spiral end effector is very simple in its design and has relatively few parts. The jaw openings are highly predictable and linear, making it an ideal candidate for remote control. The finger speed is within acceptable working limits and can be modified to meet the user needs; for instance, greater finger speed could be obtained by increasing the pitch of the spiral. The force relaxation is comparable to the other tested units. Optimization of the end effector design would involve a compromise of force and speed for a given application.
Optimization research of railway passenger transfer scheme based on ant colony algorithm
NASA Astrophysics Data System (ADS)
Ni, Xiang
2018-05-01
The optimization of railway passenger transfer schemes can provide strong support for a railway passenger transport system, and its essence is path search. This paper realizes the calculation of passenger transfer schemes for high-speed railway, given the times and stations of departure and arrival. The specific methods used were generating a passenger transfer service network for the high-speed railway, establishing an optimization model, and searching with an ant colony algorithm. Finally, an analysis is made of the scheme from LanZhouxi to BeiJingXi, based on the high-speed railway network of China in 2017. The results show that the transfer network and model have relatively high practical value and operational efficiency.
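The essence of the search is shortest-path finding on the transfer service network, which is what the ant colony algorithm performs. Below is a minimal ant-colony path search on a four-node toy graph; the graph, costs and parameter values are illustrative assumptions, not the paper's network or tuning.

import random

random.seed(7)
# toy transfer network: edge weights are travel-plus-transfer costs (assumed)
graph = {"A": {"B": 2, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 2}, "D": {}}
pher = {(u, v): 1.0 for u in graph for v in graph[u]}
ALPHA, BETA, RHO, Q = 1.0, 2.0, 0.5, 10.0

def walk(src, dst):
    # one ant builds a path, choosing edges by pheromone and inverse cost
    path, node = [src], src
    while node != dst:
        edges = [(node, v) for v in graph[node] if v not in path]
        if not edges:
            return None, float("inf")
        w = [pher[e] ** ALPHA * (1 / graph[e[0]][e[1]]) ** BETA for e in edges]
        node = random.choices([e[1] for e in edges], weights=w)[0]
        path.append(node)
    return path, sum(graph[a][b] for a, b in zip(path, path[1:]))

best = (None, float("inf"))
for _ in range(50):
    tours = [walk("A", "D") for _ in range(10)]
    for e in pher:
        pher[e] *= 1 - RHO                    # evaporation
    for path, cost in tours:
        if path is None:
            continue
        for e in zip(path, path[1:]):
            pher[e] += Q / cost               # deposit favours short tours
        if cost < best[1]:
            best = (path, cost)
print(best)   # expected: (['A', 'B', 'C', 'D'], 5)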
Circuity analyses of HSR network and high-speed train paths in China
Zhao, Shuo; Huang, Jie; Shan, Xinghua
2017-01-01
Circuity, defined as the ratio of the shortest network distance to the Euclidean distance between an origin–destination (O-D) pair, can be adopted as a helpful method for evaluating the indirectness of train paths. In this paper, the maximum circuity of the paths of operated trains is set as the threshold value for the circuity of high-speed train paths. For the shortest paths of any node pairs, if their circuity is not higher than the threshold value, the paths can be regarded as reasonable paths. Allowing a certain relative or absolute error, we cluster the reasonable paths on the basis of their inclusion relationships, and the center path of each class represents a passenger transit corridor. We take the high-speed rail (HSR) network in China at the end of 2014 as an example, and obtain 51 passenger transit corridors, which are alternative sets of train paths. Furthermore, we analyze the circuity distribution of the paths of all node pairs in the network. We find that the high circuity of train paths can be decreased by the construction of a high-speed railway line, which indicates that the structure of the HSR network in China tends to be more complete and that the HSR network can make the Chinese railway network more efficient. PMID:28945757
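The definition is direct to compute once the network is in hand: circuity(u, v) = d_network(u, v) / d_euclidean(u, v). A small sketch, assuming the networkx library and a toy four-station graph rather than the 2014 Chinese HSR network:

import math
import networkx as nx

# toy HSR graph: station -> (x, y) coordinates in km (illustrative)
coords = {"A": (0, 0), "B": (300, 0), "C": (300, 400), "D": (0, 400)}
G = nx.Graph()
for u, v in [("A", "B"), ("B", "C"), ("C", "D")]:
    G.add_edge(u, v, length=math.dist(coords[u], coords[v]))

def circuity(u, v):
    # ratio of shortest network distance to Euclidean distance
    network = nx.shortest_path_length(G, u, v, weight="length")
    return network / math.dist(coords[u], coords[v])

print(circuity("A", "D"))   # 1000 km by rail vs 400 km straight -> 2.5

A new line directly linking A and D would drop this pair's circuity to 1.0, which is the effect the paper observes network-wide as lines are added.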
Hadoop neural network for parallel and distributed feature selection.
Hodge, Victoria J; O'Keefe, Simon; Austin, Jim
2016-06-01
In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature which all have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel) allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high-dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Chin; Corttrell, R. A.
This Technical Note provides an overview of high-performance parallel Big Data transfers, with and without encryption of data in transit, over multiple network channels. It shows that with the parallel approach it is feasible to carry out high-performance parallel encrypted Big Data transfers without serious impact on throughput, although other impacts, e.g. energy consumption, should be investigated. It also explains our rationale for using a statistics-based approach to gain understanding from test results and to improve the system. The presentation is of a high-level nature. Nevertheless, at the end we pose some questions and identify potentially fruitful directions for future work.
Enabling parallel simulation of large-scale HPC network systems
Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; ...
2016-04-07
Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations.
ERIC Educational Resources Information Center
General Accounting Office, Washington, DC. Information Management and Technology Div.
This report was prepared in response to a request for information on supercomputers and high-speed networks from the Senate Committee on Commerce, Science, and Transportation, and the House Committee on Science, Space, and Technology. The following information was requested: (1) examples of how various industries are using supercomputers to…
Digital optical computers at the optoelectronic computing systems center
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1991-01-01
The Digital Optical Computing Program within the National Science Foundation Engineering Research Center for Opto-electronic Computing Systems has as its specific goal research on optical computing architectures suitable for use at the highest possible speeds. The program can be targeted toward exploiting the time domain because other programs in the Center are pursuing research on parallel optical systems, exploiting optical interconnection and optical devices and materials. Using a general purpose computing architecture as the focus, we are developing design techniques, tools and architecture for operation at the speed of light limit. Experimental work is being done with the somewhat low speed components currently available but with architectures which will scale up in speed as faster devices are developed. The design algorithms and tools developed for a general purpose, stored program computer are being applied to other systems such as optimally controlled optical communication networks.
Highball: A high speed, reserved-access, wide area network
NASA Technical Reports Server (NTRS)
Mills, David L.; Boncelet, Charles G.; Elias, John G.; Schragger, Paul A.; Jackson, Alden W.
1990-01-01
A network architecture called Highball and a preliminary design for a prototype, wide-area data network designed to operate at speeds of 1 Gbps and beyond are described. It is intended for applications requiring high speed burst transmissions where some latency between requesting a transmission and granting the request can be anticipated and tolerated. Examples include real-time video and disk-disk transfers, national filestore access, remote sensing, and similar applications. The network nodes include an intelligent crossbar switch, but have no buffering capabilities; thus, data must be queued at the end nodes. There are no restrictions on the network topology, link speeds, or end-end protocols. The end system, nodes, and links can operate at any speed up to the limits imposed by the physical facilities. An overview of an initial design approach is presented and is intended as a benchmark upon which a detailed design can be developed. It describes the network architecture and proposed access protocols, as well as functional descriptions of the hardware and software components that could be used in a prototype implementation. It concludes with a discussion of additional issues to be resolved in continuing stages of this project.
Integration experiences and performance studies of A COTS parallel archive systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Hsing-bung; Scott, Cody; Grider, Gary
2010-01-01
Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of future archival storage systems.
A Study of Quality of Service Communication for High-Speed Packet-Switching Computer Sub-Networks
NASA Technical Reports Server (NTRS)
Cui, Zhenqian
1999-01-01
In this thesis, we analyze various factors that affect quality of service (QoS) communication in high-speed, packet-switching sub-networks. We hypothesize that sub-network-wide bandwidth reservation and guaranteed CPU processing power at endpoint systems for handling data traffic are indispensable to achieving hard end-to-end quality of service. Different bandwidth reservation strategies, traffic characterization schemes, and scheduling algorithms affect network resource and CPU usage as well as the extent to which QoS can be achieved. In order to analyze those factors, we design and implement a communication layer. Our experimental analysis supports our research hypothesis. The Resource ReSerVation Protocol (RSVP) is designed to realize resource reservation. Our analysis of RSVP shows that using RSVP alone is insufficient to provide hard end-to-end quality of service in a high-speed sub-network. Analysis of the IEEE 802.1p protocol also supports the research hypothesis.
Fiber to the home: next generation network
NASA Astrophysics Data System (ADS)
Yang, Chengxin; Guo, Baoping
2006-07-01
A strategy is presented for Fiber To The Home (FTTH) next-generation networks capable of carrying converged telephone, television (TV), very high-speed Internet, and very high-speed bi-directional data services (such as video-on-demand (VOD) and gaming). The potential market is analyzed. Barriers and appropriate strategies are also discussed. Several technical issues, such as powering methods, optical fiber cables, and network architectures, are discussed as well.
Cortical Specializations Underlying Fast Computations
Volgushev, Maxim
2016-01-01
The time course of behaviorally relevant environmental events sets temporal constraints on neuronal processing. How does the mammalian brain make use of the increasingly complex networks of the neocortex, while making decisions and executing behavioral reactions within a reasonable time? The key parameter determining the speed of computations in neuronal networks is the time interval that neuronal ensembles need to process changes at their input and communicate the results of this processing to downstream neurons. Theoretical analysis identified basic requirements for fast processing: the use of neuronal populations for encoding, background activity, and fast onset dynamics of action potentials in neurons. Experimental evidence shows that populations of neocortical neurons fulfil these requirements. Indeed, they can change firing rate in response to input perturbations very quickly, within 1 to 3 ms, and encode high-frequency components of the input by phase-locking their spiking to frequencies up to 300 to 1000 Hz. This implies that the time unit of computation by cortical ensembles is only a few milliseconds, 1 to 3 ms, which is considerably shorter than the membrane time constant of individual neurons. The ability of cortical neuronal ensembles to communicate on a millisecond time scale allows for complex, multiple-step processing and precise coordination of neuronal activity in parallel processing streams, while keeping the speed of behavioral reactions within environmentally set temporal constraints. PMID:25689988
High-speed prediction of crystal structures for organic molecules
NASA Astrophysics Data System (ADS)
Obata, Shigeaki; Goto, Hitoshi
2015-02-01
We developed a master-worker type parallel algorithm that allocates crystal structure optimization tasks to distributed compute nodes, in order to improve the performance of simulations for crystal structure prediction. Performance experiments were carried out on the TUT-ADSIM supercomputer system (HITACHI HA8000-tc/HT210). The experimental results show that our parallel algorithm achieved speed-ups of 214 and 179 times using 256 processor cores on the crystal structure optimizations in predictions of crystal structures for 3-aza-bicyclo(3.3.1)nonane-2,4-dione and 2-diazo-3,5-cyclohexadiene-1-one, respectively. We expect that this parallel algorithm can reduce the computational cost of any crystal structure prediction.
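The master-worker pattern itself is simple: a master keeps a queue of trial structures and hands each one to the next idle node. A minimal sketch with Python's multiprocessing, where optimize_structure is a hypothetical stand-in for one crystal structure optimization:

import multiprocessing as mp

def optimize_structure(task):
    # stand-in for one crystal-structure optimization; a real worker would
    # relax the trial structure and return its lattice energy
    trial_id, seed = task
    energy = (seed * 2654435761 % 2 ** 16) / 2 ** 16 - 1.0   # fake energy
    return trial_id, energy

if __name__ == "__main__":
    tasks = [(i, 1234 + i) for i in range(64)]               # trial structures
    # the pool hands tasks to idle workers as they finish, keeping all
    # nodes busy even when optimizations take unequal time
    with mp.Pool(processes=8) as pool:
        best = min(pool.imap_unordered(optimize_structure, tasks),
                   key=lambda r: r[1])
    print("lowest-energy trial:", best)

Dynamic hand-out of unequal-cost tasks is what makes large speed-ups like those reported (214x and 179x on 256 cores) attainable in practice.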
A Study of Quality of Service Communication for High-Speed Packet-Switching Computer Sub-Networks
NASA Technical Reports Server (NTRS)
Cui, Zhenqian
1999-01-01
With the development of high-speed networking technology, computer networks, including local-area networks (LANs), wide-area networks (WANs) and the Internet, are extending their traditional roles of carrying computer data. They are being used for Internet telephony, multimedia applications such as conferencing and video on demand, distributed simulations, and other real-time applications. LANs are even used for distributed real-time process control and computing as a cost-effective approach. Differing from traditional data transfer, these new classes of high-speed network applications (video, audio, real-time process control, and others) are delay sensitive. The usefulness of data depends not only on the correctness of received data, but also the time that data are received. In other words, these new classes of applications require networks to provide guaranteed services or quality of service (QoS). Quality of service can be defined by a set of parameters and reflects a user's expectation about the underlying network's behavior. Traditionally, distinct services are provided by different kinds of networks. Voice services are provided by telephone networks, video services are provided by cable networks, and data transfer services are provided by computer networks. A single network providing different services is called an integrated-services network.
Distributed Finite Element Analysis Using a Transputer Network
NASA Technical Reports Server (NTRS)
Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy
1989-01-01
The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
NASA Astrophysics Data System (ADS)
Sheynin, Yuriy; Shutenko, Felix; Suvorova, Elena; Yablokov, Evgenej
2008-04-01
High-rate interconnections are important subsystems in modern data processing and control systems of many classes. They are especially important in prospective embedded and on-board systems, which tend to be multicomponent systems with parallel or distributed architecture [1]. Modular architecture systems of previous generations were based on parallel busses that were widely used and standardised: VME, PCI, CompactPCI, etc. Bus evolution proceeded by improving bus protocol efficiency (burst transactions, split transactions, etc.) and increasing operating frequencies. However, due to the multi-drop nature of busses and multi-wire skew problems, further speed-up of parallel bussing became more and more limited. For embedded and on-board systems an additional reason for this trend lay in the weight, size and power constraints of an interconnection and its components. Parallel interfaces have become technologically more challenging as their clock frequencies have increased to keep pace with the bandwidth requirements of their attached storage devices. Since each interface uses a data clock to gate and validate the parallel data (which is normally 8 bits or 16 bits wide), the clock frequency need only be equivalent to the byte rate or word rate being transmitted. In other words, for a given transmission frequency, the wider the data bus, the slower the clock. As the clock frequency increases, more high-frequency energy is available in each of the data lines, and a portion of this energy is dissipated in radiation. Each data line not only transmits this energy but also receives some from its neighbours. This form of mutual interference is commonly called "cross-talk," and the signal distortion it produces can become another major contributor to loss of data integrity unless compensated by appropriate cable designs. Other transmission problems such as frequency-dependent attenuation and signal reflections, while also applicable to serial interfaces, are more troublesome in parallel interfaces due to the number of additional cable conductors involved. In order to compensate for these drawbacks, higher-quality cables, shorter cable runs and fewer devices on the bus have been the norm. Finally, the physical bulk of the parallel cables makes them more difficult to route inside an enclosure, hinders cooling airflow and is incompatible with the trend toward smaller form-factor devices. Parallel busses served systems well during the past 20 years, but the accumulated problems dictate the need for change, and the technology is available to spur the transition. The general trend in high-rate interconnections has turned from parallel bussing to scalable interconnections with a network architecture and high-rate point-to-point links. Analysis showed that data links with serial information transfer could achieve higher throughput and efficiency, and this has been confirmed in research and practical designs. Serial interfaces offer an improvement over older parallel interfaces: better performance, better scalability, and also better reliability, since parallel interfaces are at their limits of speed for reliable data transfer. The trend is reflected in the evolution of the major standards families: e.g. from PCI/PCI-X parallel bussing to the PCI Express interconnection architecture with serial lines, and from the CompactPCI parallel bus to the ATCA (Advanced Telecommunications Computing Architecture) specification with serial links and network topologies, etc.
In this article we consider a general set of characteristics and features of serial interconnections and give a brief overview of serial interconnection specifications. In more detail we present the SpaceWire interconnection technology. Having been developed for space on-board system applications, SpaceWire has important features and characteristics that make it a prospective interconnection for a wide range of embedded systems.
Bio-Inspired Neural Model for Learning Dynamic Models
NASA Technical Reports Server (NTRS)
Duong, Tuan; Duong, Vu; Suri, Ronald
2009-01-01
A neural-network mathematical model that, relative to prior such models, places greater emphasis on some of the temporal aspects of real neural physical processes, has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be "hardware-friendly" in the following sense: The algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.
ERIC Educational Resources Information Center
General Accounting Office, Washington, DC. Information Management and Technology Div.
This report was prepared in response to a request from the Senate Committee on Commerce, Science, and Transportation, and from the House Committee on Science, Space, and Technology, for information on efforts to develop high-speed computer networks in the United States, Europe (limited to France, Germany, Italy, the Netherlands, and the United…
Peregrine System Configuration | High-Performance Computing | NREL
Compute nodes and storage are connected by a high-speed InfiniBand network. Compute nodes are diskless; directories are mounted on all nodes, along with a file system dedicated to shared projects. Nodes have processors with 64 GB of memory.
Flow distribution in parallel microfluidic networks and its effect on concentration gradient
Guermonprez, Cyprien; Michelin, Sébastien; Baroud, Charles N.
2015-01-01
The architecture of microfluidic networks can significantly impact the flow distribution within its different branches and thereby influence tracer transport within the network. In this paper, we study the flow rate distribution within a network of parallel microfluidic channels with a single input and single output, using a combination of theoretical modeling and microfluidic experiments. Within the ladder network, the flow rate distribution follows a U-shaped profile, with the highest flow rate occurring in the initial and final branches. The contrast with the central branches is controlled by a single dimensionless parameter, namely, the ratio of hydrodynamic resistance between the distribution channel and the side branches. This contrast in flow rates decreases when the resistance of the side branches increases relative to the resistance of the distribution channel. When the inlet flow is composed of two parallel streams, one of which transporting a diffusing species, a concentration variation is produced within the side branches of the network. The shape of this concentration gradient is fully determined by two dimensionless parameters: the ratio of resistances, which determines the flow rate distribution, and the Péclet number, which characterizes the relative speed of diffusion and advection. Depending on the values of these two control parameters, different distribution profiles can be obtained ranging from a flat profile to a step distribution of solute, with well-distributed gradients between these two limits. Our experimental results are in agreement with our numerical model predictions, based on a simplified 2D advection-diffusion problem. Finally, two possible applications of this work are presented: the first one combines the present design with self-digitization principle to encapsulate the controlled concentration in nanoliter chambers, while the second one extends the present design to create a continuous concentration gradient within an open flow chamber. PMID:26487905
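The U-shaped profile and its dependence on the resistance ratio can be reproduced with the usual electrical analogy: treat each channel as a hydrodynamic resistor and solve Kirchhoff's laws on the ladder. The sketch below assumes a branch count and resistance values chosen for illustration; it is a model of the geometry described above, not the authors' code.

import numpy as np

N = 10              # number of side branches (assumed)
R_d, R_b = 1.0, 5.0 # distribution-channel and branch resistances (assumed)
g_d, g_b = 1 / R_d, 1 / R_b
n = 2 * N           # nodes 0..N-1: inlet rail; N..2N-1: outlet rail

L = np.zeros((n, n))
def link(i, j, g):
    # add one resistor between nodes i and j to the conductance Laplacian
    L[i, i] += g; L[j, j] += g; L[i, j] -= g; L[j, i] -= g

for i in range(N - 1):
    link(i, i + 1, g_d)             # inlet distribution channel
    link(N + i, N + i + 1, g_d)     # outlet collection channel
for i in range(N):
    link(i, N + i, g_b)             # side branches

f = np.zeros(n)
f[0] = 1.0                             # unit flow injected at the inlet (top left)
L[n - 1, :] = 0; L[n - 1, n - 1] = 1   # pin outlet (bottom right) pressure to 0
p = np.linalg.solve(L, f)

branch_flow = (p[:N] - p[N:]) * g_b
print(np.round(branch_flow, 4))     # U-shaped: largest in first and last branches

Raising R_b relative to R_d flattens the profile, matching the statement above that the flow-rate contrast decreases as the side-branch resistance grows.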
A distributed version of the NASA Engine Performance Program
NASA Technical Reports Server (NTRS)
Cours, Jeffrey T.; Curlett, Brian P.
1993-01-01
Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast and short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. As simulation has become one of the major production tools in the engineering factory, simulation speed and accuracy are two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis become even more critical to support fast VDP and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology had matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed-memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DMP/MPP technology, as well as performance benchmarks, are discussed in this publication.
Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis
Schulz, Martin; Galarowicz, Jim; Maghrak, Don; ...
2008-01-01
Over the last decades a large number of performance tools has been developed to analyze and optimize high performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy to apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two different faces: it provides an interoperable tool set covering the most common analysis steps as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple fully equivalent interfaces including an easy-to-use GUI as well as an interactive command line interface reducing the usage threshold for those tools.
The development speed paradox: can increasing development speed reduce R&D productivity?
Lendrem, Dennis W; Lendrem, B Clare
2014-03-01
In the 1990s the pharmaceutical industry sought to increase R&D productivity by shifting development tasks into parallel to reduce development cycle times and increase development speed. This paper presents a simple model demonstrating that, when attrition rates are high as in pharmaceutical development, such development speed initiatives can increase the expected time for the first successful molecule to complete development. Increasing the development speed of successful molecules could actually reduce R&D productivity - the development speed paradox. Copyright © 2013 Elsevier Ltd. All rights reserved.
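The paradox can be made concrete with a toy Monte Carlo model. This is our own construction, not the authors' published model; the success probability, slot capacity, and two-task structure are illustrative assumptions.

```python
# Toy Monte Carlo illustration of the development speed paradox
# (our own construction, not the authors' published model).
# Each molecule must pass tasks A and B, each with probability P.
# A fixed capacity of SLOTS task-executions is available per period.
# "Parallel" runs A and B together (1 period per molecule, 2 slots);
# "sequential" runs B only for A-survivors, so failures exit cheaply.
import random

P, SLOTS, TRIALS = 0.3, 2, 20_000

def time_to_first_success(parallel: bool) -> int:
    t, waiting_b = 0, 0          # waiting_b: molecules past A, awaiting B
    while True:
        t += 1
        slots = SLOTS
        if parallel:
            while slots >= 2:
                slots -= 2       # one molecule consumes both slots at once
                if random.random() < P and random.random() < P:
                    return t     # first molecule through both tasks
        else:
            while waiting_b > 0 and slots > 0:   # finish B for A-survivors
                waiting_b -= 1
                slots -= 1
                if random.random() < P:
                    return t
            while slots > 0:                      # screen fresh molecules in A
                slots -= 1
                if random.random() < P:
                    waiting_b += 1

for mode in (False, True):
    mean_t = sum(time_to_first_success(mode) for _ in range(TRIALS)) / TRIALS
    print("parallel" if mode else "sequential", f"mean periods: {mean_t:.1f}")
```

With high attrition (P = 0.3 here), the parallel policy wastes capacity completing task B for molecules already doomed in task A, so its mean time to the first fully successful molecule comes out longer despite the shorter per-molecule cycle time.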
Integrated Service Provisioning in an Ipv6 over ATM Research Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eli Dart; Helen Chen; Jerry Friesen
1999-02-01
During the past few years, the worldwide Internet has grown at a phenomenal rate, which has spurred the proposal of innovative network technologies to support the fast, efficient and low-latency transport of a wide spectrum of multimedia traffic types. Existing network infrastructures have been plagued by their inability to provide for real-time application traffic as well as their general lack of resources and resilience to congestion. This work proposes to address these issues by implementing a prototype high-speed network infrastructure consisting of Internet Protocol Version 6 (IPv6) on top of an Asynchronous Transfer Mode (ATM) transport medium. Since ATM is connection-oriented whereas IP uses a connection-less paradigm, the efficient integration of IPv6 over ATM is especially challenging and has generated much interest in the research community. We propose, in collaboration with an industry partner, to implement IPv6 over ATM using a unique approach that integrates IP over fast ATM hardware while still preserving IP's connection-less paradigm. This is achieved by replacing ATM's control software with IP's routing code and by caching IP's forwarding decisions in ATM's VPI/VCI translation tables. Prototype "VR" and distributed-parallel-computing applications will also be developed to exercise the real-time capability of our IPv6 over ATM network.
SEAL FOR HIGH SPEED CENTRIFUGE
Skarstrom, C.W.
1957-12-17
A seal is described for a high speed centrifuge wherein the centrifugal force of rotation acts on the gasket to form a tight seal. The cylindrical rotating bowl of the centrifuge contains a closure member resting on a shoulder in the bowl wall, having a lower surface containing bands of gasket material parallel and adjacent to the cylinder wall. As the centrifuge speed increases, centrifugal force acts on the bands of gasket material, forcing them into sealing contact against the cylinder wall. This arrangement forms a simple and effective seal for high speed centrifuges, replacing more costly methods such as welding a closure in place.
High speed infrared imaging system and method
Zehnder, Alan T.; Rosakis, Ares J.; Ravichandran, G.
2001-01-01
A system and method for radiation detection with an increased frame rate. A semi-parallel processing configuration is used to process a row or column of pixels in a focal-plane array in parallel to achieve a processing rate up to and greater than 1 million frames per second.
Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni
2011-05-04
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and, moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratic") for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that a speed improvement of 5x up to 42x is provided versus optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated.
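The paper's routines target the GPU; the following plain-NumPy sketch (our construction, with illustrative constants, not the authors' code) shows the kind of leaky integrate-and-fire update that such simulations parallelize, since the same update is applied to every neuron each step.

```python
# Vectorized leaky integrate-and-fire update over a population of neurons,
# illustrating the per-neuron parallelism a GPU implementation exploits.
# All constants are illustrative, not taken from the paper.
import numpy as np

N, T, dt = 1000, 500, 1e-3          # neurons, steps, step size (s)
tau, v_rest, v_thresh, v_reset = 20e-3, 0.0, 1.0, 0.0

rng = np.random.default_rng(0)
v = np.full(N, v_rest)
W = rng.normal(0.0, 0.05, (N, N))   # recurrent weights (e.g., Mexican hat)
spikes = np.zeros(N, dtype=bool)

for _ in range(T):
    i_ext = rng.normal(1.2, 0.5, N)      # noisy external drive
    i_rec = W @ spikes                    # recurrent input from last step
    v += dt / tau * (v_rest - v) + dt * (i_ext + i_rec)
    spikes = v >= v_thresh                # identical update per neuron:
    v[spikes] = v_reset                   # ideal for SIMD/GPU execution
```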
NASA Astrophysics Data System (ADS)
Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun
2018-06-01
This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four degrees-of-freedom (DOFs) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions to identify the correct solution are also provided. On the basis of the proposed concept, the workspace of the robot is identified using a set of presupposed parameters by taking input and output transmission index as the performance evaluation criteria.
High speed fiber optics local area networks: Design and implementation
NASA Technical Reports Server (NTRS)
Tobagi, Fouad A.
1988-01-01
The design of high speed local area networks (HSLAN) for communication among distributed devices requires solving problems in three areas: (1) the network medium and its topology; (2) the medium access control; and (3) the network interface. Considerable progress has been made in all areas. Accomplishments are divided into two groups according to their theoretical or experimental nature. A brief summary is given in Section 2, including references to papers which appeared in the literature, as well as to Ph.D. dissertations and technical reports published at Stanford University.
DOT National Transportation Integrated Search
2014-07-01
In the past, U.S. studies on high-speed rail (HSR) have focused primarily on the economic implications of high-speed rail development. Recently, however, studies have begun evaluating multimodal connectivity of HSR stations. The ways in which differe...
Field trials of 100G and beyond: an operator's point of view
NASA Astrophysics Data System (ADS)
Vorbeck, S.; Schneiders, M.; Weiershausen, W.; Mayer, H.; Schippel, A.; Wagner, P.; Ehrhardt, A.; Braun, R.; Breuer, D.; Drafz, U.; Fritzsche, D.
2011-01-01
In this article we present a summary of the latest 100 Gbps field trials in the network of Deutsche Telekom AG with industry partners. We cover a brownfield approach with alien wavelengths on existing systems, a greenfield high-speed overlay network approach, and high-speed router-to-router interface coupling.
Research of x-ray nondestructive detector for high-speed running conveyor belt with steel wire ropes
NASA Astrophysics Data System (ADS)
Wang, Junfeng; Miao, Changyun; Wang, Wei; Lu, Xiaocui
2008-03-01
An X-ray nondestructive detector for high-speed running conveyor belts with steel wire ropes is investigated in this paper. The principle of X-ray nondestructive testing (NDT) is analyzed, the general scheme of the X-ray nondestructive testing system is proposed, and the nondestructive detector for high-speed running conveyor belts with steel wire ropes is developed. The hardware of the system is designed around Xilinx's Virtex-4 FPGA, which embeds a PowerPC and a MAC IP core, and its network communication software based on the TCP/IP protocol is programmed by loading LwIP onto the PowerPC. The nondestructive testing of high-speed conveyor belts with steel wire ropes and the network transfer function are implemented. It is a real-time system with rapid scanning speed, high reliability, and a remote nondestructive testing function. The detector can be applied to the inspection of production lines in industry.
Distributed computations in a dynamic, heterogeneous Grid environment
NASA Astrophysics Data System (ADS)
Dramlitsch, Thomas
2003-06-01
In order to face the rapidly increasing need for computational resources of various scientific and engineering applications, one has to think of new ways to make more efficient use of the world's current computational resources. In this respect, the growing speed of wide area networks has made a new kind of distributed computing possible: metacomputing or (distributed) Grid computing. This is a rather new and uncharted field in computational science. The rapidly increasing speed of networks even outperforms the average increase of processor speed: processor speeds double on average every 18 months, whereas network bandwidths double every 9 months. Due to this development of local and wide area networks, Grid computing will certainly play a key role in the future of parallel computing. This type of distributed computing, however, differs from traditional parallel computing in many ways, since it has to deal with many problems not occurring in classical parallel computing, for example heterogeneity, authentication, and slow networks, to mention only a few. Some of those problems, e.g. the allocation of distributed resources along with the provision of information about these resources to the application, have already been addressed by the Globus software. Unfortunately, as far as we know, hardly any application or middleware software takes advantage of this information, since most parallelizing algorithms for finite differencing codes are implicitly designed for single supercomputer or cluster execution. We show that although it is possible to apply classical parallelizing algorithms in a Grid environment, in most cases the observed efficiency of the executed code is very poor. In this work we close this gap. In our thesis, we will: show that an execution of classical parallel codes in Grid environments is possible but very slow; analyze this situation of bad performance, nail down bottlenecks in communication, and remove unnecessary overhead and other reasons for low performance; develop new and advanced parallelization algorithms that are aware of a Grid environment, in order to generalize the traditional parallelization schemes; implement and test these new methods, replacing and comparing them with the classical ones; and introduce dynamic strategies that automatically adapt the running code to the nature of the underlying Grid environment. The higher the performance one can achieve for a single application by manual tuning for a Grid environment, the lower the chance that those changes are widely applicable to other programs. In our analysis as well as in our implementation, we tried to keep the balance between high performance and generality. None of our changes directly affects code on the application level, which makes our algorithms applicable to a whole class of real-world applications. The implementation of our work is done within the Cactus framework using the Globus toolkit, since we think that these are the most reliable and advanced programming frameworks for supporting computations in Grid environments. On the other hand, however, we tried to be as general as possible, i.e. all methods and algorithms discussed in this thesis are independent of Cactus or Globus. The ever denser and faster networking of computers and computing centers over high-speed networks enables a new kind of scientific distributed computing, in which geographically far-apart computing capacities can be combined into a single whole.
This resulting virtual supercomputer, itself composed of several large machines, can be used to solve problems for which the individual machines are too small. The problems that cannot be solved numerically with today's computing capacities span all areas of modern science, from astrophysics, molecular physics, bioinformatics, and meteorology to number theory and fluid dynamics, to name only a few fields. Depending on the nature of the problem and of the solution method, such "meta-computations" are more or less difficult to carry out. In general, such computations become harder and less efficient the more communication takes place between the individual processes (or processors). The reason is that the bandwidths and latencies between two processors on the same supercomputer or cluster are two to four orders of magnitude better than between processors located hundreds of kilometers apart. Nevertheless, a time is now beginning in which it is possible to carry out computations on such virtual supercomputers even with communication-intensive programs. One large class of communication- and computation-intensive programs comprises those that solve differential equations by means of finite differences. It is precisely this class of programs, and its operation on a virtual supercomputer, that is treated in this dissertation. Methods for carrying out such distributed computations more efficiently are developed, analyzed, and implemented. The focus lies on analyzing existing, classical parallelization algorithms and extending them so that they use available information about machines and networks (e.g., available through the Globus Toolkit) for more efficient parallelization. As far as we know, such additional information is hardly used in relevant programs, since the majority of parallelization algorithms were implicitly designed for execution on single supercomputers or clusters.
Parallel fuzzy connected image segmentation on GPU
Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.
2011-01-01
Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
Parallel fuzzy connected image segmentation on GPU.
Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W
2011-07-01
Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.
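As a rough single-threaded sketch of task (i), a per-edge fuzzy affinity computation might look as follows. The Gaussian affinity here is a simplified form of our own choosing; the paper's exact affinity function and its CUDA kernels are not reproduced.

```python
# Simplified fuzzy-affinity computation between 4-neighbors of a 2D image.
# Affinity decays with intensity difference; a CUDA version computes the
# same per-edge quantity with one GPU thread per pixel.
import numpy as np

def affinity_maps(img, sigma=10.0):
    """Return affinities to the right and down neighbors of every pixel."""
    right = np.exp(-((img[:, 1:] - img[:, :-1]) ** 2) / (2 * sigma**2))
    down = np.exp(-((img[1:, :] - img[:-1, :]) ** 2) / (2 * sigma**2))
    return right, down

img = np.random.default_rng(0).uniform(0, 255, (256, 256))
a_right, a_down = affinity_maps(img)
```

Every output element depends only on two neighboring pixels, which is why a one-thread-per-pixel GPU kernel maps onto this task so directly; the connectedness task (ii) is the harder, iterative part.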
Parallel Rendering of Large Time-Varying Volume Data
NASA Technical Reports Server (NTRS)
Garbutt, Alexander E.
2005-01-01
Interactive visualization of large time-varying 3D volume datasets has been and still is a great challenge to the modern computational world. It stretches the limits of the memory capacity, the disk space, the network bandwidth and the CPU speed of a conventional computer. In this SURF project, we propose to develop a parallel volume rendering program on SGI's Prism, a cluster computer equipped with state-of-the-art graphics hardware. The proposed program combines both parallel computing and hardware rendering in order to achieve an interactive rendering rate. We use 3D texture mapping and a hardware shader to implement 3D volume rendering on each workstation. We use SGI's VisServer to enable remote rendering using Prism's graphics hardware. And last, we will integrate this new program with ParVox, a parallel distributed visualization system developed at JPL. At the end of the project, we will demonstrate remote interactive visualization using this new hardware volume renderer on JPL's Prism system using a time-varying dataset from selected JPL applications.
Zheng, Chenguang; Bieri, Kevin Wood; Trettel, Sean Gregory; Colgin, Laura Lee
2015-01-01
In hippocampal area CA1 of rats, the frequency of gamma activity has been shown to increase with running speed (Ahmed and Mehta, 2012). This finding suggests that different gamma frequencies simply allow for different timings of transitions across cell assemblies at varying running speeds, rather than serving unique functions. However, accumulating evidence supports the conclusion that slow (~25–55 Hz) and fast (~60–100 Hz) gamma are distinct network states with different functions. If slow and fast gamma constitute distinct network states, then it is possible that slow and fast gamma frequencies are differentially affected by running speed. In this study, we tested this hypothesis and found that slow and fast gamma frequencies change differently as a function of running speed in hippocampal areas CA1 and CA3, and in the superficial layers of the medial entorhinal cortex (MEC). Fast gamma frequencies increased with increasing running speed in all three areas. Slow gamma frequencies changed significantly less across different speeds. Furthermore, at high running speeds, CA3 firing rates were low, and MEC firing rates were high, suggesting that CA1 transitions from CA3 inputs to MEC inputs as running speed increases. These results support the hypothesis that slow and fast gamma reflect functionally distinct states in the hippocampal network, with fast gamma driven by MEC at high running speeds and slow gamma driven by CA3 at low running speeds. PMID:25601003
A comparison study between MLP and convolutional neural network models for character recognition
NASA Astrophysics Data System (ADS)
Ben Driss, S.; Soua, M.; Kachouri, R.; Akil, M.
2017-05-01
Optical Character Recognition (OCR) systems have been designed to operate on text contained in scanned documents and images. They include text detection and character recognition, in which characters are described and then classified. In the classification step, characters are identified according to their features or template descriptions, and a given classifier is employed to identify them. In this context, we have proposed the unified character descriptor (UCD) to represent characters based on their features, with matching employed to perform the classification. This recognition scheme achieves good OCR accuracy on homogeneous scanned documents; however, it cannot discriminate characters with high font variation and distortion. To improve recognition, classifiers based on neural networks can be used. The multilayer perceptron (MLP) ensures high recognition accuracy when robustly trained. Moreover, the convolutional neural network (CNN) is nowadays gaining popularity for its high performance. However, both CNN and MLP may suffer from the large amount of computation in the training phase. In this paper, we establish a comparison between MLP and CNN. We provide the MLP with the UCD descriptor and the appropriate network configuration. For the CNN, we employ the convolutional network designed for handwritten and machine-printed character recognition (LeNet-5) and adapt it to support 62 classes, including both digits and letters. In addition, GPU parallelization is studied to speed up both the MLP and CNN classifiers. Based on our experiments, we demonstrate that the real-time CNN used is 2x more effective than the MLP when classifying characters.
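For illustration, a LeNet-5-style network with its output layer widened to the 62 digit-plus-letter classes can be sketched in PyTorch. The layer sizes below follow the classic LeNet-5, not necessarily the authors' adapted configuration.

```python
# LeNet-5-style CNN with the final layer adapted to 62 classes
# (10 digits + 52 upper/lower-case letters), as the paper describes.
# Architecture details here are illustrative, not the authors' exact net.
import torch
import torch.nn as nn

class LeNet62(nn.Module):
    def __init__(self, n_classes: int = 62):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, n_classes),        # widened from 10 to 62 outputs
        )

    def forward(self, x):
        return self.classifier(self.features(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LeNet62().to(device)
logits = model(torch.randn(8, 1, 32, 32, device=device))  # batch of 32x32 glyphs
```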
Reliable, Memory Speed Storage for Cluster Computing Frameworks
2014-06-16
... specification API that can capture computations in many of today's popular data-parallel computing models, e.g., MapReduce and SQL. We also ported the Hadoop ... today's big data workloads: Immutable data: data is immutable once written, since dominant underlying storage systems, such as HDFS [3], only support ... network transfers, so reads can be data-local. Program size vs. data size: in big data processing, the same operation is repeatedly applied on massive ...
New technique for real-time distortion-invariant multiobject recognition and classification
NASA Astrophysics Data System (ADS)
Hong, Rutong; Li, Xiaoshun; Hong, En; Wang, Zuyi; Wei, Hongan
2001-04-01
A real-time hybrid distortion-invariant OPR system was established to perform 3D multiobject distortion-invariant automatic pattern recognition. A wavelet transform technique was used for digital preprocessing of the input scene, to suppress the noisy background and enhance the recognized object. A three-layer backpropagation artificial neural network was used in correlation signal post-processing to perform multiobject distortion-invariant recognition and classification. The C-80 and NOA real-time processing ability and multithread programming technology were used to perform high speed parallel multitask processing and to speed up the post-processing rate for ROIs. The reference filter library was constructed for the distorted versions of the 3D object model images, based on distortion parameter tolerance measurements of rotation, azimuth and scale. The real-time optical correlation recognition testing of this OPR system demonstrates that, using the preprocessing, the post-processing, the nonlinear algorithm of optimum filtering, the RFL construction technique and the multithread programming technology, a high recognition probability and recognition rate were obtained for the real-time multiobject distortion-invariant OPR system. The recognition reliability and rate were improved greatly. These techniques are very useful for automatic target recognition.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics-card-based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both the Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize the CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. The multicore implementation using OpenMP on an 8-core processor provides up to 7.7× speed-up. The GPU implementation using CUDA yielded substantial accelerations, ranging from 18.5× to 157× speed-up, once thread-based and block-based approaches were combined in the analysis. The proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerate DFC analyses significantly. The developed algorithms make DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
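A serial NumPy sketch of the sliding-window correlation at the core of DFC follows (our construction, not the paper's code). Each window position is independent of the others, which is exactly what the thread-based and block-based parallelizations exploit.

```python
# Sliding-window dynamic functional connectivity: for each window position,
# correlate every pair of region time-courses. Each window is independent,
# so windows can be assigned to OpenMP threads or CUDA blocks.
import numpy as np

def dfc(timecourses, win=30, step=1):
    """timecourses: (regions, timepoints). Returns (windows, regions, regions)."""
    n_reg, n_t = timecourses.shape
    starts = range(0, n_t - win + 1, step)
    return np.stack([np.corrcoef(timecourses[:, s:s + win]) for s in starts])

x = np.random.default_rng(0).standard_normal((10, 300))  # 10 regions, 300 TRs
mats = dfc(x)          # each loop iteration could run on its own thread/block
print(mats.shape)      # (271, 10, 10)
```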
Highly survivable bed pressure mat remote patient monitoring system for mHealth.
Joshi, Vilas; Holtzman, Megan; Arcelus, Amaya; Goubran, Rafik; Knoefel, Frank
2012-01-01
High-speed mobile networks like 4G and beyond are making a ubiquitous remote patient monitoring (RPM) system using multiple sensors and wireless sensor networks a realistic possibility. A high-speed wireless RPM system will be an integral part of the mobile health (mHealth) paradigm, reducing cost and providing better service to patients. While the high-speed wireless RPM system will allow clinicians to monitor various chronic and acute medical conditions, the reliability of such a system will depend on the network Quality of Service (QoS). The RPM system needs to be resilient to temporarily reduced network QoS. This paper presents a highly survivable bed pressure mat RPM system design using an adaptive information content management methodology for the monitored sensor data. The proposed design improves the resiliency of the RPM system under adverse network conditions such as congestion and/or temporary loss of connectivity. It also shows how the proposed RPM system can reduce the information rate, and correspondingly the data transfer rate, by factors of 5.5 and 144 to address temporary network congestion. The data rate reduction results in lower specificity and sensitivity for the features being monitored, but increases the survivability of the system from 1 second to 2.4 minutes, making it highly robust.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Bailey, David (Technical Monitor)
1997-01-01
While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gordon Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays of obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, and interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, and performance prediction, as well as exploitation of user-supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, transitions of the underlying hardware and system software have forced the scientists to spend a large effort migrating and recoding their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input, which suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism; although not all kinds of parallelism are handled, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: MPI/PVM on networks of workstations to the IBM SP2 and Cray T3D.
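The domain-decomposition parallelism mentioned under reason 2 reduces, at the code level, to each process owning a slab of the grid and exchanging halo cells with its neighbors every step. A minimal hand-written mpi4py sketch of that pattern follows (illustrative only; this is not CAPTOOL-generated output).

```python
# Minimal 1-D domain decomposition with halo exchange, the communication
# pattern domain-decomposed finite-difference codes rely on.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 100
u = np.zeros(n_local + 2)            # interior cells plus two ghost cells
u[1:-1] = rank                        # hypothetical initial data

up = rank + 1 if rank + 1 < size else MPI.PROC_NULL
down = rank - 1 if rank - 1 >= 0 else MPI.PROC_NULL

for _ in range(10):
    # Exchange boundary cells with neighbors before the stencil update.
    comm.Sendrecv(u[-2:-1], dest=up, recvbuf=u[0:1], source=down)
    comm.Sendrecv(u[1:2], dest=down, recvbuf=u[-1:], source=up)
    u[1:-1] = 0.5 * (u[:-2] + u[2:])  # simple diffusion-style stencil
```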
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
The proposed fast, massively parallel data processor contains an 8x16 array of processing elements with an efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on an "X" interconnection grid and with external memory via a high-capacity input/output bus. This approach to conditional operation nearly doubles the speed of various arithmetic operations.
Experiments with a Parallel Multi-Objective Evolutionary Algorithm for Scheduling
NASA Technical Reports Server (NTRS)
Brown, Matthew; Johnston, Mark D.
2013-01-01
Evolutionary multi-objective algorithms have great potential for scheduling in those situations where tradeoffs among competing objectives represent a key requirement. One challenge, however, is runtime performance, as a consequence of evolving not just a single schedule, but an entire population, while attempting to sample the Pareto frontier as accurately and uniformly as possible. The growing availability of multi-core processors in end user workstations, and even laptops, has raised the question of the extent to which such hardware can be used to speed up evolutionary algorithms. In this paper we report on early experiments in parallelizing a Generalized Differential Evolution (GDE) algorithm for scheduling long-range activities on NASA's Deep Space Network. Initial results show that significant speedups can be achieved, but that performance does not necessarily improve as more cores are utilized. We describe our preliminary results and some initial suggestions from parallelizing the GDE algorithm. Directions for future work are outlined.
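The population-level parallelism such experiments exploit can be sketched as follows (our construction, not the GDE scheduler: the objective and DE constants are stand-ins). Every individual's fitness evaluation is independent, so each generation maps directly onto a pool of worker cores.

```python
# Sketch of population-parallel differential evolution: the expensive,
# independent fitness evaluations are farmed out to a process pool.
import numpy as np
from multiprocessing import Pool

def fitness(x):
    return float(np.sum(x ** 2))     # stand-in for an expensive schedule score

def evolve(pop, f=0.8, cr=0.9, gens=50, workers=4, seed=0):
    rng = np.random.default_rng(seed)
    n, d = pop.shape
    with Pool(workers) as pool:
        scores = np.array(pool.map(fitness, pop))
        for _ in range(gens):
            r = rng.integers(0, n, (n, 3))            # simplified donor choice
            mutant = pop[r[:, 0]] + f * (pop[r[:, 1]] - pop[r[:, 2]])
            trial = np.where(rng.random((n, d)) < cr, mutant, pop)
            trial_scores = np.array(pool.map(fitness, trial))  # parallel step
            better = trial_scores < scores
            pop[better], scores[better] = trial[better], trial_scores[better]
    return pop[np.argmin(scores)], float(scores.min())

if __name__ == "__main__":
    best, val = evolve(np.random.default_rng(1).uniform(-5, 5, (32, 10)))
    print("best objective:", val)
```

Note that this mirrors the paper's observation: the parallel gain is bounded by the per-generation synchronization, so adding cores beyond the evaluation cost does not keep improving throughput.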
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1996-01-01
We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms
NASA Technical Reports Server (NTRS)
Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.
1997-01-01
We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies: the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
Constructing Neuronal Network Models in Massively Parallel Environments
Ippen, Tammo; Eppler, Jochen M.; Plesser, Hans E.; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers. PMID:28559808
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology
NASA Astrophysics Data System (ADS)
Macioł, Piotr; Michalik, Kazimierz
2016-10-01
Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle to its wide application is its high computational demand. Among other solutions, the parallelization of multiscale computations is promising. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelizing multiscale models based on the Agile Multiscale Methodology framework is discussed. A sequential, FEM-based macroscopic model has been combined with concurrently computed fine-scale models, employing a MatCalc thermodynamic simulator. The main issues investigated in this work are: (i) the speed-up of multiscale models, with special focus on fine-scale computations, and (ii) the decrease in computation quality enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of 'delay error', arising from the parallel execution of fine-scale sub-models controlled by the sequential macroscopic sub-model, is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and an MPI library are also discussed.
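For reference, Amdahl's law bounds the speed-up of a workload with serial fraction s on n processors by S(n) = 1 / (s + (1 - s)/n). A quick numeric illustration follows; the 10% serial fraction is an assumed example, not a figure from the paper.

```python
# Amdahl's law: speedup is capped by the serial fraction of the workload.
def amdahl(serial_fraction: float, n_procs: int) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

for n in (2, 8, 32, 128):
    # e.g., if the sequential macroscopic sub-model is 10% of the runtime,
    # parallel fine-scale computation can never push speedup past 10x.
    print(n, round(amdahl(0.10, n), 2))
```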
The Portals 4.0 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2012-11-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
ERIC Educational Resources Information Center
Gow, David W., Jr.; Keller, Corey J.; Eskandar, Emad; Meng, Nate; Cash, Sydney S.
2009-01-01
In this work, we apply Granger causality analysis to high spatiotemporal resolution intracranial EEG (iEEG) data to examine how different components of the left perisylvian language network interact during spoken language perception. The specific focus is on the characterization of serial versus parallel processing dependencies in the dominant…
Implementing Access to Data Distributed on Many Processors
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
A reference architecture is defined for an object-oriented implementation of domains, arrays, and distributions written in the programming language Chapel. This technology primarily addresses domains that contain arrays that have regular index sets with the low-level implementation details being beyond the scope of this discussion. What is defined is a complete set of object-oriented operators that allows one to perform data distributions for domain arrays involving regular arithmetic index sets. What is unique is that these operators allow for the arbitrary regions of the arrays to be fragmented and distributed across multiple processors with a single point of access giving the programmer the illusion that all the elements are collocated on a single processor. Today's massively parallel High Productivity Computing Systems (HPCS) are characterized by a modular structure, with a large number of processing and memory units connected by a high-speed network. Locality of access as well as load balancing are primary concerns in these systems that are typically used for high-performance scientific computation. Data distributions address these issues by providing a range of methods for spreading large data sets across the components of a system. Over the past two decades, many languages, systems, tools, and libraries have been developed for the support of distributions. Since the performance of data parallel applications is directly influenced by the distribution strategy, users often resort to low-level programming models that allow fine-tuning of the distribution aspects affecting performance, but, at the same time, are tedious and error-prone. This technology presents a reusable design of a data-distribution framework for data parallel high-performance applications. Distributions are a means to express locality in systems composed of large numbers of processor and memory components connected by a network. Since distributions have a great effect on the performance of applications, it is important that the distribution strategy is flexible, so its behavior can change depending on the needs of the application. At the same time, high productivity concerns require that the user be shielded from error-prone, tedious details such as communication and synchronization.
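A toy illustration of the core idea follows (ours, written in Python rather than Chapel): a global index space is partitioned block-wise across processors, while a single accessor hides which processor owns which element.

```python
# Toy block distribution of a global index space across "processors",
# with one logical accessor hiding the owner lookup, in the spirit of the
# domain/distribution design described above (illustrative only).
class BlockDistributedArray:
    def __init__(self, n: int, n_procs: int):
        self.n, self.n_procs = n, n_procs
        self.block = (n + n_procs - 1) // n_procs
        # One local fragment per processor; plain lists stand in for
        # memory on separate nodes.
        self.fragments = [[0] * min(self.block, n - p * self.block)
                          for p in range(n_procs)]

    def owner(self, i: int) -> tuple[int, int]:
        return i // self.block, i % self.block   # (processor, local index)

    def __getitem__(self, i):
        p, j = self.owner(i)
        return self.fragments[p][j]

    def __setitem__(self, i, v):
        p, j = self.owner(i)
        self.fragments[p][j] = v

a = BlockDistributedArray(10, 4)
a[7] = 42                      # single point of access; the owner is hidden
print(a[7], a.owner(7))        # 42 (2, 1)
```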
A neuro-inspired spike-based PID motor controller for multi-motor robots with low cost FPGAs.
Jimenez-Fernandez, Angel; Jimenez-Moreno, Gabriel; Linares-Barranco, Alejandro; Dominguez-Morales, Manuel J; Paz-Vicente, Rafael; Civit-Balcells, Anton
2012-01-01
In this paper we present a neuro-inspired spike-based closed-loop controller written in VHDL and implemented on FPGAs. This controller focuses on controlling DC motor speed, using only spikes for information representation, processing and DC motor driving. It could be applied to other motors with proper driver adaptation. This controller architecture represents one of the latest layers in a Spiking Neural Network (SNN), implementing a bridge between robotic actuators and spike-based processing layers and sensors. The presented control system fuses actuation and sensor information as spike streams, processing these spikes in hard real-time and implementing a massively parallel information processing system through specialized spike-based circuits. This spike-based closed-loop controller has been implemented on an AER platform, designed in our labs, that allows direct control of DC motors: the AER-Robot. Experimental results demonstrate the viability of implementing spike-based controllers, and hardware synthesis shows low hardware requirements that allow replicating this controller in a high number of parallel controllers working together to achieve real-time robot control.
A Neuro-Inspired Spike-Based PID Motor Controller for Multi-Motor Robots with Low Cost FPGAs
Jimenez-Fernandez, Angel; Jimenez-Moreno, Gabriel; Linares-Barranco, Alejandro; Dominguez-Morales, Manuel J.; Paz-Vicente, Rafael; Civit-Balcells, Anton
2012-01-01
In this paper we present a neuro-inspired spike-based closed-loop controller written in VHDL and implemented on FPGAs. This controller focuses on controlling DC motor speed, using only spikes for information representation, processing and DC motor driving. It could be applied to other motors with proper driver adaptation. This controller architecture represents one of the latest layers in a Spiking Neural Network (SNN), implementing a bridge between robotic actuators and spike-based processing layers and sensors. The presented control system fuses actuation and sensor information as spike streams, processing these spikes in hard real-time and implementing a massively parallel information processing system through specialized spike-based circuits. This spike-based closed-loop controller has been implemented on an AER platform, designed in our labs, that allows direct control of DC motors: the AER-Robot. Experimental results demonstrate the viability of implementing spike-based controllers, and hardware synthesis shows low hardware requirements that allow replicating this controller in a high number of parallel controllers working together to achieve real-time robot control. PMID:22666004
Afshar, Yaser; Sbalzarini, Ivo F.
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
Afshar, Yaser; Sbalzarini, Ivo F
2016-01-01
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
Sequential and parallel image restoration: neural network implementations.
Figueiredo, M T; Leitao, J N
1994-01-01
Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high-dimensional convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem onto the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm based on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
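Stripped of the neural-network mapping, the underlying convex objective and a plain gradient iteration look as follows. This is our sketch, with a Tikhonov regularizer standing in for the paper's; the modified Hopfield updating schedules are not reproduced.

```python
# Plain gradient iteration for a convex restoration objective
# min_x ||y - Hx||^2 + lam * ||x||^2 (Tikhonov stand-in regularizer).
# Neural implementations update the same objective, one coordinate
# per "neuron", with sequential or parallel schedules. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 64
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # stand-in blur operator
x_true = rng.standard_normal(n)
y = H @ x_true + 0.05 * rng.standard_normal(n)      # blurred + noisy data

lam, step = 0.01, 0.01
x = np.zeros(n)
for _ in range(2000):
    grad = H.T @ (H @ x - y) + lam * x              # gradient of the objective
    x -= step * grad
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative error
```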
A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biwas, Rupak; Kwak, Dochan (Technical Monitor)
2001-01-01
NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of geographically distributed computers, databases, and human expertise, in order to solve large-scale realistic computational problems. This type of meta-computing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme, called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. We also analyze the conditions that are required for the IPG to be an effective tool for such distributed computations. Our results show that MinEX is a viable load balancer provided the nodes of the IPG are connected by a high-speed asynchronous interconnection network.
NASA Technical Reports Server (NTRS)
Kirk, R. G.; Nicholas, J. C.; Donald, G. H.; Murphy, R. C.
1980-01-01
The summary of a complete analytical design evaluation of an existing parallel flow compressor is presented and a field vibration problem that manifested itself as a subsynchronous vibration that tracked at approximately 2/3 of compressor speed is reviewed. The comparison of predicted and observed peak response speeds, frequency spectrum content, and the performance of the bearing-seal systems are presented as the events of the field problem are reviewed. Conclusions and recommendations are made as to the degree of accuracy of the analytical techniques used to evaluate the compressor design.
Parallelism in Manipulator Dynamics. Revision.
1983-12-01
This paper addresses the problem of efficiently computing the motor torques required to drive a lower-pair kinematic chain (e.g., a typical manipulator arm in free motion, or a mechanical leg in the ... computations, and presents two "mathematically exact" formulations especially suited to high-speed, highly parallel implementations using special-purpose ...
Fast and Famous: Looking for the Fastest Speed at Which a Face Can be Recognized
Barragan-Jason, Gladys; Besson, Gabriel; Ceccaldi, Mathieu; Barbeau, Emmanuel J.
2012-01-01
Face recognition is supposed to be fast. However, the actual speed at which faces can be recognized remains unknown. To address this issue, we report two experiments run with speed constraints. In both experiments, famous faces had to be recognized among unknown ones using a large set of stimuli to prevent pre-activation of features which would speed up recognition. In the first experiment (31 participants), recognition of famous faces was investigated using a rapid go/no-go task. In the second experiment, 101 participants performed a highly time constrained recognition task using the Speed and Accuracy Boosting procedure. Results indicate that the fastest speed at which a face can be recognized is around 360–390 ms. Such latencies are about 100 ms longer than the latencies recorded in similar tasks in which subjects have to detect faces among other stimuli. We discuss which model of activation of the visual ventral stream could account for such latencies. These latencies are not consistent with a purely feed-forward pass of activity throughout the visual ventral stream. An alternative is that face recognition relies on the core network underlying face processing identified in fMRI studies (OFA, FFA, and pSTS) and reentrant loops to refine face representation. However, the model of activation favored is that of an activation of the whole visual ventral stream up to anterior areas, such as the perirhinal cortex, combined with parallel and feed-back processes. Further studies are needed to assess which of these three models of activation can best account for face recognition. PMID:23460051
Peak holding circuit for extremely narrow pulses
NASA Technical Reports Server (NTRS)
Oneill, R. W. (Inventor)
1975-01-01
An improved pulse stretching circuit comprising: a high speed wide-band amplifier connected in a fast charge integrator configuration; a holding circuit including a capacitor connected in parallel with a discharging network which employs a resistor and an FET; and an output buffer amplifier. Input pulses of very short duration are applied to the integrator charging the capacitor to a value proportional to the input pulse amplitude. After a predetermined period of time, conventional circuitry generates a dump pulse which is applied to the gate of the FET making a low resistance path to ground which discharges the capacitor. When the dump pulse terminates, the circuit is ready to accept another pulse to be stretched. The very short input pulses are thus stretched in width so that they may be analyzed by conventional pulse height analyzers.
Ad hoc Laser networks component technology for modular spacecraft
NASA Astrophysics Data System (ADS)
Huang, Xiujun; Shi, Dele; Ma, Zongfeng; Shen, Jingshi
2016-03-01
The distributed reconfigurable satellite is a new kind of spacecraft system based on a flexible platform of modularization and standardization. Based on an analysis of the module data flow of the spacecraft, this paper proposes an ad hoc laser network architecture as a network component: a low-speed control network is combined with a high-speed payload network in a microwave-laser communication mode, without a mesh network mode, to improve the flexibility of the network. The ad hoc laser network component technology was developed, and the related performance tests and experiments were carried out. The results showed that the ad hoc laser network components can meet the demands of future networking between spacecraft modules.
Ad hoc laser networks component technology for modular spacecraft
NASA Astrophysics Data System (ADS)
Huang, Xiujun; Shi, Dele; Shen, Jingshi
2017-10-01
Distributed reconfigurable satellites are a new kind of spacecraft system based on a flexible platform of modularization and standardization. Based on an analysis of the data flows between spacecraft modules, this paper proposes a network component for an ad hoc laser network architecture that combines a low-speed control network with a high-speed payload network in a hybrid microwave-laser communication mode, in a non-mesh configuration, to improve the flexibility of the network. The ad hoc laser network component technology was developed and subjected to performance testing and experiments. The results showed that the ad hoc laser network components can meet the demands of future networking between spacecraft modules.
Accurate modeling of switched reluctance machine based on hybrid trained WNN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Shoujun, E-mail: sunnyway@nwpu.edu.cn; Ge, Lefei; Ma, Shaojie
2014-04-15
According to the strong nonlinear electromagnetic characteristics of switched reluctance machines (SRM), a novel accurate modeling method is proposed based on a hybrid trained wavelet neural network (WNN) which combines an improved genetic algorithm (GA) with the gradient descent (GD) method to train the network. In the novel method, the WNN is trained by the GD method starting from the initial weights obtained by the improved GA optimization, so that the global parallel searching capability of the stochastic algorithm and the local convergence speed of the deterministic algorithm are combined to enhance the training accuracy, stability and speed. Based on the measured electromagnetic characteristics of a 3-phase 12/8-pole SRM, the nonlinear simulation model is built by the hybrid trained WNN in Matlab. The phase current and mechanical characteristics from simulation under different working conditions agree well with those from experiments, which indicates the accuracy of the model for dynamic and static performance evaluation of the SRM and verifies the effectiveness of the proposed modeling method.
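The hybrid scheme described above (a stochastic global search to seed the weights, followed by deterministic local refinement) is easy to illustrate. Below is a minimal Python sketch on a toy least-squares objective; the GA here is a crude mutation-plus-selection stand-in, the objective and all parameters are hypothetical, and none of this is the authors' code:

    import numpy as np

    def loss(w, X, y):
        # Toy least-squares objective standing in for the WNN training error.
        return np.mean((X @ w - y) ** 2)

    def ga_seed(X, y, pop=40, gens=60, sigma=0.5, rng=None):
        # Crude GA stand-in: mutation plus truncation selection on weight vectors.
        rng = rng or np.random.default_rng(0)
        P = rng.normal(0.0, 1.0, (pop, X.shape[1]))
        for _ in range(gens):
            fit = np.array([loss(w, X, y) for w in P])
            elite = P[np.argsort(fit)[: pop // 4]]                 # keep best quarter
            children = elite[rng.integers(0, len(elite), pop - len(elite))]
            P = np.vstack([elite, children + rng.normal(0, sigma, children.shape)])
        return P[np.argmin([loss(w, X, y) for w in P])]

    def gd_refine(w, X, y, lr=0.01, steps=500):
        # Deterministic local refinement from the GA-supplied starting point.
        for _ in range(steps):
            w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.05 * rng.normal(size=200)
    w0 = ga_seed(X, y)
    w = gd_refine(w0, X, y)
    print(loss(w0, X, y), loss(w, X, y))   # GD should improve on the GA seed

The printout makes the division of labor visible: the stochastic search supplies a good basin, and gradient descent converges quickly inside it.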
Evaluating System Parameters on a Dragonfly using Simulation and Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bhatele, Abhinav; Jain, Nikhil; Livnat, Yarden
The dragonfly topology is becoming a popular choice for building high-radix, low-diameter networks with high-bandwidth links. Even with a powerful network, preliminary experiments on Edison at NERSC have shown that for communication heavy applications, job interference and thus presumably job placement remains an important factor. In this paper, we explore the effects of job placement, job sizes, parallel workloads and network configurations on network throughput to better understand inter-job interference. We use a simulation tool called Damselfly to model the network behavior of Edison and study the impact of various system parameters on network throughput. Parallel workloads based on five representative communication patterns are used and the simulation studies on up to 131,072 cores are aided by a new visualization of the dragonfly network.
Parallel processing and expert systems
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Lau, Sonie
1991-01-01
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, H. M. Abdul; Ukkusuri, Satish V.
2017-06-29
EPA-MOVES (Motor Vehicle Emission Simulator) is often integrated with traffic simulators to assess emission levels of large-scale urban networks with signalized intersections. High variations in speed profiles exist in the context of congested urban networks with signalized intersections. The traditional average-speed-based emission estimation technique with EPA-MOVES provides faster execution but underestimates the emissions in most cases because it ignores the speed variation at congested networks with signalized intersections. In contrast, the atomic second-by-second speed profile (i.e., the trajectory of each vehicle)-based technique provides accurate emissions at the cost of excessive computational power and time. We addressed this issue by developing a novel method to determine the link-driving-schedules (LDSs) for the EPA-MOVES tool. Our research developed a hierarchical clustering technique with dynamic time warping similarity measures (HC-DTW) to find the LDS for EPA-MOVES that is capable of producing emission estimates better than the average-speed-based technique with execution time faster than the atomic speed profile approach. We applied the HC-DTW on sample data from a signalized corridor and found that HC-DTW can significantly reduce computational time without compromising accuracy. The developed technique can substantially contribute to the EPA-MOVES-based emission estimation process for large-scale urban transportation networks by reducing the computational time with reasonably accurate estimates. This method is highly appropriate for transportation networks with higher variation in speed, such as signalized intersections. Lastly, experimental results show error differences ranging from 2% to 8% for most pollutants except PM10.
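As an illustration of the clustering machinery named in this abstract, the following Python sketch computes dynamic-time-warping distances between synthetic second-by-second speed profiles and groups them by hierarchical clustering; the profiles, cluster count and linkage choice are assumptions for illustration, not the HC-DTW implementation itself:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def dtw(a, b):
        # Classic O(len(a)*len(b)) dynamic-time-warping distance between
        # two 1-D speed profiles (m/s, sampled once per second).
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    # Hypothetical 60-second trajectories standing in for vehicle speed profiles.
    rng = np.random.default_rng(0)
    profiles = [np.clip(np.cumsum(rng.normal(0, 1, 60)) + 15, 0, None)
                for _ in range(8)]

    # Condensed pairwise DTW distance vector -> average-linkage hierarchy.
    k = len(profiles)
    dist = np.array([dtw(profiles[i], profiles[j])
                     for i in range(k) for j in range(i + 1, k)])
    labels = fcluster(linkage(dist, method="average"), t=3, criterion="maxclust")
    print(labels)  # one representative profile per cluster would become an LDS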
Analysis of multigrid methods on massively parallel computers: Architectural implications
NASA Technical Reports Server (NTRS)
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
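The flavor of the two machine models can be mimicked with a toy cost formula. The Python sketch below estimates per-V-cycle communication time as a sum over grid levels of a start-up cost plus a bandwidth term, with an optional log(p) factor standing in for the multistage network; all parameters and the halo-size formula are illustrative assumptions, not the paper's calibrated models:

    import math

    def vcycle_comm_time(n_grid, p, alpha, beta, multistage=True):
        # Rough per-V-cycle communication estimate for a 2-D problem: at each
        # level every processor exchanges halo data with neighbours. The
        # start-up cost alpha dominates on coarse grids, the bandwidth term
        # beta*words on fine grids; the multistage model adds a log(p) factor.
        levels = int(math.log2(n_grid))
        total = 0.0
        for lvl in range(levels):
            side = (n_grid >> lvl) // int(math.isqrt(p))  # subgrid side length
            words = max(4 * side, 1)                      # halo words per level
            hop = math.log2(p) if multistage else 1.0
            total += hop * (alpha + beta * words)
        return total

    # Hypothetical machine parameters (seconds): high fixed start-up cost.
    alpha, beta = 50e-6, 0.1e-6
    for p in (256, 4096, 16384):
        print(p, vcycle_comm_time(1024, p, alpha, beta, multistage=True))

Because the coarse levels contribute almost pure start-up cost, the fixed per-message overhead quickly dominates as p grows, which is the effect the abstract's conclusion about long messages and initiation cost points at.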
NASA Astrophysics Data System (ADS)
Grossman, Barry G.; Gonzalez, Frank S.; Blatt, Joel H.; Hooker, Jeffery A.
1992-03-01
The development of efficient high speed techniques to recognize, locate, and quantify damage is vitally important for successful automated inspection systems such as ones used for the inspection of undersea pipelines. Two critical problems must be solved to achieve these goals: the reduction of nonuseful information present in the video image and automatic recognition and quantification of extent and location of damage. Artificial neural network processed moire profilometry appears to be a promising technique to accomplish this. Real time video moire techniques have been developed which clearly distinguish damaged and undamaged areas on structures, thus reducing the amount of extraneous information input into an inspection system. Artificial neural networks have demonstrated advantages for image processing, since they can learn the desired response to a given input and are inherently fast when implemented in hardware due to their parallel computing architecture. Video moire images of pipes with dents of different depths were used to train a neural network, with the desired output being the location and severity of the damage. The system was then successfully tested with a second series of moire images. The techniques employed and the results obtained are discussed.
NASA Astrophysics Data System (ADS)
Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav
2017-10-01
In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
The portals 4.0.1 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2013-04-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
NASA Astrophysics Data System (ADS)
Xue, Wei; Wang, Qi; Wang, Tianyu
2018-04-01
This paper presents an improved parallel combinatory spread spectrum (PC/SS) communication system based on a double information matching (DIM) method. The new model inherits the conventional PC/SS system's advantages of high transmission speed, large information capacity and high security. The conventional system, however, suffers from a high bit error rate (BER) owing to its data-sequence mapping algorithm. By optimizing the mapping algorithm, the proposed model achieves a lower BER and higher efficiency.
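For readers unfamiliar with parallel combinatory signaling, the Python sketch below shows the basic idea that such mapping algorithms optimize: data bits select r of M spreading sequences, which are transmitted simultaneously and recovered by correlation. The sequences, sizes and noise-free channel are assumptions for illustration; this is not the paper's DIM method:

    import math
    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(0)
    M, r, L = 8, 3, 64                         # sequence pool, parallel branches, chips
    pn = rng.choice([-1.0, 1.0], size=(M, L))  # stand-in PN sequences
    combos = list(combinations(range(M), r))
    bits = int(math.log2(len(combos)))         # C(8,3) = 56 -> 5 bits per symbol

    def encode(value):
        # Superpose the r sequences selected by the data value.
        return pn[list(combos[value])].sum(axis=0)

    def decode(signal):
        # Correlate against every sequence; the r strongest were transmitted.
        corr = pn @ signal / L
        chosen = tuple(sorted(np.argsort(corr)[-r:]))
        return combos.index(chosen)

    v = 37
    print(bits, "bits per symbol;", v, "->", decode(encode(v)))  # noise-free recovery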
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
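For context, here is a plain serial NumPy sketch of the Levenberg-Marquardt iteration that a tool like GPU-LMFit parallelizes across pixels; this is the generic textbook update with a simple damping schedule, fitted to a toy exponential-decay model as in lifetime imaging, not the GPU-LMFit code:

    import numpy as np

    def levenberg_marquardt(f, jac, p0, lam=1e-3, iters=50):
        # Minimize sum(f(p)**2); f returns residuals, jac its Jacobian.
        p = p0.astype(float)
        for _ in range(iters):
            r, J = f(p), jac(p)
            A = J.T @ J + lam * np.eye(len(p))      # damped normal equations
            step = np.linalg.solve(A, -J.T @ r)
            if np.sum(f(p + step) ** 2) < np.sum(r ** 2):
                p, lam = p + step, lam * 0.5        # accept: more Gauss-Newton-like
            else:
                lam *= 10.0                         # reject: more gradient-descent-like
        return p

    # Toy per-pixel model: single exponential decay A * exp(-k t).
    t = np.linspace(0, 5, 64)
    data = 3.0 * np.exp(-1.7 * t) + 0.01 * np.random.default_rng(0).normal(size=64)
    f = lambda p: p[0] * np.exp(-p[1] * t) - data
    jac = lambda p: np.stack([np.exp(-p[1] * t), -p[0] * t * np.exp(-p[1] * t)], axis=1)
    print(levenberg_marquardt(f, jac, np.array([1.0, 1.0])))  # ~ [3.0, 1.7]

Each pixel's fit is independent of every other pixel's, which is why mapping one such loop per GPU thread scales so well.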
The high speed interconnect system architecture and operation
NASA Astrophysics Data System (ADS)
Anderson, Steven C.
The design and operation of a fiber-optic high-speed interconnect system (HSIS) being developed to meet the requirements of future avionics and flight-control hardware with distributed-system architectures are discussed. The HSIS is intended for 100-Mb/s operation of a local-area network with up to 256 stations. It comprises a bus transmission system (passive star couplers and linear media linked by active elements) and network interface units (NIUs). Each NIU is designed to perform the physical, data link, network, and transport functions defined by the ISO OSI Basic Reference Model (1982 and 1983) and incorporates a fiber-optic transceiver, a high-speed protocol based on the SAE AE-9B linear token-passing data bus (1986), and a specialized application interface unit. The operating modes and capabilities of HSIS are described in detail and illustrated with diagrams.
Argonne simulation framework for intelligent transportation systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ewing, T.; Doss, E.; Hanebutte, U.
1996-04-01
A simulation framework has been developed which defines a high-level architecture for a large-scale, comprehensive, scalable simulation of an Intelligent Transportation System (ITS). The simulator is designed to run on parallel computers and distributed (networked) computer systems; however, a version for a stand-alone workstation is also available. The ITS simulator includes an Expert Driver Model (EDM) of instrumented 'smart' vehicles with in-vehicle navigation units. The EDM is capable of performing optimal route planning and communicating with Traffic Management Centers (TMC). A dynamic road map database is used for optimum route planning, where the data is updated periodically to reflect any changes in road or weather conditions. The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide 2-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces that include human-factors studies to support safety and operational research. Realistic modeling of variations of the posted driving speed is based on human factor studies that take into consideration weather, road conditions, driver's personality and behavior, and vehicle type. The simulator has been developed on a distributed system of networked UNIX computers, but is designed to run on ANL's IBM SP-X parallel computer system for large-scale problems. A novel feature of the developed simulator is that vehicles are represented by autonomous computer processes, each with a behavior model which performs independent route selection and reacts to external traffic events much like real vehicles. Vehicle processes interact with each other and with ITS components by exchanging messages. With this approach, one will be able to take advantage of emerging massively parallel processor (MPP) systems.
NASA Astrophysics Data System (ADS)
de Laat, Cees; Develder, Chris; Jukan, Admela; Mambretti, Joe
This topic is devoted to communication issues in scalable compute and storage systems, such as parallel computers, networks of workstations, and clusters. All aspects of communication in modern systems were solicited, including advances in the design, implementation, and evaluation of interconnection networks, network interfaces, system and storage area networks, on-chip interconnects, communication protocols, routing and communication algorithms, and communication aspects of parallel and distributed algorithms. In total 15 papers were submitted to this topic, of which we selected the 7 strongest. We grouped the papers in two sessions of 3 papers each, and one paper was selected for the best paper session. We noted a number of papers dealing with changing topologies, stability and forwarding convergence in source routing based cluster interconnect network architectures; these were grouped into the first session. The authors of the paper titled “Implementing a Change Assimilation Mechanism for Source Routing Interconnects” propose a mechanism that can obtain the new topology, and compute and distribute a new set of fabric paths to the source routed network end points, to minimize the impact on the forwarding service. The article entitled “Dependability Analysis of a Fault-tolerant Network Reconfiguration Strategy” reports on a case study analyzing the effects of network size, mean time to node failure, mean time to node repair, mean time to network repair and coverage of the failure when using a 2D mesh network with a fault-tolerant mechanism (similar to the one used in the BlueGene/L system) that is able to remove rows and/or columns in the presence of failures. The last paper in this session, “RecTOR: A New and Efficient Method for Dynamic Network Reconfiguration”, presents a new dynamic reconfiguration method that ensures deadlock-freedom during the reconfiguration without causing performance degradation such as increased latency or decreased throughput. The second session groups 3 papers presenting methods, protocols and architectures that enhance capacities in networks. The paper titled “NIC-assisted Cache-Efficient Receive Stack for Message Passing over Ethernet” presents the addition of multiqueue support in the Open-MX receive stack so that all incoming packets for the same process are treated on the same core. It then introduces the idea of binding the target end process near its dedicated receive queue. In general this multiqueue receive stack performs better than the original single queue stack, especially on large communication patterns where multiple processes are involved and manual binding is difficult. The authors of “A Multipath Fault-Tolerant Routing Method for High-Speed Interconnection Networks” focus on the problem of fault tolerance for high-speed interconnection networks by designing a fault tolerant routing method. The goal was to tolerate a certain number of link and node failures, considering their impact and occurrence probability. Their experiments show that their method allows applications to successfully finalize their execution in the presence of several faults, with an average performance value of 97% with respect to the fault-free scenarios. The paper “Hardware implementation study of the Self-Clocked Fair Queuing Credit Aware (SCFQ-CA) and Deficit Round Robin Credit Aware (DRR-CA) scheduling algorithms” proposes specific implementations of the two schedulers taking into account the characteristics of current high-performance networks.
A comparison is presented on the complexity of these two algorithms in terms of silicon area and computation delay. Finally we selected one paper for the special paper session: “A Case Study of Communication Optimizations on 3D Mesh Interconnects”. In this paper the authors present topology aware mapping as a technique to optimize communication on 3-dimensional mesh interconnects and hence improve performance. Results are presented for OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P and 2,048 processors of Cray XT3.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
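The finding that the MIMD machine suited derivative evaluation reflects an embarrassingly parallel structure: each finite-difference perturbation of the 46 controls is an independent forward simulation. A schematic Python sketch of that pattern (the objective function is a placeholder, and multiprocessing merely stands in for the iPSC/860's message passing):

    import numpy as np
    from multiprocessing import Pool

    def performance(controls):
        # Stand-in for one full forward simulation of the gait model.
        return float(np.sum(np.sin(controls) ** 2))

    def fd_component(args):
        # One forward-difference derivative: an independent simulation per control.
        controls, i, h = args
        bumped = controls.copy()
        bumped[i] += h
        return (performance(bumped) - performance(controls)) / h

    if __name__ == "__main__":
        controls = np.zeros(46)          # one value per musculotendinous actuator
        h = 1e-6
        with Pool() as pool:             # each derivative on its own worker
            grad = pool.map(fd_component, [(controls, i, h) for i in range(46)])
        print(np.round(grad, 3))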
Regular Topologies for Gigabit Wide-Area Networks. Volume 1
NASA Technical Reports Server (NTRS)
Shacham, Nachum; Denny, Barbara A.; Lee, Diane S.; Khan, Irfan H.; Lee, Danny Y. C.; McKenney, Paul
1994-01-01
In general terms, this project aimed at the analysis and design of techniques for very high-speed networking. The formal objectives of the project were to: (1) Identify switch and network technologies for wide-area networks that interconnect a large number of users and can provide individual data paths at gigabit/s rates; (2) Quantitatively evaluate and compare existing and proposed architectures and protocols, identify their strength and growth potentials, and ascertain the compatibility of competing technologies; and (3) Propose new approaches to existing architectures and protocols, and identify opportunities for research to overcome deficiencies and enhance performance. The project was organized into two parts: 1. The design, analysis, and specification of techniques and protocols for very-high-speed network environments. In this part, SRI has focused on several key high-speed networking areas, including Forward Error Control (FEC) for high-speed networks in which data distortion is the result of packet loss, and the distribution of broadband, real-time traffic in multiple user sessions. 2. Congestion Avoidance Testbed Experiment (CATE). This part of the project was done within the framework of the DARTnet experimental T1 national network. The aim of the work was to advance the state of the art in benchmarking DARTnet's performance and traffic control by developing support tools for network experimentation, by designing benchmarks that allow various algorithms to be meaningfully compared, and by investigating new queueing techniques that better satisfy the needs of best-effort and reserved-resource traffic. This document is the final technical report describing the results obtained by SRI under this project. The report consists of three volumes: Volume 1 contains a technical description of the network techniques developed by SRI in the areas of FEC and multicast of real-time traffic. Volume 2 describes the work performed under CATE. Volume 3 contains the source code of all software developed under CATE.
Optical interconnection networks for high-performance computing systems
NASA Astrophysics Data System (ADS)
Biberman, Aleksandr; Bergman, Keren
2012-04-01
Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers.
NASA Astrophysics Data System (ADS)
Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas
2016-03-01
Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.
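A compressed version of such an algorithm comparison can be expressed in a few lines of scikit-learn; the data below are synthetic stand-ins for the SEVIRI-derived predictors, and the model settings are illustrative defaults, not the study's tuned configurations:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    # Synthetic stand-ins for the satellite predictors (cloud top height,
    # temperature, phase, water path, ...); the real study used MSG SEVIRI data.
    X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                               random_state=0)

    models = {
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "NNET": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
        "SVM": SVC(kernel="rbf", gamma="scale"),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold rain/no-rain accuracy
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")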
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matsugaki, Naohiro; Yamada, Yusuke; Igarashi, Noriyuki
2007-01-19
A private network, physically separated from the facility network, was designed and constructed which covered all the four protein crystallography beamlines at the Photon Factory (PF) and Structural Biology Research Center (SBRC). Connecting all the beamlines in the same network allows for simple authentication and a common working environment for a user who uses multiple beamlines. Giga-bit Ethernet wire-speed was achieved for the communication among the beamlines and SBRC buildings.
Estimating workforce development needs for high-speed rail in California.
DOT National Transportation Integrated Search
2012-03-01
This study provides an assessment of the job creation and attendant education and training needs associated with the creation of the California High-Speed Rail (CHSR) network, scheduled to begin construction in September 2012. Given the high profile ...
High-resolution, high-throughput imaging with a multibeam scanning electron microscope.
Eberle, A L; Mikula, S; Schalek, R; Lichtman, J; Knothe Tate, M L; Zeidler, D
2015-08-01
Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers. © 2015 The Authors Journal of Microscopy © 2015 Royal Microscopical Society.
Acceleration of low-energy ions at parallel shocks with a focused transport model
Zuo, Pingbing; Zhang, Ming; Rassoul, Hamid K.
2013-04-10
Here, we present a test particle simulation on the injection and acceleration of low-energy suprathermal particles by parallel shocks with a focused transport model. The focused transport equation contains all necessary physics of shock acceleration, but avoids the limitation of diffusive shock acceleration (DSA) that requires a small pitch angle anisotropy. This simulation verifies that the particles with speeds of a fraction of to a few times the shock speed can indeed be directly injected and accelerated into the DSA regime by parallel shocks. At higher energies starting from a few times the shock speed, the energy spectrum of accelerated particles is a power law with the same spectral index as the solution of standard DSA theory, although the particles are highly anisotropic in the upstream region. The intensity, however, is different from that predicted by DSA theory, indicating a different level of injection efficiency. It is found that the shock strength, the injection speed, and the intensity of an electric cross-shock potential (CSP) jump can affect the injection efficiency of the low-energy particles. A stronger shock has a higher injection efficiency. In addition, if the speed of injected particles is above a few times the shock speed, the produced power-law spectrum is consistent with the prediction of standard DSA theory in both its intensity and spectrum index with an injection efficiency of 1. CSP can increase the injection efficiency through direct particle reflection back upstream, but it has little effect on the energetic particle acceleration once the speed of injected particles is beyond a few times the shock speed. This test particle simulation proves that the focused transport theory is an extension of DSA theory with the capability of predicting the efficiency of particle injection.
High Speed All-Optical Data Distribution Network
NASA Astrophysics Data System (ADS)
Braun, Steve; Hodara, Henri
2017-11-01
This article describes the performance and capabilities of an all-optical network featuring low latency, high speed file transfer between serially connected optical nodes. A basic component of the network is a network interface card (NIC) implemented through a unique planar lightwave circuit (PLC) that performs add/drop data and optical signal amplification. The network uses a linear bus topology with nodes in a "T" configuration, as described in the text. The signal is sent optically (hence, no latency) to all nodes via wavelength division multiplexing (WDM), with each node receiver tuned to wavelength of choice via an optical de-multiplexer. Each "T" node routes a portion of the signal to/from the bus through optical couplers, embedded in the network interface card (NIC), to each of the 1 through n computers.
Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro
2010-04-21
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computational-intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications.
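The parallelization opportunity exploited in work like this comes from the per-gene structure of network inference: each row of the network matrix can be estimated independently. A loose Python sketch of that pattern (ordinary least squares on synthetic data; the real NIR algorithm uses perturbation experiments and constrained regression, so this shows the parallel skeleton only):

    import numpy as np
    from multiprocessing import Pool

    def fit_gene(args):
        # One row of the network: regress gene g on all other genes.
        g, X = args
        target = X[:, g]
        others = np.delete(X, g, axis=1)
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        return g, coef

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(50, 200))   # 50 experiments x 200 genes (synthetic)
        with Pool() as pool:             # one independent regression per gene
            rows = pool.map(fit_gene, [(g, X) for g in range(X.shape[1])])
        print(len(rows), "rows of the inferred network")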
NASA Astrophysics Data System (ADS)
-Emilian Toader, Victorin; Moldovan, Iren-Adelina; Constantin, Ionescu
2014-05-01
In 2013, Romanian seismicity included three notable events: the largest seismic "silence", the shortest sequence of two earthquakes greater than 4.8R (less than 14 days apart) since the Romanian National Institute for Earth Physics (NIEP) developed its digital network, and very high crustal activity in the Galati area. We analyze variations of the telluric currents and local magnetic field, variations of the atmospheric electrostatic field, infrasound, temperature, humidity, wind speed and direction, atmospheric pressure, variations in the earth crust measured with inclinometers, and animal behavior. The overall effect is the first high seismic energy discontinuity, which could be a precursor factor. Since 1977 Romania has not registered any earthquake important enough to generate a sense of fear among the population. In parallel with the seismic network, NIEP developed magneto-telluric, bioseismic, VLF and acoustic networks. A large frequency spectrum is covered for mechanical vibration and for the magnetic and electric fields, with ground and air sensors. Special software was designed for acquisition, analysis and real-time alerting using a direct internet connection, web page, email and SMS. Many examples show the sensitivity of telluric currents, infrasound and acoustic records (from air and ground), and the effect of tectonic stress on the magnetic field or ground deformation. The next update of the multidisciplinary monitoring network will include measurements of ionization, radon emission, sky color and solar radiation, and an extension of the infrasound and VLF/LF equipment. NOAA Space Weather satellites transmit solar-activity magnetic field data and X-ray, electron and proton flux information useful for complex analysis.
NASA Astrophysics Data System (ADS)
Chang, Yin-Jung
With decreasing transistor size, increasing chip speed, and larger numbers of processors in a system, the performance of a module/system is being limited by the off-chip and off-module bandwidth-distance products. Optical links have moved from fiber-based long distance communications to the cabinet level of 1m--100m, and recently to the backplane-level (10cm--1m). Board-level inter-chip parallel optical interconnects have been demonstrated recently by researchers from Intel, IBM, Fujitsu, NTT and a few research groups in universities. However, the board-level signal/clock distribution function using optical interconnects, the lightwave circuits, the system design, and a practically convenient integration scheme committed to the implementation of a system prototype have not been explored or carefully investigated. In this dissertation, the development of a board-level 1 x 4 optical-to-electrical signal distribution at 10Gb/s is presented. In contrast to other prototypes demonstrating board-level parallel optical interconnects that have been drawing much attention for the past decade, the optical link design for high-speed signal broadcasting is even more complicated, and the pitch between receivers can vary, as opposed to the fixed-pitch design that has been widely used in parallel optical interconnects. New challenges for board-level high-speed signal broadcasting include, but are not limited to, a new optical link design, a lightwave circuit as a distribution network, and a novel integration scheme that can be a complete radical departure from the traditional assembly method. One of the key building blocks in the lightwave circuit is the distribution network in which a 1 x 4 multimode interference (MMI) splitter is employed. MMI devices operating at high data rates are important in board-level optical interconnects and need to be characterized in the application of board-level signal broadcasting. To determine the speed limitations of MMI devices, the ultra-short pulse response of these devices is modeled based on the guided-mode theory incorporated with the Fourier transform technique. For example, for 50 fs Gaussian input pulses into a 1 x 16 splitter, the output pulses are severely degraded in coupling efficiency (48%) and completely broken up in time, primarily due to inter-modal and intra-modal (waveguide) dispersion. Material dispersion is found to play only a minor role in the pulse response of MMI devices. However, for 1ps input pulses into the same 1 x 16 splitter, the output pulses are only moderately degraded in coupling efficiency (86%) and only slightly degraded in shape. With the understanding of the necessary condition for distortionless high-speed signal transmission through MMI devices, high-speed data transmission at 40Gb/s per channel with a total bandwidth of 320Gb/s for 8 output ports is demonstrated for the first time on a 1 x 8 photo-definable polymer-based MMI power splitter. The device is designed with multimode input/output waveguides of 10 μm in width and 7.6 μm in height for the better input coupling efficiency that high-speed testing demands. The eye diagrams are all clear and fully open with an extinction ratio of 10.1dB and a jitter of 1.65 ps. The transmission validity is further confirmed by bit-error-rate testing at a pseudorandom binary sequence of 2^7 - 1. The fabrication process developed lays the cornerstone of the integration scheme and system design for the prototype of hybrid interconnects.
An important problem regarding the guided-mode attenuation associated with optical-interconnect-polymer waveguides fabricated on FR-4 printed-circuit boards is also quantified for the first time. On-board optical waveguides are receiving more attention recently from Fujitsu American Laboratory, IBM Watson Research Center, and Packaging Research Center here at Georgia Tech. This branch of research work is part of the effort in investigating, scientifically, the attenuation mechanism and the effects of the buffer layer thickness on board-level in-plane optical interconnects. The rigorous transmission-line network approach is used and the FR-4 substrate is treated as a long-period substrate grating. A quantitative metric for an appropriate matrix truncation is presented. The peaks of attenuation are shown to occur near the Bragg conditions that characterize the leaky-wave stop bands. For a typical 400 μm period FR-4 substrate with an 8 μm corrugation depth, a buffer layer thickness of about 40 μm is found to be needed to make the attenuation negligibly small. An experimental prototype for on-board optical-to-electrical signal broadcasting operating at 10Gb/s per channel over an interconnect distance of 10cm is demonstrated. An improved 1 x 4 multimode interference (MMI) splitter at 1550nm with linearly-tapered output facet is heterogeneously integrated with four p-i-n photodetectors (PDs) on a Silicon (Si) bench. The Si bench itself is hybrid integrated onto an FR-4 printed-circuit board with four receiver channels. A novel fabrication/integration approach demonstrates the simultaneous alignment between the four waveguides and the four PDs during the MMI fabrication process. The entire system is fully functional at 10Gb/s.
47 CFR 51.5 - Terms and definitions.
Code of Federal Regulations, 2014 CFR
2014-10-01
.... The Communications Act of 1934, as amended. Advanced intelligent network. Advanced intelligent network is a telecommunications network architecture in which call processing, call routing, and network... carrier's network. Advanced services. The term “advanced services” is defined as high speed, switched...
High-speed digital wireless battlefield network
NASA Astrophysics Data System (ADS)
Dao, Son K.; Zhang, Yongguang; Shek, Eddie C.; van Buer, Darrel
1999-07-01
In the past two years, the Digital Wireless Battlefield Network consortium, which consists of HRL Laboratories, Hughes Network Systems, Raytheon, and Stanford University, has participated in the DARPA TRP program to leverage the development of commercial digital wireless products for use in the 21st century battlefield. The consortium has developed an infrastructure and application testbed to support the digitized battlefield, and has implemented and demonstrated this network system. Each member is currently utilizing much of the technology developed in this program in commercial products and offerings. The new communication hardware and software and the demonstrated networking features will benefit military systems and will be applicable to the commercial communication marketplace for high speed voice/data multimedia distribution services.
Effects of changes in size, speed and distance on the perception of curved 3D trajectories
Zhang, Junjun; Braunstein, Myron L.; Andersen, George J.
2012-01-01
Previous research on the perception of 3D object motion has considered time to collision, time to passage, collision detection and judgments of speed and direction of motion, but has not directly studied the perception of the overall shape of the motion path. We examined the perception of the magnitude of curvature and sign of curvature of the motion path for objects moving at eye level in a horizontal plane parallel to the line of sight. We considered two sources of information for the perception of motion trajectories: changes in angular size and changes in angular speed. Three experiments examined judgments of relative curvature for objects moving at different distances. At the closest distance studied, accuracy was high with size information alone but near chance with speed information alone. At the greatest distance, accuracy with size information alone decreased sharply but accuracy for displays with both size and speed information remained high. We found similar results in two experiments with judgments of sign of curvature. Accuracy was higher for displays with both size and speed information than with size information alone, even when the speed information was based on parallel projections and was not informative about sign of curvature. For both magnitude of curvature and sign of curvature judgments, information indicating that the trajectory was curved increased accuracy, even when this information was not directly relevant to the required judgment. PMID:23007204
Rad-Tolerant, Thermally Stable, High-Speed Fiber-Optic Network for Harsh Environments
NASA Technical Reports Server (NTRS)
Leftwich, Matt; Hull, Tony; Leary, Michael; Leftwich, Marcus
2013-01-01
Future NASA destinations will be challenging to get to, have extreme environmental conditions, and may present difficulty in retrieving a spacecraft or its data. Space Photonics is developing a radiation-tolerant (rad-tolerant), high-speed, multi-channel fiber-optic transceiver, associated reconfigurable intelligent node communications architecture, and supporting hardware for intravehicular and ground-based optical networking applications. Data rates approaching 3.2 Gbps per channel will be achieved.
Villette, Vincent; Levesque, Mathieu; Miled, Amine; Gosselin, Benoit; Topolnik, Lisa
2017-01-01
Chronic electrophysiological recordings of neuronal activity combined with two-photon Ca2+ imaging give access to high resolution and cellular specificity. In addition, awake drug-free experimentation is required for investigating the physiological mechanisms that operate in the brain. Here, we developed a simple head fixation platform, which allows simultaneous chronic imaging and electrophysiological recordings to be obtained from the hippocampus of awake mice. We performed quantitative analyses of spontaneous animal behaviour, the associated network states and the cellular activities in the dorsal hippocampus as well as estimated the brain stability limits to image dendritic processes and individual axonal boutons. Ca2+ imaging recordings revealed a relatively stereotyped hippocampal activity despite a high inter-animal and inter-day variability in the mouse behavior. In addition to quiet state and locomotion behavioural patterns, the platform allowed the reliable detection of walking steps and fine speed variations. The brain motion during locomotion was limited to ~1.8 μm, thus allowing for imaging of small sub-cellular structures to be performed in parallel with recordings of network and behavioural states. This simple device extends the drug-free experimentation in vivo, enabling high-stability optophysiological experiments with single-bouton resolution in the mouse awake brain. PMID:28240275
NASA Technical Reports Server (NTRS)
Datta, Anubhav; Johnson, Wayne R.
2009-01-01
This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
An accurate, fast, and scalable solver for high-frequency wave propagation
NASA Astrophysics Data System (ADS)
Zepeda-Núñez, L.; Taus, M.; Hewett, R.; Demanet, L.
2017-12-01
In many science and engineering applications, solving time-harmonic high-frequency wave propagation problems quickly and accurately is of paramount importance. For example, in geophysics, particularly in oil exploration, such problems can be the forward problem in an iterative process for solving the inverse problem of subsurface inversion. It is important to solve these wave propagation problems accurately in order to efficiently obtain meaningful solutions of the inverse problems: low order forward modeling can hinder convergence. Additionally, due to the volume of data and the iterative nature of most optimization algorithms, the forward problem must be solved many times. Therefore, a fast solver is necessary to make solving the inverse problem feasible. For time-harmonic high-frequency wave propagation, obtaining both speed and accuracy is historically challenging. Recently, there have been many advances in the development of fast solvers for such problems, including methods which have linear complexity with respect to the number of degrees of freedom. While most methods scale optimally only in the context of low-order discretizations and smooth wave speed distributions, the method of polarized traces has been shown to retain optimal scaling for high-order discretizations, such as hybridizable discontinuous Galerkin methods, and for highly heterogeneous (and even discontinuous) wave speeds. The resulting fast and accurate solver is consequently highly attractive for geophysical applications. To date, this method relies on a layered domain decomposition together with a preconditioner applied in a sweeping fashion, which has limited straightforward parallelization. In this work, we introduce a new version of the method of polarized traces which reveals more parallel structure than previous versions while preserving all of its other advantages. We achieve this by further decomposing each layer and applying the preconditioner to these new components separately and in parallel. We demonstrate that this produces an even more effective and parallelizable preconditioner for a single right-hand side. As before, additional speed can be gained by pipelining several right-hand sides.
NASA Astrophysics Data System (ADS)
Chan, Calvin C. K.; Lam, Cedric F.; Tsang, Danny H. K.
2005-09-01
Call for Papers: Optical Ethernet The Journal of Optical Networking (JON) is soliciting papers for a second feature issue on Optical Ethernet. Ethernet has evolved from a LAN technology connecting desktop computers to a universal broadband network interface. It is not only the vehicle for local data connectivity but also the standard interface for next-generation network equipment such as video servers and IP telephony. High-speed Ethernet has been increasingly assuming the volume of backbone network traffic from SONET/SDH-based circuit applications. It is clear that IP has become the universal network protocol for future converged networks, and Ethernet is becoming the ubiquitous link layer for connectivity. Network operators have been offering Ethernet services for several years. Problems and new requirements in Ethernet service offerings have been captured through previous experience. New study groups and standards bodies have been formed to address these problems. This feature issue aims at reviewing and updating the new developments and R&D efforts of high-speed Ethernet in recent years, especially those related to the field of optical networking. Scope of Submission The scope of the papers includes, but is not limited to, the following: Ethernet PHY development 10-Gbit Ethernet on multimode fiber Native Ethernet transport and Ethernet on legacy networks EPON Ethernet OAM Resilient packet ring (RPR) and Ethernet QoS definition and management on Ethernet Ethernet protection switching Circuit emulation services on Ethernet Transparent LAN service development Carrier VLAN and Ethernet Ethernet MAC frame expansion Ethernet switching High-speed Ethernet applications Economic models of high-speed Ethernet services Ethernet field deployment and standard activities To submit to this special issue, follow the normal procedure for submission to JON, indicating "Optical Ethernet feature" in the "Comments" field of the online submission form. For all other questions relating to this feature issue, please send an e-mail to jon@osa.org, subject line "Optical Ethernet." Additional information can be found on the JON website: http://www.osa-jon.org/submission/
Chang, Yu-Kai; Pesce, Caterina; Chiang, Yi-Te; Kuo, Cheng-Yuh; Fong, Dong-Yang
2015-01-01
The purpose of this study was to investigate the after-effects of an acute bout of moderate intensity aerobic cycling exercise on neuroelectric and behavioral indices of efficiency of three attentional networks: alerting, orienting, and executive (conflict) control. Thirty young, highly fit amateur basketball players performed a multifunctional attentional reaction time task, the attention network test (ANT), with a two-group randomized experimental design after an acute bout of moderate intensity spinning wheel exercise or without antecedent exercise. The ANT combined warning signals prior to targets, spatial cueing of potential target locations and target stimuli surrounded by congruent or incongruent flankers, which were provided to assess three attentional networks. Event-related brain potentials and task performance were measured during the ANT. Exercise resulted in a larger P3 amplitude in the alerting and executive control subtasks across frontal, central and parietal midline sites that was paralleled by an enhanced reaction speed only on trials with incongruent flankers of the executive control network. The P3 latency and response accuracy were not affected by exercise. These findings suggest that after spinning, more resources are allocated to task-relevant stimuli in tasks that rely on the alerting and executive control networks. However, the improvement in performance was observed in only the executively challenging conflict condition, suggesting that whether the brain resources that are rendered available immediately after acute exercise translate into better attention performance depends on the cognitive task complexity. PMID:25914634
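The three ANT network scores mentioned above are simple reaction-time contrasts. A minimal Python sketch of the standard computation, using hypothetical per-condition mean RTs for illustration:

    # Standard ANT network scores from per-condition mean reaction times (ms).
    # The values below are hypothetical, for illustration only.
    rt = {
        "no_cue": 560.0, "double_cue": 520.0,       # alerting contrast
        "center_cue": 530.0, "spatial_cue": 490.0,  # orienting contrast
        "congruent": 500.0, "incongruent": 590.0,   # conflict (executive) contrast
    }

    alerting = rt["no_cue"] - rt["double_cue"]         # warning-signal benefit
    orienting = rt["center_cue"] - rt["spatial_cue"]   # spatial-cue benefit
    conflict = rt["incongruent"] - rt["congruent"]     # flanker interference cost
    print(alerting, orienting, conflict)

Larger alerting and orienting scores indicate greater cueing benefits, while a larger conflict score indicates weaker executive control; the exercise effect reported above corresponds to a speed-up specifically on the incongruent trials entering the conflict contrast.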
Resonant tunnelling diode based high speed optoelectronic transmitters
NASA Astrophysics Data System (ADS)
Wang, Jue; Rodrigues, G. C.; Al-Khalidi, Abdullah; Figueiredo, José M. L.; Wasige, Edward
2017-08-01
Resonant tunneling diode (RTD) integration with photo detector (PD) from epi-layer design shows great potential for combining terahertz (THz) RTD electronic source with high speed optical modulation. With an optimized layer structure, the RTD-PD presented in the paper shows high stationary responsivity of 5 A/W at 1310 nm wavelength. High power microwave/mm-wave RTD-PD optoelectronic oscillators are proposed. The circuitry employs two RTD-PD devices in parallel. The oscillation frequencies range from 20-44 GHz with maximum attainable power about 1 mW at 34/37/44GHz.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
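The data-parallel style such an array machine supports can be pictured with a short NumPy sketch (purely illustrative; the MPP itself was programmed at a far lower level): each cell of a 128 x 128 tile stands in for one processing element, and every "PE" executes the same instruction in lockstep.

```python
import numpy as np

# Hypothetical 128 x 128 image tile: one pixel per processing element.
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(128, 128), dtype=np.uint16)

# One lockstep step: all 16,384 "PEs" apply the same threshold at once.
binary = (tile > 128).astype(np.uint8)

# A second lockstep step: 4-neighbor averaging via whole-array shifts,
# mimicking the nearest-neighbor links of the PE grid.
smoothed = (np.roll(tile, 1, axis=0) + np.roll(tile, -1, axis=0) +
            np.roll(tile, 1, axis=1) + np.roll(tile, -1, axis=1)) // 4

print(int(binary.sum()), float(smoothed.mean()))
```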
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
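The "intrinsic parallelism of bitwise operators" exploited by the genetic-programming version can be sketched in a few lines (an illustration of the general idea, not the authors' code): packing one fitness case per bit lets a single machine-word operation evaluate many cases at once.

```python
# SIMD-within-a-register: one fitness case per bit, 8 cases shown here.
# All names and bit patterns are illustrative.
x0 = 0b10110100      # feature 0 across the 8 cases
x1 = 0b11010010      # feature 1 across the 8 cases
target = 0b01100110  # desired class label for each case

# A candidate Boolean classifier evolved by GP, evaluated on all cases
# simultaneously: here (x0 AND NOT x1) OR (NOT x0 AND x1), i.e. XOR.
predicted = ((x0 & ~x1) | (~x0 & x1)) & 0xFF

# Fitness = number of correctly classified cases, via one popcount.
correct = ~(predicted ^ target) & 0xFF
print(bin(correct).count("1"), "of 8 cases correct")
```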
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers
NASA Astrophysics Data System (ADS)
Ogino, T.
High Performance Fortran (HPF) is one of the modern, common techniques for achieving high-performance parallel computation. We have translated a three-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere, fully vectorized and fully parallelized in VPP Fortran, from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer. The overall performance and capability of the HPF MHD code were shown to be almost comparable to those of the VPP Fortran version. A three-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops, with an efficiency of 76.5% relative to the catalog values of the VPP5000/56 in vector and parallel computation. We conclude that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and that a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
2013-05-01
...logic to perform control function computations and are connected to the full authority digital engine control (FADEC) via a high-speed data communication bus. The short-term distributed engine control configurations will be core... concentrator; and high-temperature electronics, high-speed communication bus between the data concentrator and the control law processor master FADEC...
Lyu, Tao; Yao, Suying; Nie, Kaiming; Xu, Jiangtao
2014-11-17
A 12-bit high-speed column-parallel two-step single-slope (SS) analog-to-digital converter (ADC) for CMOS image sensors is proposed. The proposed ADC employs a single ramp voltage and multiple reference voltages, and the conversion is divided into a coarse phase and a fine phase to improve the conversion rate. An error calibration scheme is proposed to correct errors caused by offsets among the reference voltages. The digital-to-analog converter (DAC) used for the ramp generator is based on a split-capacitor array with an attenuation capacitor. An analysis of the DAC's linearity performance versus capacitor mismatch and parasitic capacitance is presented. A prototype 1024 × 32 Time Delay Integration (TDI) CMOS image sensor with the proposed ADC architecture has been fabricated in a standard 0.18 μm CMOS process. The proposed ADC has an average power consumption of 128 μW and a conversion rate 6 times higher than the conventional SS ADC. A high-quality image, captured at a line rate of 15.5 k lines/s, shows that the proposed ADC is suitable for high-speed CMOS image sensors.
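The rate advantage of the two-step scheme follows from simple cycle counting, sketched below with an assumed 6-bit coarse / 6-bit fine split (the split is illustrative; the paper's actual phase allocation and overheads may differ).

```python
# A conventional N-bit single-slope ADC must ramp through 2**N levels,
# while a two-step coarse/fine conversion ramps through 2**Nc + 2**Nf.
# The 6/6 split is an assumption for illustration only.
N = 12
Nc, Nf = 6, 6

conventional_cycles = 2 ** N           # 4096 ramp steps
two_step_cycles = 2 ** Nc + 2 ** Nf    # 128 ramp steps

print(conventional_cycles / two_step_cycles)   # 32x fewer ramp steps
# The measured 6x speed-up is smaller than this ideal ratio because
# settling, calibration and readout overheads are not counted here.
```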
Trainable hardware for dynamical computing using error backpropagation through physical media.
Hermans, Michiel; Burm, Michaël; Van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter
2015-03-24
Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation-a crucial step for tuning such systems towards a specific task-can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
Trainable hardware for dynamical computing using error backpropagation through physical media
NASA Astrophysics Data System (ADS)
Hermans, Michiel; Burm, Michaël; van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter
2015-03-01
Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation—a crucial step for tuning such systems towards a specific task—can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
A High-Speed Design of Montgomery Multiplier
NASA Astrophysics Data System (ADS)
Fan, Yibo; Ikenaga, Takeshi; Goto, Satoshi
With the increase of key lengths used in public-key cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication has become a bottleneck. This paper proposes a high-speed design of a Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce the critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and a one-clock-cycle delay in the dataflow. Finally, a hardware-reused architecture is proposed to reduce hardware cost, and a parallel radix-16 design of the data path is proposed to accelerate the speed. Using the HHNEC 0.25 μm standard cell library, implementation results show that the total cost of the Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz, and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable for high-speed RSA or ECC encryption/decryption. As a scalable design, it supports encryption/decryption of any key length up to the size of the on-chip memory.
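For readers unfamiliar with the operation being accelerated, a minimal bit-serial (radix-2) Montgomery multiplication is sketched below in Python; the paper's contribution is a scalable high-radix hardware variant of this same computation, so the sketch is functional background only.

```python
def mont_mul(a, b, n, k):
    """Bit-serial Montgomery multiplication: returns a*b*2**(-k) mod n.

    Requires n odd and 0 <= a, b < n < 2**k. A radix-16 datapath like
    the one in the paper retires four of these bit-steps per cycle.
    """
    t = 0
    for i in range(k):
        if (a >> i) & 1:
            t += b
        if t & 1:        # make t even so the halving below is exact mod n
            t += n
        t >>= 1
    return t - n if t >= n else t

# Usage: compute 5*7 mod 101 through the Montgomery domain (R = 2**8).
n, k = 101, 8
r2 = (1 << (2 * k)) % n            # R**2 mod n, for domain conversion
a_m = mont_mul(5, r2, n, k)        # 5*R mod n
b_m = mont_mul(7, r2, n, k)        # 7*R mod n
p_m = mont_mul(a_m, b_m, n, k)     # (5*7)*R mod n
print(mont_mul(p_m, 1, n, k))      # leave the domain -> 35
```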
Devices and circuits for nanoelectronic implementation of artificial neural networks
NASA Astrophysics Data System (ADS)
Turel, Ozgur
Biological neural networks perform complicated information processing tasks at speeds better than conventional computers based on conventional algorithms. This has inspired researchers to look into the way these networks function, and to propose artificial networks that mimic their behavior. Unfortunately, most artificial neural networks, whether software or hardware, do not provide either the speed or the complexity of a human brain. Nanoelectronics, with the high density and low power dissipation it provides, may be used in developing more efficient artificial neural networks. This work consists of two major contributions in this direction. First is the proposal of the CMOL concept, hybrid CMOS-molecular hardware [1-8]. CMOL may circumvent most of the problems posed by molecular devices, such as low yield, yet provide high active device density, ~10^12/cm^2. The second contribution is CrossNets, artificial neural networks based on CMOL. We showed that CrossNets, with their fault tolerance and exceptional speed (~4 to 6 orders of magnitude faster than biological neural networks), can perform any task any artificial neural network can perform. Moreover, there is hope that if their integration scale is increased to that of the human cerebral cortex (~10^10 neurons and ~10^14 synapses), they may be capable of performing more advanced tasks.
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology
NASA Astrophysics Data System (ADS)
Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.
2009-04-01
Currently, the basic problem of tsunami modeling is the low speed of calculations, which is unacceptable for operational warning services. Existing algorithms for numerical modeling of the hydrodynamic processes of tsunami waves were developed without taking advantage of modern computer facilities, and considerable acceleration is possible with parallel algorithms. We discuss here a new approach to parallelizing a tsunami modeling code using OpenMP technology (for multiprocessor systems with shared memory). Nowadays, multiprocessor systems are easily accessible to everyone, and the cost of using them is much lower than that of clusters; this also lets programmers apply multithreading algorithms on the desktop computers of researchers. Another important advantage of this approach is the shared-memory mechanism: there is no need to send data over slow networks (for example, Ethernet). All memory is common to all computing processes, which yields almost linear scalability of the program. In the new version of NAMI DANCE, OpenMP technology and a multithreading algorithm provide an 80% gain in speed in comparison with the single-threaded version on a dual-processor unit, and a 320% gain was attained on a four-core processor unit. It was thus possible to reduce considerably the computation time on scientific workstations (desktops) without a complete rewrite of the program and user interfaces. Further modernization of the algorithms for preparing initial data and processing results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real-time tsunami warning systems.
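The parallelization pattern described, several threads updating one shared grid with no network transfers, can be sketched in Python as a stand-in for the Fortran/OpenMP original (array names and the toy five-point update are illustrative, not the NAMI DANCE kernel).

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Shared-memory parallelism in the OpenMP style: workers update disjoint
# row blocks of one grid held in common memory, so no data crosses a
# network. The smoothing update is a toy stand-in for the tsunami kernel.
h = np.random.rand(2048, 2048)     # water-height grid (illustrative)
h_new = h.copy()

def update_block(rows):
    r0, r1 = rows
    for i in range(max(r0, 1), min(r1, h.shape[0] - 1)):
        h_new[i, 1:-1] = 0.25 * (h[i - 1, 1:-1] + h[i + 1, 1:-1] +
                                 h[i, :-2] + h[i, 2:])

threads = 4
bounds = np.linspace(0, h.shape[0], threads + 1, dtype=int)
with ThreadPoolExecutor(max_workers=threads) as pool:
    list(pool.map(update_block, zip(bounds[:-1], bounds[1:])))
```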
Azad, Ariful; Ouzounis, Christos A; Kyrpides, Nikos C; Buluç, Aydin
2018-01-01
Abstract Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL’s scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license. PMID:29315405
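For reference, the serial iteration that HipMCL parallelizes alternates "expansion" (matrix squaring) with "inflation" (elementwise powering plus column renormalization). A dense NumPy sketch follows; real MCL and HipMCL operate on sparse, pruned matrices, and the cluster readout here is deliberately crude.

```python
import numpy as np

def mcl(adj, inflation=2.0, iters=50, tol=1e-6):
    """Dense sketch of Markov Clustering on a symmetric adjacency matrix."""
    M = adj + np.eye(adj.shape[0])   # self-loops stabilize the random walk
    M = M / M.sum(axis=0)            # column-stochastic transition matrix
    for _ in range(iters):
        prev = M
        M = M @ M                    # expansion: flow along longer walks
        M = M ** inflation           # inflation: strengthen strong flows
        M = M / M.sum(axis=0)
        if np.abs(M - prev).max() < tol:
            break
    # Crude readout: each surviving row (attractor) spans one cluster.
    return [np.flatnonzero(row > 1e-6) for row in M if row.max() > 1e-6]

# Two triangles joined by a single weak edge separate into two clusters.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(mcl(A))
```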
Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.; ...
2018-01-05
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azad, Ariful; Pavlopoulos, Georgios A.; Ouzounis, Christos A.
Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein–protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. In this paper, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ~70 million nodes with ~68 billion edges in ~2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. Finally, HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
Heat Transfer in the Turbulent Boundary Layer of a Compressible Gas at High Speeds
NASA Technical Reports Server (NTRS)
Frankl, F.
1942-01-01
The Reynolds law of heat transfer from a wall to a turbulent stream is extended to the case of flow of a compressible gas at high speeds. The analysis is based on the modern theory of the turbulent boundary layer with a laminar sublayer. The investigation is carried out for the case of a plate situated in a parallel stream. The results are obtained independently of the velocity distribution in the turbulent boundary layer.
SIMS: addressing the problem of heterogeneity in databases
NASA Astrophysics Data System (ADS)
Arens, Yigal
1997-02-01
The heterogeneity of remotely accessible databases -- with respect to contents, query language, semantics, organization, etc. -- presents serious obstacles to convenient querying. The SIMS (single interface to multiple sources) system addresses this global integration problem. It does so by defining a single language for describing the domain about which information is stored in the databases and using this language as the query language. Each database to which SIMS is to provide access is modeled using this language. The model describes a database's contents, organization, and other relevant features. SIMS uses these models, together with a planning system drawing on techniques from artificial intelligence, to decompose a given user's high-level query into a series of queries against the databases and other data manipulation steps. The retrieval plan is constructed so as to minimize data movement over the network and maximize parallelism to increase execution speed. SIMS can recover from network failures during plan execution by obtaining data from alternate sources, when possible. SIMS has been demonstrated in the domains of medical informatics and logistics, using real databases.
Kukkonen, C A
1995-06-01
High-speed information processing technologies being developed and applied by the Jet Propulsion Laboratory for NASA and Department of Defense mission needs have potential dual uses in telemedicine and other medical applications. Fiber-optic ground networks connected with microwave satellite links allow NASA to communicate with its astronauts in Earth orbit or on the moon, and with its deep space probes billions of miles away. These networks monitor the health of astronauts and of robotic spacecraft. Similar communications technology will also allow patients to communicate with doctors anywhere on Earth. NASA space missions have science as a major objective. Science sensors have become so sophisticated that they can take more data than our scientists can analyze by hand. High-performance computers--workstations, supercomputers, and massively parallel computers--are being used to transform this data into knowledge. This is done using image processing, data visualization, and other techniques to present the data--ones and zeros--in forms that a human analyst can readily relate to and understand. Medical sensors have likewise exploded in data output--witness CT scans, MRI, and ultrasound. This data must be presented in visual form, and computers will allow routine combination of many two-dimensional MRI images into three-dimensional reconstructions of organs that can then be fully examined by physicians. Emerging technologies such as neural networks, which are being "trained" to detect craters on planets or incoming missiles amongst decoys, can be used to identify microcalcifications in mammograms.
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPUs) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing-ray intersections with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. A generalized procedure for 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions in the GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multithreaded implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. A study of the influence of the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as an optimized configuration of the computing CUDA network.
NASA Astrophysics Data System (ADS)
Yan, Hui; Wang, K. G.; Jones, Jim E.
2016-06-01
A parallel algorithm for large-scale three-dimensional phase-field simulations of phase coarsening is developed and implemented on high-performance architectures. From the large-scale simulations, a new kinetics of phase coarsening is found in the regime of ultrahigh volume fraction. The parallel implementation is capable of harnessing the greater computer power available from high-performance architectures. The parallelized code enables an increase in the three-dimensional simulation system size up to a 512^3 grid cube. Through the parallelized code, practical runtimes can be achieved for three-dimensional large-scale simulations, and the statistical significance of the results from these high-resolution parallel simulations is greatly improved over that obtainable from serial simulations. A detailed performance analysis of speed-up and scalability is presented, showing good scalability which improves with increasing problem size. In addition, a model for prediction of runtime is developed, which shows good agreement with actual run times from numerical tests.
NASA Technical Reports Server (NTRS)
Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.
1994-01-01
Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
DynaSim: A MATLAB Toolbox for Neural Modeling and Simulation
Sherfey, Jason S.; Soplata, Austin E.; Ardid, Salva; Roberts, Erik A.; Stanley, David A.; Pittman-Polletta, Benjamin R.; Kopell, Nancy J.
2018-01-01
DynaSim is an open-source MATLAB/GNU Octave toolbox for rapid prototyping of neural models and batch simulation management. It is designed to speed up and simplify the process of generating, sharing, and exploring network models of neurons with one or more compartments. Models can be specified by equations directly (similar to XPP or the Brian simulator) or by lists of predefined or custom model components. The higher-level specification supports arbitrarily complex population models and networks of interconnected populations. DynaSim also includes a large set of features that simplify exploring model dynamics over parameter spaces, running simulations in parallel using both multicore processors and high-performance computer clusters, and analyzing and plotting large numbers of simulated data sets in parallel. It also includes a graphical user interface (DynaSim GUI) that supports full functionality without requiring user programming. The software has been implemented in MATLAB to enable advanced neural modeling using MATLAB, given its popularity and a growing interest in modeling neural systems. The design of DynaSim incorporates a novel schema for model specification to facilitate future interoperability with other specifications (e.g., NeuroML, SBML), simulators (e.g., NEURON, Brian, NEST), and web-based applications (e.g., Geppetto) outside MATLAB. DynaSim is freely available at http://dynasimtoolbox.org. This tool promises to reduce barriers for investigating dynamics in large neural models, facilitate collaborative modeling, and complement other tools being developed in the neuroinformatics community. PMID:29599715
DynaSim: A MATLAB Toolbox for Neural Modeling and Simulation.
Sherfey, Jason S; Soplata, Austin E; Ardid, Salva; Roberts, Erik A; Stanley, David A; Pittman-Polletta, Benjamin R; Kopell, Nancy J
2018-01-01
DynaSim is an open-source MATLAB/GNU Octave toolbox for rapid prototyping of neural models and batch simulation management. It is designed to speed up and simplify the process of generating, sharing, and exploring network models of neurons with one or more compartments. Models can be specified by equations directly (similar to XPP or the Brian simulator) or by lists of predefined or custom model components. The higher-level specification supports arbitrarily complex population models and networks of interconnected populations. DynaSim also includes a large set of features that simplify exploring model dynamics over parameter spaces, running simulations in parallel using both multicore processors and high-performance computer clusters, and analyzing and plotting large numbers of simulated data sets in parallel. It also includes a graphical user interface (DynaSim GUI) that supports full functionality without requiring user programming. The software has been implemented in MATLAB to enable advanced neural modeling using MATLAB, given its popularity and a growing interest in modeling neural systems. The design of DynaSim incorporates a novel schema for model specification to facilitate future interoperability with other specifications (e.g., NeuroML, SBML), simulators (e.g., NEURON, Brian, NEST), and web-based applications (e.g., Geppetto) outside MATLAB. DynaSim is freely available at http://dynasimtoolbox.org. This tool promises to reduce barriers for investigating dynamics in large neural models, facilitate collaborative modeling, and complement other tools being developed in the neuroinformatics community.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-04-10
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-01-01
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks. PMID:28394270
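The traffic-as-image idea is compact enough to sketch: a time-space speed matrix (rows are road links, columns are time steps) is fed to a small 2D CNN that regresses future network-wide speeds. The PyTorch layer sizes and tensor shapes below are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

links, steps, horizon = 64, 32, 4          # illustrative dimensions
x = torch.rand(16, 1, links, steps)        # batch of time-space "images"
y = torch.rand(16, links * horizon)        # future speeds to predict

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # abstract traffic features
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * links * steps, links * horizon),  # network-wide speeds
)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.MSELoss()(model(x), y)           # one illustrative training step
loss.backward()
opt.step()
print(float(loss))
```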
Prediction based Greedy Perimeter Stateless Routing Protocol for Vehicular Self-organizing Network
NASA Astrophysics Data System (ADS)
Wang, Chunlin; Fan, Quanrun; Chen, Xiaolin; Xu, Wanjin
2018-03-01
PGPSR (Prediction based Greedy Perimeter Stateless Routing) is based on and extends the GPSR protocol to adapt to the high-speed mobility of the vehicular self-organizing network (VANET) and the resulting changes in network topology. When GPSR is used in the VANET environment, the network loss rate and throughput are not ideal, and the protocol may even fail to work. Aiming at these problems of GPSR, the proposed PGPSR routing protocol redefines the hello and query packet structures, adding node speed and direction information to them, so that before the next update is received a node's position and the new network topology can be predicted from its speed and direction, and the right next-hop route and path can be selected. Secondly, outdated node information in the neighbor table is deleted in time. Simulation experiments show that the performance of PGPSR is better than that of GPSR.
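The prediction step PGPSR adds to GPSR amounts to dead reckoning from the speed and heading carried in the extended hello packets; a small sketch follows (all field names and the expiry threshold are illustrative).

```python
import math, time

def predict_position(entry, now):
    """Dead-reckon a neighbor's position from its last hello packet."""
    dt = now - entry["timestamp"]
    dx = entry["speed"] * math.cos(entry["heading"]) * dt
    dy = entry["speed"] * math.sin(entry["heading"]) * dt
    return entry["x"] + dx, entry["y"] + dy

def expire_stale(table, now, max_age=2.0):
    """Drop outdated entries, as PGPSR's neighbor-table update does."""
    return {nid: e for nid, e in table.items()
            if now - e["timestamp"] <= max_age}

table = {7: {"x": 0.0, "y": 0.0, "speed": 25.0,   # 25 m/s ~ 90 km/h
             "heading": 0.0, "timestamp": time.time() - 1.0}}
print(predict_position(table[7], time.time()))    # ~25 m further east
```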
Radiology: "killer app" for next generation networks?
McNeill, Kevin M
2004-03-01
The core principles of digital radiology were well developed by the end of the 1980s. During the following decade, tremendous improvements in computer technology enabled realization of those principles at an affordable cost. In this decade, work can focus on highly distributed radiology in the context of the integrated health care enterprise. Over the same period, computer networking has evolved from a relatively obscure field used by a small number of researchers across low-speed serial links to a pervasive technology that affects nearly all facets of society. Development directions in network technology will ultimately provide end-to-end data paths with speeds that match or exceed the speeds of data paths within the local network and even within workstations. This article describes key developments in Next Generation Networks, potential obstacles, and scenarios in which digital radiology can become a "killer app" that helps to drive deployment of new network infrastructure.
Cooperative high-performance storage in the accelerated strategic computing initiative
NASA Technical Reports Server (NTRS)
Gary, Mark; Howard, Barry; Louis, Steve; Minuzzo, Kim; Seager, Mark
1996-01-01
The use and acceptance of new high-performance, parallel computing platforms will be impeded by the absence of an infrastructure capable of supporting orders-of-magnitude improvement in hierarchical storage and high-speed I/O (Input/Output). The distribution of these high-performance platforms and supporting infrastructures across a wide-area network further compounds this problem. We describe an architectural design and phased implementation plan for a distributed, Cooperative Storage Environment (CSE) to achieve the necessary performance, user transparency, site autonomy, communication, and security features needed to support the Accelerated Strategic Computing Initiative (ASCI). ASCI is a Department of Energy (DOE) program attempting to apply terascale platforms and Problem-Solving Environments (PSEs) toward real-world computational modeling and simulation problems. The ASCI mission must be carried out through a unified, multilaboratory effort, and will require highly secure, efficient access to vast amounts of data. The CSE provides a logically simple, geographically distributed storage infrastructure of semi-autonomous cooperating sites to meet the strategic ASCI PSE goal of high-performance data storage and access at the user desktop.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in an intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO2 sequestration are also included.
dfnWorks: A discrete fracture network framework for modeling subsurface flow and transport
Hyman, Jeffrey D.; Karra, Satish; Makedonska, Nataliia; ...
2015-11-01
DFNWORKS is a parallelized computational suite to generate three-dimensional discrete fracture networks (DFN) and simulate flow and transport. Developed at Los Alamos National Laboratory over the past five years, it has been used to study flow and transport in fractured media at scales ranging from millimeters to kilometers. The networks are created and meshed using DFNGEN, which combines FRAM (the feature rejection algorithm for meshing) methodology to stochastically generate three-dimensional DFNs with the LaGriT meshing toolbox to create a high-quality computational mesh representation. The representation produces a conforming Delaunay triangulation suitable for high performance computing finite volume solvers in an intrinsically parallel fashion. Flow through the network is simulated in dfnFlow, which utilizes the massively parallel subsurface flow and reactive transport finite volume code PFLOTRAN. A Lagrangian approach to simulating transport through the DFN is adopted within DFNTRANS to determine pathlines and solute transport through the DFN. Example applications of this suite in the areas of nuclear waste repository science, hydraulic fracturing and CO2 sequestration are also included.
Fast Whole-Engine Stirling Analysis
NASA Technical Reports Server (NTRS)
Dyson, Rodger W.; Wilson, Scott D.; Tew, Roy C.; Demko, Rikako
2006-01-01
This presentation discusses the simulation approach to whole-engine modeling for physical consistency; REV regenerator modeling; grid layering for smoothness and quality; conjugate heat transfer method adjustment; a high-speed, low-cost parallel cluster; and debugging.
GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments
NASA Astrophysics Data System (ADS)
Chen, Zhanlong; Wu, Xin-cai; Wu, Liang
2008-12-01
Computation grids enable the coordinated sharing of large-scale, distributed, heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of grid middleware that resides between the networks and traditional GIS applications. The integration of multi-source, heterogeneous spatial information, the management of distributed spatial resources, and the sharing and cooperative use of spatial data and grid services are the key problems to resolve in the development of a grid GIS. The performance of the spatial index mechanism is a key technology of grid GIS and spatial databases, and it affects the overall performance of a GIS in grid environments. In order to improve the efficiency of parallel processing of massive spatial data in a distributed parallel computing grid environment, this paper presents GSHR-Tree, a new grid slot hash parallel spatial index. Based on a hash table and dynamic spatial slots, it improves the structure of the classical parallel R-tree index and makes full use of the good qualities of both the R-tree and the hash data structure, yielding a parallel spatial index that can meet the needs of parallel grid computing over massive spatial data in distributed networks. The algorithm splits space into multiple slots and maps these slots to sites in the distributed parallel system; each site builds the spatial objects in its slots into an R-tree. On the basis of this tree structure, the index data are distributed among multiple nodes in the grid network using the large-node R-tree method, and any imbalance arising during processing is quickly corrected by a dynamic adjustment algorithm. The structure accounts for the distribution, replication, and transfer of the spatial index in the grid environment, and its design ensures load balance in parallel computation, making it fit for parallel processing of spatial information in distributed network environments. Instead of the recursive comparison of spatial objects used in the original R-tree, the algorithm builds the spatial index using binary code operations, which computers execute more efficiently, with extended dynamic hash codes for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a flexible allocation protocol that copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. An application manipulates the GSHR-Tree structure from a node in the grid environment, addressing the tree through an image that splits can make outdated; the resulting addressing errors are solved by forwarding among the servers. We also propose a spatial index data distribution algorithm that limits the number of servers, improving storage utilization at the cost of additional messages.
Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage; in such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. The structure makes a compromise between updating the duplicated index and transferring the spatial index data, and it is flexible enough to satisfy new needs in the future. The GSHR-Tree provides R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirms the efficiency of our design choices. Using the system response time of parallel spatial range query processing as the performance evaluation factor, the simulated experiments demonstrate the reasonable design and high performance of the proposed indexing structure. The scheme should fit the needs of new spatial data applications using ever larger datasets.
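One plausible reading of the "binary code operation" described above is a bit-interleaving (Z-order) code that turns cell coordinates into slot numbers, which a hash table then maps to storage sites; the sketch below assumes that reading, and all parameters are illustrative rather than taken from the paper.

```python
def z_order(x, y, bits=8):
    """Interleave coordinate bits into a single binary slot code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

def slot_for(px, py, extent=1024.0, bits=8, n_sites=16):
    """Map a point to a spatial slot, then hash the slot to a grid site.

    Each site would build a local R-tree over the objects in its slots.
    """
    cells = 1 << bits
    cx = min(int(px / extent * cells), cells - 1)
    cy = min(int(py / extent * cells), cells - 1)
    slot = z_order(cx, cy, bits)
    return slot, slot % n_sites      # hash table: slot -> storage site

print(slot_for(10.5, 700.25))
```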
High-Performance, Multi-Node File Copies and Checksums for Clustered File Systems
NASA Technical Reports Server (NTRS)
Kolano, Paul Z.; Ciotti, Robert B.
2012-01-01
Modern parallel file systems achieve high performance using a variety of techniques, such as striping files across multiple disks to increase aggregate I/O bandwidth and spreading disks across multiple servers to increase aggregate interconnect bandwidth. To achieve peak performance from such systems, it is typically necessary to utilize multiple concurrent readers/writers from multiple systems to overcome various single-system limitations, such as number of processors and network bandwidth. The standard cp and md5sum tools of GNU coreutils found on every modern Unix/Linux system, however, utilize a single execution thread on a single CPU core of a single system, and hence cannot take full advantage of the increased performance of clustered file systems. Mcp and msum are drop-in replacements for the standard cp and md5sum programs that utilize multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to ensure that nodes are kept as busy as possible. Read/write parallelism allows individual operations of a single copy to be overlapped using asynchronous I/O. Multi-node cooperation allows different nodes to take part in the same copy/checksum. Split-file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. The total speed-ups from all improvements are significant: mcp improves cp performance over 27x, msum improves md5sum performance almost 19x, and the combination of mcp and msum improves verified copies via cp and md5sum by almost 22x. These improvements come in the form of drop-in replacements for cp and md5sum, so they are easily used and are available for download as open source software at http://mutil.sourceforge.net.
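The split-file and hash-tree ideas combine naturally: checksum fixed-size chunks of one file in parallel, then hash the concatenated chunk digests into a single root digest. The sketch below mirrors that scheme in spirit only; the real tools are C programs with many further optimizations, and the chunk size here is an arbitrary choice.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # 4 MiB chunks (illustrative size)

def chunk_digest(path, offset):
    # Each worker hashes its own chunk, so one file is checksummed in parallel.
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, hashlib.md5(f.read(CHUNK)).digest()

def tree_checksum(path, size):
    offsets = range(0, size, CHUNK)
    with ThreadPoolExecutor(max_workers=8) as pool:
        digests = sorted(pool.map(lambda o: chunk_digest(path, o), offsets))
    # Hash-tree root: the inherently serial md5 becomes parallelizable, at
    # the cost of a root digest that differs from a plain `md5sum` run.
    return hashlib.md5(b"".join(d for _, d in digests)).hexdigest()

# Usage (hypothetical file):
#   import os; tree_checksum("big.dat", os.path.getsize("big.dat"))
```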
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior
NASA Technical Reports Server (NTRS)
Gelenbe, Erol
1988-01-01
An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
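For concreteness, the classical bound under discussion is easy to state in code. The first function below is Amdahl's Law itself; the second adds a crude load-imbalance penalty of the kind the note argues should temper speed-up estimates (the penalty's form is illustrative, not Gelenbe's activity set model).

```python
def amdahl_speedup(p, f):
    """Amdahl's Law: speed-up on p processors when a fraction f of the
    work is parallelizable and (1 - f) is inherently serial."""
    return 1.0 / ((1.0 - f) + f / p)

def imbalanced_speedup(p, f, imbalance=0.1):
    # Illustrative correction: the parallel phase finishes at the pace of
    # the most loaded processor, holding (1 + imbalance) of a fair share.
    return 1.0 / ((1.0 - f) + (f / p) * (1.0 + imbalance))

for p in (2, 8, 64, 1024):
    print(p, round(amdahl_speedup(p, 0.95), 1),
          round(imbalanced_speedup(p, 0.95), 1))
```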
NASCOM system development plan: System description, capabilities, and plans, FY 94-2
NASA Technical Reports Server (NTRS)
1994-01-01
The Nascom System Development Plan (NSDP) for FY 94-2 contains 17 sections. It is a management document containing the approved plan for maintaining the Nascom Network System. Topics covered include an overview of Nascom systems and services, major ground communication support systems, low-speed data system, voice system, high-speed data system, Nascom support for NASA networks, Nascom planning for NASA missions, and network upgrade and advanced systems developments and plans.
High speed data transmission coaxial-cable in the space communication system
NASA Astrophysics Data System (ADS)
Su, Haohang; Huang, Jing
2018-01-01
An effective method is presented based on the scattering parameters of a high-speed 8-core coaxial cable measured with a vector network analyzer, and a semi-physical simulation is performed to obtain the eye diagram at different data transmission rates. The results can be applied to analyze the attenuation and distortion of signals passing through the coaxial cable at high frequency, and can inform the electromagnetic-compatibility design of high-speed data transmission systems.
System, apparatus and methods to implement high-speed network analyzers
Ezick, James; Lethin, Richard; Ros-Giralt, Jordi; Szilagyi, Peter; Wohlford, David E
2015-11-10
Systems, apparatus and methods for the implementation of high-speed network analyzers are provided. A set of high-level specifications is used to define the behavior of the network analyzer emitted by a compiler. An optimized inline workflow to process regular expressions is presented without sacrificing the semantic capabilities of the processing engine. An optimized packet dispatcher implements a subset of the functions implemented by the network analyzer, providing a fast and slow path workflow used to accelerate specific processing units. Such dispatcher facility can also be used as a cache of policies, wherein if a policy is found, then packet manipulations associated with the policy can be quickly performed. An optimized method of generating DFA specifications for network signatures is also presented. The method accepts several optimization criteria, such as min-max allocations or optimal allocations based on the probability of occurrence of each signature input bit.
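As background for the DFA-generation step, the sketch below builds and runs a minimal table-driven matcher for one toy signature (the signature and all names are illustrative; production analyzers compile many signatures into one optimized table, as the patent describes).

```python
# Toy DFA accepting any byte stream that contains the signature "GET ".
SIG = b"GET "

def build_dfa(sig):
    # State i means "the last i bytes seen match the first i bytes of sig".
    dfa = [dict() for _ in range(len(sig))]
    for state in range(len(sig)):
        for byte in range(256):
            cand = sig[:state] + bytes([byte])
            nxt = 0
            for k in range(len(cand), 0, -1):   # longest prefix-suffix
                if cand.endswith(sig[:k]):
                    nxt = k
                    break
            dfa[state][byte] = nxt
    return dfa

DFA = build_dfa(SIG)

def matches(packet):
    state = 0
    for byte in packet:
        state = DFA[state][byte]
        if state == len(SIG):       # accepting state: signature found
            return True
    return False

print(matches(b"xxGET /index"), matches(b"POST /"))   # True False
```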
Estimating workforce development needs for high-speed rail in California : [research brief].
DOT National Transportation Integrated Search
2012-03-01
It is critical to understand the emergent workforce characteristics for the California High-Speed Rail (HSR) network. Knowledge about the size and characteristics of this workforce, including its training and education needs, is required to guide the...
Design of object-oriented distributed simulation classes
NASA Technical Reports Server (NTRS)
Schoeffler, James D. (Principal Investigator)
1995-01-01
Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for 'Numerical Propulsion Simulation System'. NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon MIT 'Actor' model of a concurrent object and uses 'connectors' to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.
Design of Object-Oriented Distributed Simulation Classes
NASA Technical Reports Server (NTRS)
Schoeffler, James D.
1995-01-01
Distributed simulation of aircraft engines as part of a computer aided design package is being developed by NASA Lewis Research Center for the aircraft industry. The project is called NPSS, an acronym for "Numerical Propulsion Simulation System". NPSS is a flexible object-oriented simulation of aircraft engines requiring high computing speed. It is desirable to run the simulation on a distributed computer system with multiple processors executing portions of the simulation in parallel. The purpose of this research was to investigate object-oriented structures such that individual objects could be distributed. The set of classes used in the simulation must be designed to facilitate parallel computation. Since the portions of the simulation carried out in parallel are not independent of one another, there is the need for communication among the parallel executing processors which in turn implies need for their synchronization. Communication and synchronization can lead to decreased throughput as parallel processors wait for data or synchronization signals from other processors. As a result of this research, the following have been accomplished. The design and implementation of a set of simulation classes which result in a distributed simulation control program have been completed. The design is based upon the MIT "Actor" model of a concurrent object and uses "connectors" to structure dynamic connections between simulation components. Connectors may be dynamically created according to the distribution of objects among machines at execution time without any programming changes. Measurements of the basic performance have been carried out with the result that communication overhead of the distributed design is swamped by the computation time of modules unless modules have very short execution times per iteration or time step. An analytical performance model based upon queuing network theory has been designed and implemented. Its application to realistic configurations has not been carried out.
Real-time SHVC software decoding with multi-threaded parallel processing
NASA Astrophysics Data System (ADS)
Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu
2014-09-01
This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.
Implementation of a high-speed face recognition system that uses an optical parallel correlator.
Watanabe, Eriko; Kodate, Kashiko
2005-02-10
We implement a fully automatic fast face recognition system by using a 1000 frame/s optical parallel correlator designed and assembled by us. The operational speed for the 1:N (i.e., matching one image against N, where N refers to the number of images in the database) identification experiment (4000 face images) amounts to less than 1.5 s, including the preprocessing and postprocessing times. The binary real-only matched filter is devised for the sake of face recognition, and the system is optimized by the false-rejection rate (FRR) and the false-acceptance rate (FAR), according to 300 samples selected by the biometrics guideline. From trial 1:N identification experiments with the optical parallel correlator, we acquired low error rates of 2.6% FRR and 1.3% FAR. Facial images of people wearing thin glasses or heavy makeup that rendered identification difficult were identified with this system.
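A digital analogue of the optical correlation at the core of this system can be sketched with NumPy (illustrative only: the real system performs the Fourier transforms optically, and the binary real-only matched filter is approximated here by binarizing the real part of the conjugate reference spectrum).

```python
import numpy as np

def binary_real_filter(reference):
    # Digital stand-in for a binary real-only matched filter: keep only
    # the sign of the real part of the conjugate reference spectrum.
    spectrum = np.conj(np.fft.fft2(reference))
    return np.where(spectrum.real >= 0, 1.0, -1.0)

def correlate(query, filt):
    # Correlation in the Fourier plane; a sharp bright peak means the
    # query face matches the enrolled face.
    return np.abs(np.fft.ifft2(np.fft.fft2(query) * filt))

rng = np.random.default_rng(1)
enrolled = rng.random((64, 64))
filt = binary_real_filter(enrolled)
genuine = correlate(enrolled, filt).max()
impostor = correlate(rng.random((64, 64)), filt).max()
print(genuine > impostor)   # the genuine face yields the stronger peak
```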
Repeatability of high-speed migration of tremor along the Nankai subduction zone, Japan
NASA Astrophysics Data System (ADS)
Kato, A.; Tsuruoka, H.; Nakagawa, S.; Hirata, N.
2015-12-01
Tectonic tremor has been considered to be a swarm, or superimposed pulses, of low-frequency earthquakes (LFEs). To systematically analyze the high-speed migration of tremor [e.g., Shelly et al., 2007], we here focus on an intensive cluster hosting many low-frequency earthquakes located in the western part of Shikoku Island. We relocated ~770 hypocenters of LFEs identified by the JMA, which took place from Jan. 2008 to Dec. 2013, applying a double-difference relocation algorithm [e.g., Waldhauser and Ellsworth, 2000] to arrival times picked by the JMA and those obtained by waveform cross-correlation measurements. The epicentral distributions show a clear alignment parallel to the subduction of the Philippine Sea plate, resembling slip-parallel streaks. We then applied a matched-filter technique to continuous seismograms recorded near the source region, using the relocated template LFEs over the 6 years between Jan. 2008 and Dec. 2013. We newly detected about 60 times the number of template events, considerably more than obtained by the conventional envelope cross-correlation method. Interestingly, we identified many repeated sequences of tremor migration along the slip-parallel streaks (~350 sequences). The front of each migration, or of the stacked migrations, of tremors can be modeled by a parabolic envelope, indicating a diffusion process. The diffusivity of the parabolic envelope is estimated to be around 10^5 m^2/s, which is categorized as a high-speed migration feature (~100 km/hour). Most of the rapid migrations took place during occurrences of short-term slow slip events (SSEs), and seem to be triggered by ocean and solid Earth tides. The most plausible explanation of the high-speed propagation is a diffusion process of a stress pulse concentrated within a cluster of strong brittle patches on the ductile shear zone [Ando et al., 2012]. The viscosity of the ductile shear zone within the streaks is at least one order of magnitude smaller than that inferred for the slow-speed migration. This discrepancy in viscosity indicates that the streaks have a different rheology compared with the background tremor/SSE belt. In addition, the diffusivity did not show any significant change before and after the Tohoku-Oki M9.0 earthquake, suggesting that the high-speed propagation of tremor is stable against external stress perturbations.
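The quoted numbers can be tied together with the standard diffusion front r(t) = sqrt(4*D*t); this specific functional form and prefactor are assumptions here, since the abstract states only a diffusion process with D ~ 10^5 m^2/s.

```python
import math

D = 1.0e5  # diffusivity in m^2/s, the order of magnitude quoted above

def front_km(t_seconds, diffusivity=D):
    # Assumed parabolic migration envelope r(t) = sqrt(4*D*t).
    return math.sqrt(4.0 * diffusivity * t_seconds) / 1000.0

for minutes in (1, 10, 60):
    print(f"{minutes:3d} min: front at {front_km(minutes * 60):5.1f} km")

# The instantaneous front speed sqrt(D/t) falls through ~100 km/h about
# two minutes in, consistent with the migration speed quoted above.
```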
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Novel wavelength diversity technique for high-speed atmospheric turbulence compensation
NASA Astrophysics Data System (ADS)
Arrasmith, William W.; Sullivan, Sean F.
2010-04-01
The defense, intelligence, and homeland security communities are driving a need for software-dominant, real-time or near-real-time atmospheric-turbulence-compensated imagery. Parallel processing capabilities are finding application in diverse areas including image processing, target tracking, pattern recognition, and image fusion, to name a few. A novel approach to the computationally intensive case of software-dominant optical and near-infrared imaging through atmospheric turbulence is addressed in this paper. Previously, the somewhat conventional wavelength diversity method has been used to compensate for atmospheric turbulence with great success. We apply a new correlation-based approach to the wavelength diversity methodology, using a parallel processing architecture to enable high-speed atmospheric turbulence compensation. Methods for optical imaging through distributed turbulence are discussed, simulation results are presented, and computational and performance assessments are provided.
Richmond, Paul; Buesing, Lars; Giugliano, Michele; Vasilaki, Eleni
2011-01-01
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task and, moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU programming to simulate a two-layer network of Integrate-and-Fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule stemming from Reinforcement Learning. The purpose of this paper is twofold. First, we want to support the use of GPUs in the field of Computational Neuroscience. Second, using GPU computing power, we investigate the conditions under which the said architecture and learning rule demonstrate best performance. Our work indicates that networks featuring strong Mexican-Hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a “non-democratic” mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons “vote” independently (“democratic”) for a decision via population vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without the use of GPU programming. We present the routines developed for this purpose and show that they provide a speed improvement of 5x to 42x over optimised Python code. The higher speed is achieved when we exploit the parallelism of the GPU in the search of learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed for simulating networks of spiking neurons, particularly when multiple parameter configurations are investigated. PMID:21572529
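The "democratic" population-vector readout mentioned above can be written in a few lines; the following is a generic textbook version with invented cosine tuning and noise, not the paper's GPU simulation code:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    preferred = np.linspace(0, 2 * np.pi, n, endpoint=False)  # preferred directions
    true_dir = np.pi / 3
    # cosine tuning plus noise stands in for the firing rates of the output layer
    rates = np.maximum(0, np.cos(preferred - true_dir)) + 0.1 * rng.random(n)

    vec = (rates * np.exp(1j * preferred)).sum()  # rate-weighted vector sum of "votes"
    decoded = np.angle(vec) % (2 * np.pi)
    print(f"true = {true_dir:.3f} rad, decoded = {decoded:.3f} rad")

Each neuron contributes a vector along its preferred direction weighted by its rate, so no single neuron dominates the decision.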
Precise time and time interval applications to electric power systems
NASA Technical Reports Server (NTRS)
Wilson, Robert E.
1992-01-01
There are many applications of precise time and time interval (frequency) in operating modern electric power systems. Many generators and customer loads are operated in parallel. The reliable transfer of electrical power to the consumer partly depends on measuring power system frequency consistently in many locations. The internal oscillators in the widely dispersed frequency measuring units must be syntonized. Elaborate protection and control systems guard the high voltage equipment from short and open circuits. For the highest reliability of electric service, engineers need to study all control system operations. Precise timekeeping networks aid in the analysis of power system operations by synchronizing the clocks on recording instruments. Utility engineers want to reproduce events that caused loss of service to customers. Precise timekeeping networks can synchronize protective relay test-sets. For dependable electrical service, all generators and large motors must remain close to speed synchronism. The stable response of a power system to perturbations is critical to continuity of electrical service. Research shows that measurement of the power system state vector can aid in the monitoring and control of system stability. If power system operators know that a lightning storm is approaching a critical transmission line or transformer, they can modify operating strategies. Knowledge of the location of a short circuit fault can speed the re-energizing of a transmission line. One fault location technique requires clocks synchronized to one microsecond. Current research seeks to find out if one microsecond timekeeping can aid and improve power system control and operation.
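One concrete use of the one-microsecond requirement cited above is two-ended traveling-wave fault location: GPS-synchronized clocks at both line ends timestamp the arrival of the fault-generated wave, and the fault distance from end A follows from x = (L + v*(tA - tB)) / 2. The sketch below uses invented values; the abstract does not specify the technique's details.

    v = 2.9e8          # wave propagation speed, m/s (near light speed; assumed)
    L = 100e3          # line length, m (invented)
    tA = 1.2345000e-3  # synchronized arrival time at end A, s (invented)
    tB = 1.2346380e-3  # synchronized arrival time at end B, s (invented)

    x = (L + v * (tA - tB)) / 2.0
    print(f"fault located {x/1000:.1f} km from end A")
    # Each microsecond of clock error shifts the estimate by v*1e-6/2, roughly 145 m,
    # which is why microsecond-level synchronization is required.
    print(f"location error per microsecond of clock error: {v * 1e-6 / 2:.0f} m")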
NASA Astrophysics Data System (ADS)
Levene, Michael John
In all attempts to emulate the considerable powers of the brain, one is struck by its immense size, parallelism, and complexity. While the fields of neural networks, artificial intelligence, and neuromorphic engineering have all attempted oversimplifications of this considerable complexity, all three can benefit from the inherent scalability and parallelism of optics. This thesis looks at specific aspects of three modes in which optics, and particularly volume holography, can play a part in neural computation. First, holography serves as the basis of highly parallel correlators, which are the foundation of optical neural networks. The huge input capability of optical neural networks makes them most useful for image processing and image recognition and tracking. These tasks benefit from the shift invariance of optical correlators. In this thesis, I analyze the capacity of correlators, and then present several techniques for controlling the amount of shift invariance. Of particular interest is the Fresnel correlator, in which the hologram is displaced from the Fourier plane. In this case, the amount of shift invariance is limited not just by the thickness of the hologram, but by the distance of the hologram from the Fourier plane. Second, volume holography can provide the huge storage capacity and high-speed, parallel read-out necessary to support large artificial intelligence systems. However, previous methods for storing data in volume holograms have relied on awkward beam-steering or on as-yet non-existent cheap, wide-bandwidth, tunable laser sources. This thesis presents a new technique, shift multiplexing, which is capable of very high densities, but which has the advantage of a very simple implementation. In shift multiplexing, the reference wave consists of a focused spot a few millimeters in front of the hologram. Multiplexing is achieved by simply translating the hologram a few tens of microns or less. This thesis describes the theory of how shift multiplexing works, based on an unconventional but very intuitive analysis of the optical far field. A more detailed analysis based on a path-integral interpretation of the Born approximation is also derived. The capacity of shift multiplexing is compared with that of angle and wavelength multiplexing. The last part of this thesis deals with the role of optics in neuromorphic engineering. Up until now, most neuromorphic engineering has involved one or a few VLSI circuits emulating early sensory systems. However, optical interconnects will be required in order to push towards more ambitious goals, such as the simulation of early visual cortex. I describe a preliminary approach to designing such a system, and show how shift multiplexing can be used to simultaneously store and implement the immense interconnections required by such a project.
NASA Technical Reports Server (NTRS)
Tesch, W. A.; Steenken, W. G.
1976-01-01
The results are presented of a one-dimensional dynamic digital blade row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade row characteristics used to simulate the clean inlet performance are given. A stability criterion based upon the self-developing unsteady internal flows near surge provided an accurate determination of the clean inlet surge line. The basic model was modified to include an arbitrary-extent multi-sector parallel compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds and gave accurate predictions at high corrected speeds where the slope of the speed lines approached the vertical.
Deep Packet/Flow Analysis using GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gong, Qian; Wu, Wenji; DeMar, Phil
Deep packet inspection (DPI) faces severe performance challenges in high-speed networks (40/100 GE), as it requires a large amount of raw computing power and high I/O throughput. Recently, researchers have tentatively used GPUs to address these issues and boost the performance of DPI. Typically, DPI applications involve highly complex operations at both the per-packet and per-flow data level, often in real time. The parallel architecture of GPUs fits exceptionally well for per-packet network traffic processing. However, for stateful network protocols such as TCP, the data stream needs to be reconstructed at a per-flow level to deliver a consistent content analysis. Since the flow-centric operations are naturally antiparallel and often require large memory space for buffering out-of-sequence packets, they can be problematic for GPUs, whose memory is normally limited to several gigabytes. In this work, we present a highly efficient GPU-based deep packet/flow analysis framework. The proposed design includes purely GPU-implemented flow tracking and TCP stream reassembly. Instead of buffering and waiting for TCP packets to become in-sequence, our framework processes the packets in batches and uses a deterministic finite automaton (DFA) with a prefix-/suffix-tree method to detect patterns across out-of-sequence packets that happen to be located in different batches. Evaluation shows that our code can reassemble and forward tens of millions of packets per second and conduct stateful signature-based deep packet inspection at 55 Gbit/s using an NVIDIA K40 GPU.
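The key idea of detecting signatures that straddle packet (and batch) boundaries is to carry automaton state across chunks. The sketch below is a single-pattern CPU illustration using a KMP failure-function automaton; the paper's GPU design with prefix-/suffix-trees and multi-pattern matching is considerably more elaborate.

    def build_failure(pattern):
        # Knuth-Morris-Pratt failure function for one literal pattern
        fail = [0] * len(pattern)
        k = 0
        for i in range(1, len(pattern)):
            while k and pattern[i] != pattern[k]:
                k = fail[k - 1]
            if pattern[i] == pattern[k]:
                k += 1
            fail[i] = k
        return fail

    def scan(chunk, pattern, fail, state):
        # state = number of pattern characters already matched in earlier chunks
        hits = []
        for i, c in enumerate(chunk):
            while state and c != pattern[state]:
                state = fail[state - 1]
            if c == pattern[state]:
                state += 1
            if state == len(pattern):
                hits.append(i)            # index in this chunk where the match ends
                state = fail[state - 1]
        return state, hits

    pattern = "ATTACK"
    fail = build_failure(pattern)
    state = 0
    for packet in ["...ATT", "ACK..."]:   # signature split across two packets
        state, hits = scan(packet, pattern, fail, state)
        print(repr(packet), hits)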
FastGCN: A GPU Accelerated Tool for Fast Gene Co-Expression Networks
Liang, Meimei; Zhang, Futao; Jin, Gulei; Zhu, Jun
2015-01-01
Gene co-expression networks comprise one type of valuable biological networks. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphic Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control the multiple tests. At last, modules identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of multi-core CPU implementations with 16 CPU threads, single-thread C/C++ implementation and single-thread R implementation. Our results show that GPU implementation largely outperforms single-thread C/C++ implementation and single-thread R implementation, and GPU implementation outperforms multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out. PMID:25602758
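The pipeline stages named above (variability filtering, all-pairs Pearson correlation, FDR control) can be sketched on a CPU as follows; this is a stand-in with synthetic data and SciPy, not the FastGCN GPU code, and the filter and thresholds are invented.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    expr = rng.normal(size=(200, 50))    # 200 genes x 50 individuals (synthetic)

    # Preprocessing: drop low-variability genes (crude stand-in for the entropy filter)
    expr = expr[expr.var(axis=1) > np.median(expr.var(axis=1))]

    # All-pairs Pearson correlation and two-sided p-values via the t distribution
    corr = np.corrcoef(expr)
    n = expr.shape[1]
    r = corr[np.triu_indices_from(corr, k=1)]
    t = r * np.sqrt((n - 2) / (1 - r**2))
    p = 2 * stats.t.sf(np.abs(t), df=n - 2)

    # Benjamini-Hochberg FDR: keep the k smallest p-values with p_(k) <= (k/m)*q
    q = 0.05
    m = len(p)
    passed = np.sort(p) <= (np.arange(1, m + 1) / m) * q
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    print(f"{k} candidate edges kept at FDR {q}")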
NASA Technical Reports Server (NTRS)
Krasteva, Denitza T.
1998-01-01
Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
Parallel computing in experimental mechanics and optical measurement: A review (II)
NASA Astrophysics Data System (ADS)
Wang, Tianyi; Kemao, Qian
2018-05-01
With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have been successfully applied to the measurement of various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computational burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high-speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which includes digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis
NASA Technical Reports Server (NTRS)
Lim, Chieng-Fai
1991-01-01
The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools, such as the SYLON synthesis system (X90), (CM89), (LM90), have been developed based on this method. A parallel implementation of SYLON-XTRANS (XM89) on an eight-processor Encore Multimax shared-memory multiprocessor is presented. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.
A High Speed Mobile Courier Data Access System That Processes Database Queries in Real-Time
NASA Astrophysics Data System (ADS)
Gatsheni, Barnabas Ndlovu; Mabizela, Zwelakhe
A secure, high-speed, query-processing mobile courier data access (MCDA) system for a courier company has been developed. This system uses wireless networks in combination with wired networks so that an offsite worker (the courier) can update a live database at the courier centre in real time. The system is protected by a VPN based on IPsec. To our knowledge, no existing system performs the courier task proposed in this paper.
Research and development of a NYNEX switched multi-megabit data service prototype system
NASA Astrophysics Data System (ADS)
Maman, K. H.; Haines, Robert; Chatterjee, Samir
1991-02-01
Switched Multi-megabit Data Service (SMDS) is a proposed high-speed packet-switched service which will support broadband applications such as Local Area Network (LAN) interconnections across a metropolitan area and beyond. This service is designed to take advantage of evolving Metropolitan Area Network (MAN) standards and technology, which will provide customers with 45-Mbps and 1.5-Mbps access to high-speed public data communications networks. This paper briefly discusses SMDS and reviews its architecture, including the Subscriber Network Interface (SNI) and the SMDS Interface Protocol (SIP). It reviews the fundamental features of SMDS such as address screening, the addressing scheme, and access classes. It then describes the SMDS prototype system developed in-house by NYNEX Science & Technology.
Fast packet switching algorithms for dynamic resource control over ATM networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tsang, R.P.; Keattihananant, P.; Chang, T.
1996-12-01
Real-time continuous media traffic, such as digital video and audio, is expected to comprise a large percentage of the network load on future high speed packet switch networks such as ATM. A major feature which distinguishes high speed networks from traditional slower speed networks is the large amount of data the network must process very quickly. For efficient network usage, traffic control mechanisms are essential. Currently, most mechanisms for traffic control (such as flow control) have centered on the support of Available Bit Rate (ABR), i.e., non real-time, traffic. With regard to ATM, for ABR traffic, two major types of schemes which have been proposed are rate-control and credit-control schemes. Neither of these schemes are directly applicable to Real-time Variable Bit Rate (VBR) traffic such as continuous media traffic. Traffic control for continuous media traffic is an inherently difficult problem due to the time-sensitive nature of the traffic and its unpredictable burstiness. In this study, we present a scheme which controls traffic by dynamically allocating/de-allocating resources among competing VCs based upon their real-time requirements. This scheme incorporates a form of rate-control, real-time burst-level scheduling and link-link flow control. We show analytically potential performance improvements of our rate-control scheme and present a scheme for buffer dimensioning. We also present simulation results of our schemes and discuss the tradeoffs inherent in maintaining high network utilization and statistically guaranteeing many users' Quality of Service.
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Owen, Jeffrey E.
1988-01-01
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.
ERIC Educational Resources Information Center
Tennant, Roy
The Internet is a worldwide network of computer networks. In the United States, the National Science Foundation Network (NSFNet) serves as the Internet "backbone" (a very high speed network that connects key regions across the country). The NSFNet will likely evolve into the National Research and Education Network (NREN) as defined in…
Medical applications for high-performance computers in SKIF-GRID network.
Zhuchkov, Alexey; Tverdokhlebov, Nikolay
2009-01-01
The paper presents a set of software services for massive mammography image processing by using high-performance parallel computers of SKIF-family which are linked into a service-oriented grid-network. An experience of a prototype system implementation in two medical institutions is also described.
NASA Astrophysics Data System (ADS)
O'Connor, A. S.; Justice, B.; Harris, A. T.
2013-12-01
Graphics Processing Units (GPUs) are high-performance multiple-core processors capable of very high computational speeds and large data throughput. Modern GPUs are inexpensive and widely available commercially. These are general-purpose parallel processors with support for a variety of programming interfaces, including industry standard languages such as C. GPU implementations of algorithms that are well suited for parallel processing can often achieve speedups of several orders of magnitude over optimized CPU codes. Significant improvements in speed for imagery orthorectification, atmospheric correction, target detection and image transformations like Independent Component Analysis (ICA) have been achieved using GPU-based implementations. Additional optimizations, when factored in with GPU processing capabilities, can provide 50x-100x reductions in the time required to process large imagery. Exelis Visual Information Solutions (VIS) has implemented a CUDA-based GPU processing framework for accelerating ENVI and IDL processes that can best take advantage of parallelization. Testing Exelis VIS has performed shows that orthorectification can take as long as two hours with a WorldView-1 35,000 x 35,000 pixel image. With GPU orthorectification, the same process takes three minutes. By speeding up image processing, imagery can be used successfully by first responders and by scientists making rapid discoveries with near-real-time data, and data centers gain an operational component for quickly processing and disseminating data.
A superconducting direct-current limiter with a power of up to 8 MVA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fisher, L. M.; Alferov, D. F., E-mail: DFAlferov@niitfa.ru; Akhmetgareev, M. R.
2016-12-15
A resistive switching superconducting fault current limiter (SFCL) for DC networks with a nominal voltage of 3.5 kV and a nominal current of 2 kA was developed, produced, and tested. The SFCL has two main units: an assembly of superconducting modules and a high-speed vacuum circuit breaker. The assembly of superconducting modules consists of nine (3 × 3) parallel-series connected modules. Each module contains four parallel-connected 2G high-temperature superconducting (HTS) tapes. The results of SFCL tests in the short-circuit emulation mode with a maximum current rise rate of 1300 A/ms are presented. The SFCL is capable of limiting the current at a level of 7 kA and breaking it 8 ms after the current-limiting mode begins. The average temperature of the HTS tapes during the current-limiting mode increases to 210 K. After the current is interrupted, the superconductivity recovery time does not exceed 1 s.
Nanopatterned organic semiconductors for visible light communications
NASA Astrophysics Data System (ADS)
Yang, Xilu; Dong, Yurong; Zeng, Pan; Yu, Yan; Xie, Yujun; Gong, Junyi; Shi, Meng; Liang, Rongqing; Ou, Qiongrong; Chi, Nan; Zhang, Shuyu
2018-03-01
Visible light communication (VLC) is becoming an important and promising supplement to the existing Wi-Fi network for the coming 5G communications. Organic light-emitting semiconductors present much faster fluorescent decay rates than conventional colour-converting phosphors and are therefore capable of achieving much higher bandwidths. Here we explore how nanopatterned organic semiconductors can further enhance the data rates of VLC links by improving bandwidths and signal-to-noise ratios (SNRs) and by supporting spatial multiplexing. We first demonstrate a colour-converting VLC system based on nanopatterned hyperbolic metamaterials (HMM), the bandwidth of which is enhanced by 50%. With regard to enhancing SNRs, we achieve a tripling of optical gain by integrating a nanopatterned luminescent concentrator into a signal receiver. In addition, we demonstrate highly directional fluorescent VLC antennas based on nanoimprinted polymer films, paving the way to parallel VLC communications via spatial multiplexing. These results indicate that nanopatterned organic semiconductors provide a promising route to high-speed VLC links.
DTS: The NOAO Data Transport System
NASA Astrophysics Data System (ADS)
Fitzpatrick, M.; Semple, T.
2014-05-01
The NOAO Data Transport System (DTS) provides high-throughput, reliable data transfer between telescopes, pipelines and archive centers located in the Northern and Southern hemispheres. It is a distributed application using XML-RPC for command and control, and either parallel-TCP or UDT protocols for bulk data transport. The system is data-agnostic, allowing arbitrary files or directories to be moved using the same infrastructure. Data paths are configurable in the system by connecting nodes as the source or destination of data in a queue. Each leg of a data path may be configured independently based on the network environment between the sites. A queueing model is currently implemented to manage the automatic movement of data; a streaming model is planned to support arbitrarily large transfers (e.g., as in a disk recovery scenario) or to provide a 'pass-thru' interface to minimize overheads. A web-based monitor allows anyone to get a graphical overview of the DTS system as it runs, and operators will be able to control individual nodes in the system. Through careful tuning of the network paths, DTS is able to achieve in excess of 80 percent of the nominal wire speed using only commodity networks, making it ideal for long-haul transport of large volumes of data.
Information Systems at Enterprise. Design of Secure Network of Enterprise
NASA Astrophysics Data System (ADS)
Saigushev, N. Y.; Mikhailova, U. V.; Vedeneeva, O. A.; Tsaran, A. A.
2018-05-01
In today's information society, no enterprise or company can do without designing its own corporate network. Such a network accelerates and facilitates the work of employees at every level, but it also exposes the company's confidential information to significant threats. Beyond data theft by attackers, there are plenty of information threats posed by modern malware. In this regard, the security of corporate networks is an important component of modern information technology for any enterprise. This article describes the design of a protected corporate network that provides the computers on the network with access to the Internet, as well as interoperability with a branch office. High-speed Internet access is provided through the use of high-speed access channels and load balancing between devices. The security of the designed network is ensured through the use of VLAN technology as well as access lists and an AAA server.
Parallel solution of closely coupled systems
NASA Technical Reports Server (NTRS)
Utku, S.; Salama, M.
1986-01-01
The odd-even permutation and associated unitary transformations for reordering the matrix coefficient A are employed as a means of breaking the strong seriality which is characteristic of closely coupled systems. The nested dissection technique is also reviewed, and the equivalence between reordering A and dissecting its network is established. The effect of transforming A with the odd-even permutation on its topology and the topology of its Cholesky factors is discussed. This leads to the construction of directed graphs showing the computational steps required for factoring A, their precedence relationships, and their sequential and concurrent assignment to the available processors. Expressions for the speed-up and efficiency of using N processors in parallel relative to the sequential use of a single processor are derived from the directed graph. Similar expressions are also derived when the number of available processors is fewer than required.
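For reference, the speed-up and efficiency figures of merit referred to here take the standard form S(N) = T(1)/T(N) and E(N) = S(N)/N; a minimal sketch with invented timings:

    def speedup_efficiency(t1, tN, N):
        S = t1 / tN      # speed-up of N processors over one
        return S, S / N  # efficiency: fraction of ideal linear scaling

    t1 = 10.0                                  # single-processor time (invented)
    for N, tN in [(2, 5.5), (4, 3.1), (8, 2.0)]:
        S, E = speedup_efficiency(t1, tN, N)
        print(f"N={N}: speed-up {S:.2f}, efficiency {E:.2f}")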
Study and design on USB wireless laser communication system
NASA Astrophysics Data System (ADS)
Wang, Aihua; Zheng, Jiansheng; Ai, Yong
2004-04-01
We give a definition of a USB wireless laser communication system (WLCS), a brief introduction to the USB protocol, and the hardware standard. The paper analyses the hardware and software of the USB WLCS. The wireless laser communication part and the USB interface circuit part are discussed in detail. We also give the periphery design of the AN2131Q chip, the control circuit that realizes the transformation from parallel port to serial bus, and the laser transmitting and receiving circuits of the laser communication part, which are simple, cheap and workable. The four software components are then analyzed in turn. We have completed the ISR in the firmware framework to develop the USB periphery device, and debugged and completed the 'ezload' and the GPD of the drivers. The Windows application schedules the corresponding API functions to make the interface practical and attractive. The system realizes USB wireless laser communication between computers over distances greater than 50 meters at speeds above 8 Mbps. The system is of great practical value for high-speed communication among growing districts that lack a fiber trunk network.
NASA Astrophysics Data System (ADS)
Kim, Stephan D.; Luo, Jiajun; Buchholz, D. Bruce; Chang, R. P. H.; Grayson, M.
2016-09-01
A modular time division multiplexer (MTDM) device is introduced to enable parallel measurement of multiple samples with both fast and slow decay transients spanning from millisecond to month-long time scales. This is achieved by dedicating a single high-speed measurement instrument for rapid data collection at the start of a transient, and by multiplexing a second low-speed measurement instrument for slow data collection of several samples in parallel for the later transients. The MTDM is a high-level design concept that can in principle measure an arbitrary number of samples, and the low cost implementation here allows up to 16 samples to be measured in parallel over several months, reducing the total ensemble measurement duration and equipment usage by as much as an order of magnitude without sacrificing fidelity. The MTDM was successfully demonstrated by simultaneously measuring the photoconductivity of three amorphous indium-gallium-zinc-oxide thin films with 20 ms data resolution for fast transients and an uninterrupted parallel run time of over 20 days. The MTDM has potential applications in many areas of research that manifest response times spanning many orders of magnitude, such as photovoltaics, rechargeable batteries, amorphous semiconductors such as silicon and amorphous indium-gallium-zinc-oxide.
Optical Interconnection Via Computer-Generated Holograms
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang; Zhou, Shaomin
1995-01-01
Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
NASA Astrophysics Data System (ADS)
Mejia, C.; Badran, F.; Bentamy, A.; Crepon, M.; Thiria, S.; Tran, N.
1999-05-01
We have computed two geophysical model functions (one for the vertical and one for the horizontal polarization) for the NASA scatterometer (NSCAT) by using neural networks. These neural network geophysical model functions (NNGMFs) were estimated with NSCAT scatterometer σ0 measurements collocated with European Centre for Medium-Range Weather Forecasts analyzed wind vectors during the period January 15 to April 15, 1997. We performed a Student's t test showing that the NNGMFs estimate the NSCAT σ0 with a confidence level of 95%. Analysis of the results shows that the mean NSCAT signal depends on the incidence angle and the wind speed and presents the classical biharmonic modulation with respect to the wind azimuth. NSCAT σ0 increases with wind speed and presents a well-marked change at around 7 m s^-1. The upwind-downwind amplitude is higher for the horizontal polarization signal than for the vertical polarization, indicating that the use of horizontal polarization can give additional information for wind retrieval. Comparisons of the σ0 computed by the NNGMFs against the NSCAT-measured σ0 show a quite low rms error, except at low wind speeds. We also computed two specific neural networks for estimating the variance associated with these GMFs. The variances are analyzed with respect to geophysical parameters. This led us to compute the geophysical signal-to-noise ratio, i.e., Kp. The Kp values are quite high at low wind speed and decrease at high wind speed. At constant wind speed the highest Kp values are at crosswind directions, showing that the crosswind values are the most difficult to estimate. These neural networks can be expressed as analytical functions, and FORTRAN subroutines can be provided.
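The "classical biharmonic modulation" has the form sigma0(chi) = a0 + a1*cos(chi) + a2*cos(2*chi), with chi the wind direction relative to the antenna azimuth; in the paper the coefficients are effectively learned by the neural networks as functions of wind speed and incidence angle. A toy version with invented dB-space coefficients:

    import math

    def sigma0_db(chi_deg, a0=-20.0, a1=1.5, a2=3.0):  # coefficients invented
        chi = math.radians(chi_deg)
        return a0 + a1 * math.cos(chi) + a2 * math.cos(2 * chi)

    for chi in (0, 45, 90, 135, 180):  # upwind ... crosswind ... downwind
        print(f"chi = {chi:3d} deg  sigma0 = {sigma0_db(chi):6.2f} dB")

The a1 term produces the upwind-downwind asymmetry discussed above, and the a2 term the crosswind minima.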
High-Speed Rail and Local Land Development : Case Studies in London and Las Vegas
DOT National Transportation Integrated Search
2017-11-01
The efficacy of a high-speed rail system depends, in part, upon locating rail stations close to urban centers and integrating them into broader transportation networks and the urban realm. In addition, through economies of agglomeration, successful h...
NASA Astrophysics Data System (ADS)
Jiang, Yuning; Kang, Jinfeng; Wang, Xinan
2017-03-01
Resistive switching memory (RRAM) is considered as one of the most promising devices for parallel computing solutions that may overcome the von Neumann bottleneck of today’s electronic systems. However, the existing RRAM-based parallel computing architectures suffer from practical problems such as device variations and extra computing circuits. In this work, we propose a novel parallel computing architecture for pattern recognition by implementing k-nearest neighbor classification on metal-oxide RRAM crossbar arrays. Metal-oxide RRAM with gradual RESET behaviors is chosen as both the storage and computing components. The proposed architecture is tested by the MNIST database. High speed (~100 ns per example) and high recognition accuracy (97.05%) are obtained. The influence of several non-ideal device properties is also discussed, and it turns out that the proposed architecture shows great tolerance to device variations. This work paves a new way to achieve RRAM-based parallel computing hardware systems with high performance.
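A software analogue of the classification the crossbar performs in the analog domain is plain k-nearest-neighbor voting; the sketch below uses random stand-ins for the MNIST vectors and Euclidean distance (the on-array distance measure may differ):

    import numpy as np

    rng = np.random.default_rng(3)
    templates = rng.random((100, 784))   # stored patterns, one per crossbar column
    labels = rng.integers(0, 10, 100)    # class label of each stored pattern
    query = rng.random(784)              # input example

    d = np.linalg.norm(templates - query, axis=1)  # computed in parallel on the array
    k = 5
    nearest = np.argsort(d)[:k]
    votes = np.bincount(labels[nearest], minlength=10)
    print("predicted class:", votes.argmax())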
High Speed Computing, LANs, and WAMs
NASA Technical Reports Server (NTRS)
Bergman, Larry A.; Monacos, Steve
1994-01-01
Optical fiber networks may one day offer potential capacities exceeding 10 terabits/sec. This paper describes present gigabit network techniques for distributed computing as illustrated by the CASA gigabit testbed, and then explores future all-optic network architectures that offer increased capacity, more optimized level of service for a given application, high fault tolerance, and dynamic reconfigurability.
NASA Astrophysics Data System (ADS)
Dong, Dai; Li, Xiaoning
2015-03-01
High-pressure solenoid valves with high flow rate and high speed are key components in underwater driving systems. However, a traditional single-spool pilot-operated valve cannot meet the demands of both high flow rate and high speed simultaneously, so a new structure is needed for the underwater driving system. A novel parallel-spool pilot-operated high-pressure solenoid valve is proposed to overcome the drawbacks of the single-spool design. Mathematical models of the opening process and flow rate of the valve are established. The opening response time of the valve is subdivided into 4 parts to analyze the properties of the opening response, and corresponding formulas for the 4 parts of the response time are derived. Key factors that influence the opening response time are analyzed. According to the mathematical model of the valve, a simulation of the opening process is carried out in MATLAB. Parameters are chosen based on the theoretical analysis to design the test prototype of the new valve. The opening response time of the designed valve is tested by measuring the current response in the coil and the displacement of the main valve spool. The experimental results agree with the simulated results, verifying the validity of the theoretical analysis. The experimental opening response time of the valve is 48.3 ms at a working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm^2 and the largest air flow rate is 2320 L/s. According to the results of the load driving test, the valve can meet the demands of the driving system. The proposed valve with parallel spools provides a new method for the design of high-pressure valves with fast response and large flow rate.
Forecasting the short-term passenger flow on high-speed railway with neural networks.
Xie, Mei-Quan; Li, Xia-Miao; Zhou, Wen-Liang; Fu, Yan-Bing
2014-01-01
Short-term passenger flow forecasting is an important component of transportation systems. The forecasting results can be applied to support transportation system operation and management, such as operation planning and revenue management. In this paper, a divide-and-conquer method based on neural networks and origin-destination (OD) matrix estimation is developed to forecast short-term passenger flow in a high-speed railway system. There are three steps in the forecasting method. First, the numbers of passengers who arrive at or depart from each station are obtained from historical passenger flow data, which take the form of OD matrices in this paper. Second, short-term forecasting of these station-level arrival and departure counts is performed with a neural network. Finally, the short-term OD matrices are obtained with an OD matrix estimation method. The experimental results indicate that the proposed divide-and-conquer method performs well in forecasting short-term passenger flow on a high-speed railway.
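A toy version of the three steps, with a naive mean standing in for the neural-network forecast and iterative proportional fitting (IPF) standing in for the paper's OD estimation method (both substitutions are assumptions of this sketch):

    import numpy as np

    # Two historical OD matrices for a 3-station line (invented counts)
    hist = np.array([[[0, 30, 10], [20, 0, 15], [5, 25, 0]],
                     [[0, 34, 12], [22, 0, 13], [7, 27, 0]]], dtype=float)

    # Step 1: station-level departures (row sums) and arrivals (column sums)
    dep, arr = hist.sum(axis=2), hist.sum(axis=1)
    # Step 2: forecast next-period margins (mean here; an ANN in the paper)
    dep_f, arr_f = dep.mean(axis=0), arr.mean(axis=0)
    # Step 3: fit an OD matrix to the forecast margins, seeded with the last OD matrix
    od = hist[-1].copy()
    for _ in range(50):  # IPF: alternately match row and column sums
        od *= (dep_f / od.sum(axis=1).clip(1e-9))[:, None]
        od *= (arr_f / od.sum(axis=0).clip(1e-9))[None, :]
    print(np.round(od, 1))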
NASA Astrophysics Data System (ADS)
Alves, J.; Saraiva, A. C. V.; Campos, L. Z. D. S.; Pinto, O., Jr.; Antunes, L.
2014-12-01
This work presents a method for evaluating the location accuracy of all Lightning Location Systems (LLS) in operation in southeastern Brazil, using natural cloud-to-ground (CG) lightning flashes. This is done with a network of multiple high-speed cameras (the RAMMER network) installed in the Paraiba Valley region, SP, Brazil. The RAMMER network (Automated Multi-camera Network for Monitoring and Study of Lightning) is composed of four high-speed cameras operating at 2,500 frames per second. Three stationary black-and-white (B&W) cameras were situated in the cities of São José dos Campos and Caçapava. A fourth color camera was mobile (installed in a car) but operated in a fixed location during the observation period, within the city of São José dos Campos. The average distance among cameras was 13 kilometers. Each RAMMER sensor position was determined so that the network can observe the same lightning flash from different angles, and all recorded videos were GPS (Global Positioning System) time stamped, allowing comparisons of events between cameras and the LLS. Each RAMMER sensor is basically composed of a computer, a Phantom high-speed camera (version 9.1) and a GPS unit. The lightning cases analyzed in the present work were observed by at least two cameras; their positions were visually triangulated and the results compared with the BrasilDAT network during the summer seasons of 2011/2012 and 2012/2013. The visual triangulation method is presented in detail. The calibration procedure showed an agreement of 9 meters between the accurate GPS position of the triangulated object and the result from the visual triangulation method. Lightning return stroke positions estimated with the visual triangulation method were compared with LLS locations. Differences between solutions were not greater than 1.8 km.
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Simulation framework for intelligent transportation systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ewing, T.; Doss, E.; Hanebutte, U.
1996-10-01
A simulation framework has been developed for a large-scale, comprehensive, scaleable simulation of an Intelligent Transportation System (ITS). The simulator is designed for running on parallel computers and distributed (networked) computer systems, but can run on standalone workstations for smaller simulations. The simulator currently models instrumented smart vehicles with in-vehicle navigation units capable of optimal route planning and Traffic Management Centers (TMC). The TMC has probe vehicle tracking capabilities (display position and attributes of instrumented vehicles), and can provide two-way interaction with traffic to provide advisories and link times. Both the in-vehicle navigation module and the TMC feature detailed graphical user interfaces to support human-factors studies. Realistic modeling of variations of the posted driving speed are based on human factors studies that take into consideration weather, road conditions, driver personality and behavior, and vehicle type. The prototype has been developed on a distributed system of networked UNIX computers but is designed to run on parallel computers, such as ANL's IBM SP-2, for large-scale problems. A novel feature of the approach is that vehicles are represented by autonomous computer processes which exchange messages with other processes. The vehicles have a behavior model which governs route selection and driving behavior, and can react to external traffic events much like real vehicles. With this approach, the simulation is scaleable to take advantage of emerging massively parallel processor (MPP) systems.
Transport conductivity of graphene at RF and microwave frequencies
NASA Astrophysics Data System (ADS)
Awan, S. A.; Lombardo, A.; Colli, A.; Privitera, G.; Kulmala, T. S.; Kivioja, J. M.; Koshino, M.; Ferrari, A. C.
2016-03-01
We measure graphene coplanar waveguides from direct current (DC) to a frequency f = 13.5 GHz and show that the apparent resistance (in the presence of parasitic impedances) has an ω^2 dependence (where ω = 2πf), but the intrinsic conductivity (without the influence of parasitic impedances) is frequency-independent. Consequently, in our devices the real part of the complex alternating current (AC) conductivity is the same as the DC value and the imaginary part is ~0. The graphene channel is modeled as a parallel resistive-capacitive network with a frequency dependence identical to that of the Drude conductivity with momentum relaxation time ~2.1 ps, highlighting the influence of AC electron transport on the electromagnetic properties of graphene. This can lead to optimized design of high-speed analog field-effect transistors, mixers, frequency doublers, low-noise amplifiers and radiation detectors.
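For reference, the Drude form invoked above is sigma(omega) = sigma_dc / (1 + i*omega*tau). With tau ~ 2.1 ps, omega*tau ~ 0.18 at 13.5 GHz, so the real part stays within a few percent of the DC value and the imaginary part remains small, consistent with the measurement; sigma_dc is normalized to 1 in this sketch.

    import numpy as np

    tau = 2.1e-12   # momentum relaxation time from the abstract, s
    sigma_dc = 1.0  # normalized DC conductivity (assumed)
    for f in (1e9, 13.5e9, 100e9):
        w = 2 * np.pi * f
        sigma = sigma_dc / (1 + 1j * w * tau)  # Drude AC conductivity
        print(f"f = {f/1e9:6.1f} GHz  Re = {sigma.real:.3f}  Im = {sigma.imag:+.3f}")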
Lifelong cortical myelin plasticity and age-related degeneration in the live mammalian brain.
Hill, Robert A; Li, Alice M; Grutzendler, Jaime
2018-05-01
Axonal myelin increases neural processing speed and efficiency. It is unknown whether patterns of myelin distribution are fixed or whether myelinating oligodendrocytes are continually generated in adulthood and maintain the capacity for structural remodeling. Using high-resolution, intravital label-free and fluorescence optical imaging in mouse cortex, we demonstrate lifelong oligodendrocyte generation occurring in parallel with structural plasticity of individual myelin internodes. Continuous internode formation occurred on both partially myelinated and unmyelinated axons, and the total myelin coverage along individual axons progressed up to two years of age. After peak myelination, gradual oligodendrocyte death and myelin degeneration in aging were associated with pronounced internode loss and myelin debris accumulation within microglia. Thus, cortical myelin remodeling is protracted throughout life, potentially playing critical roles in neuronal network homeostasis. The gradual loss of internodes and myelin degeneration in aging could contribute significantly to brain pathogenesis.
Main, M L; Foltz, D; Firstenberg, M S; Bobinsky, E; Bailey, D; Frantz, B; Pleva, D; Baldizzi, M; Meyers, D P; Jones, K; Spence, M C; Freeman, K; Morehead, A; Thomas, J D
2000-08-01
With high-resolution network transmission required for telemedicine, education, and guided-image acquisition, the impact of errors and transmission rates on image quality needs evaluation. We transmitted clinical echocardiograms from 2 National Aeronautics and Space Administration (NASA) research centers with the use of Motion Picture Expert Group-2 (MPEG-2) encoding and asynchronous transmission mode (ATM) network protocol over the NASA Research and Education Network. Data rates and network quality (cell losses [CLR], errors [CER], and delay variability [CVD]) were altered and image quality was judged. At speeds of 3 to 5 megabits per second (Mbps), digital images were superior to those on videotape; at 2 Mbps, images were equivalent. Increasing CLR caused occasional, brief pauses. Extreme CER and CDV increases still yielded high-quality images. Real-time echocardiographic acquisition, guidance, and transmission is feasible with the use of MPEG-2 and ATM with broadcast quality seen above 3 Mbps, even with severe network quality degradation. These techniques can be applied to telemedicine and used for planned echocardiography aboard the International Space Station.
NASA Astrophysics Data System (ADS)
Hazza, Muataz Hazza F. Al; Adesta, Erry Y. T.; Riza, Muhammad
2013-12-01
High speed milling has many advantages, such as a higher removal rate and higher productivity. However, higher cutting speed increases the flank wear rate and thus reduces cutting tool life. Therefore, estimating and predicting the flank wear length at early stages reduces the risk of unacceptable tooling cost. This research presents a neural network model for predicting and simulating flank wear in the CNC end milling process. A set of sparse experiments for finish end milling on AISI H13 at a hardness of 48 HRC was conducted to measure the flank wear length, and the measured data were used to train the developed neural network model. An artificial neural network (ANN) was applied to predict the flank wear length. The network contains twenty hidden-layer neurons in a feed-forward back-propagation hierarchy and was designed with the MATLAB Neural Network Toolbox. The results show a high correlation between the predicted and observed flank wear, which indicates the validity of the model.
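A hedged sketch of this kind of feed-forward ANN regression, using scikit-learn in place of the MATLAB Neural Network Toolbox; the inputs (cutting speed, feed, depth of cut), the synthetic wear law, and all values are invented stand-ins for the experimental data:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(4)
    # columns: cutting speed (m/min), feed (mm/tooth), depth of cut (mm); invented ranges
    X = rng.uniform([150, 0.05, 0.2], [350, 0.20, 1.0], size=(40, 3))
    wear = 0.02 * X[:, 0] ** 0.5 + 0.5 * X[:, 1] + 0.05 * X[:, 2]  # synthetic wear law
    wear += 0.02 * rng.normal(size=40)                             # measurement noise

    # one hidden layer of 20 neurons, mirroring the hidden-layer size in the abstract
    model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
    model.fit(X, wear)
    print("R^2 on training data:", round(model.score(X, wear), 3))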
Optical burst switching based satellite backbone network
NASA Astrophysics Data System (ADS)
Li, Tingting; Guo, Hongxiang; Wang, Cen; Wu, Jian
2018-02-01
We propose a novel time-slot-based optical burst switching (OBS) architecture for a GEO/LEO-based satellite backbone network. This architecture can provide high data transmission rates and high switching capacity. Furthermore, we design the control plane of this optical satellite backbone network, introducing software-defined networking (SDN) and network slicing (NS) technologies. Under the properly designed control mechanism, this backbone network is flexible enough to support various services with diverse transmission requirements. LEO access and handoff management in this network are also discussed.
A more secure parallel keyed hash function based on chaotic neural network
NASA Astrophysics Data System (ADS)
Huang, Zhongquan
2011-08-01
Although various hash functions based on chaos or chaotic neural networks have been proposed, most of them cannot work efficiently in a parallel computing environment. Recently, an algorithm for parallel keyed hash function construction based on a chaotic neural network was proposed [13]. However, this scheme carries a strict limitation: its secret keys must be nonces (numbers used only once). In other words, if a key is used more than once in this scheme, a potential security flaw arises. In this paper, we analyze the cause of the vulnerability of the original scheme in detail, and then propose corresponding enhancement measures, which remove the limitation on the secret keys. Theoretical analysis and computer simulation indicate that the modified hash function is more secure and practical than the original one, while keeping the parallel merit and satisfying the other performance requirements of a hash function, such as good statistical properties, high message and key sensitivity, and strong collision resistance.
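For readers unfamiliar with the construction, the sketch below is a deliberately insecure toy (not the scheme of [13] or of this paper) showing the general shape of a parallel keyed hash built on a chaotic map: each message block is absorbed independently by a key-seeded logistic map, and the per-block digests are combined. All constants are illustrative.

```python
# Insecure toy illustration (not the scheme of [13] or of this paper):
# each message block is absorbed independently by a key-seeded logistic
# map, so blocks can be hashed in parallel; per-block digests are then
# XOR-combined. The block index is folded into each seed so that
# reordering blocks changes the digest.
from concurrent.futures import ProcessPoolExecutor

def block_digest(args):
    block, key, index = args
    x = (key + index * 0.6180339887) % 1.0 or 0.5   # key- and position-dependent seed
    for byte in block:
        x = 3.9999 * x * (1.0 - x)                  # chaotic logistic iteration
        x = (x + byte / 256.0) % 1.0 or 0.5         # absorb one message byte
    for _ in range(16):                             # final mixing rounds
        x = 3.9999 * x * (1.0 - x)
    return int(x * 2**32)

def parallel_hash(message: bytes, key: float, block_size: int = 16) -> int:
    blocks = [message[i:i + block_size] for i in range(0, len(message), block_size)]
    with ProcessPoolExecutor() as pool:
        parts = list(pool.map(block_digest, [(b, key, i) for i, b in enumerate(blocks)]))
    digest = 0
    for part in parts:
        digest ^= part
    return digest

if __name__ == "__main__":
    print(hex(parallel_hash(b"attack at dawn", key=0.3141592653)))
```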
Concurrent computation of attribute filters on shared memory parallel machines.
Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold
2008-10-01
Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings, based on Salembier's Max-Trees and Min-Trees. The image or volume is first partitioned into multiple slices. We then compute the Max-Tree of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-Trees of the slices are merged to obtain the Max-Tree of the whole image. A C implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
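The partition/compute/merge structure of the algorithm can be illustrated compactly. In the sketch below, the per-slice Max-Tree build is replaced by a trivially mergeable stand-in (a grey-level histogram), since the actual Max-Tree merge is the substantive contribution of the paper; only the concurrency skeleton is shown.

```python
# Concurrency skeleton only: the image is partitioned into slices, a
# per-slice computation runs in parallel, and the results are merged.
# A grey-level histogram stands in for the per-slice Max-Tree build,
# because histograms merge by simple summation; the paper's Max-Tree
# merge step is far more involved.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def per_slice(sl: np.ndarray) -> np.ndarray:
    # Stand-in for "build the Max-Tree of one slice"
    return np.bincount(sl.ravel(), minlength=256)

def parallel_pass(image: np.ndarray, workers: int = 4) -> np.ndarray:
    slices = np.array_split(image, workers, axis=0)     # partition into slices
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(per_slice, slices))
    return np.sum(parts, axis=0)                        # merge per-slice results

if __name__ == "__main__":
    img = np.random.default_rng(0).integers(0, 256, (1024, 1024), dtype=np.uint8)
    hist = parallel_pass(img)
    assert hist.sum() == img.size
```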
Random number generators for large-scale parallel Monte Carlo simulations on FPGA
NASA Astrophysics Data System (ADS)
Lin, Y.; Wang, F.; Liu, B.
2018-05-01
Through parallelization, field-programmable gate arrays (FPGAs) can achieve unprecedented speeds in large-scale parallel Monte Carlo (LPMC) simulations. FPGAs present both new constraints and new opportunities for the implementation of random number generators (RNGs), which are key elements of any Monte Carlo (MC) simulation system. Using empirical and application-based tests, this study evaluates all four RNGs used in previous FPGA-based MC studies, together with newly proposed FPGA implementations of two well-known high-quality RNGs suitable for LPMC studies on FPGA. One of the newly proposed implementations, a parallel version of the additive lagged Fibonacci generator (parallel ALFG), is found to be the best among the evaluated RNGs in fulfilling the needs of LPMC simulations on FPGA.
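As background, a behavioural software model of the additive lagged Fibonacci generator is easy to write; the paper's contribution is its FPGA mapping, which is not reproduced here. One common way to run ALFG streams in parallel is to give each instance its own lag table, as sketched below with (j, k) = (5, 17) lags, assumed here for illustration only.

```python
# Behavioural model of an additive lagged Fibonacci generator,
#     x[n] = (x[n-5] + x[n-17]) mod 2**32,
# with (j, k) = (5, 17) assumed for illustration. Independent parallel
# streams are obtained here by giving each instance its own lag table;
# the paper's hardware parallelization strategy may differ.
import random

class ALFG:
    def __init__(self, seed_table, j=5, k=17, m=32):
        assert len(seed_table) == k and 0 < j < k
        self.state = list(seed_table)       # circular buffer of the last k values
        self.j, self.k, self.mask = j, k, (1 << m) - 1
        self.pos = 0

    def next(self) -> int:
        i = self.pos % self.k
        new = (self.state[(self.pos - self.j) % self.k] + self.state[i]) & self.mask
        self.state[i] = new                 # overwrite x[n-k] with x[n]
        self.pos += 1
        return new

seeder = random.Random(1)
# At least one odd seed word is required for the maximal period; forcing
# all words odd satisfies that trivially.
streams = [ALFG([seeder.getrandbits(32) | 1 for _ in range(17)]) for _ in range(2)]
print([s.next() for s in streams])
```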
DOT National Transportation Integrated Search
2014-05-01
With increasing demand and rising fuel costs, both travel time and cost of intercity passenger : transportation are becoming increasingly significant. Around the world, high-speed rail (HSR) is seen as a : way to mitigate the risk of volatile petrole...
DOT National Transportation Integrated Search
2008-06-30
The following Independent Verification and Validation (IV&V) report documents and presents the results of a study of the Washington State Ferries Prototype Wireless High Speed Data Network. The purpose of the study was to evaluate and determine if re...
Noise isolation system for high-speed circuits
McNeilly, D.R.
1983-12-29
A noise isolation circuit is provided that consists of a dual function bypass which confines high-speed switching noise to the component or circuit which generates it and isolates the component or circuit from high-frequency noise transients which may be present on the ground and power supply busses. A local circuit ground is provided which is coupled to the system ground by sufficient impedance to force the dissipation of the noise signal in the local circuit or component generating the noise. The dual function bypass network couples high-frequency noise signals generated in the local component or circuit through a capacitor to the local ground while isolating the component or circuit from noise signals which may be present on the power supply busses or system ground. The network is an effective noise isolating system and is applicable to both high-speed analog and digital circuits.
Noise isolation system for high-speed circuits
McNeilly, David R.
1986-01-01
A noise isolation circuit is provided that consists of a dual function bypass which confines high-speed switching noise to the component or circuit which generates it and isolates the component or circuit from high-frequency noise transients which may be present on the ground and power supply busses. A local circuit ground is provided which is coupled to the system ground by sufficient impedance to force the dissipation of the noise signal in the local circuit or component generating the noise. The dual function bypass network couples high-frequency noise signals generated in the local component or circuit through a capacitor to the local ground while isolating the component or circuit from noise signals which may be present on the power supply busses or system ground. The network is an effective noise isolating system and is applicable to both high-speed analog and digital circuits.
Full range line-field parallel swept source imaging utilizing digital refocusing
NASA Astrophysics Data System (ADS)
Fechtig, Daniel J.; Kumar, Abhishek; Drexler, Wolfgang; Leitgeb, Rainer A.
2015-12-01
We present geometric optics-based refocusing applied to a novel off-axis line-field parallel swept source imaging (LPSI) system. LPSI is an imaging modality based on line-field swept source optical coherence tomography, which permits 3-D imaging at acquisition speeds of up to 1 MHz. The digital refocusing algorithm applies a defocus-correcting phase term to the Fourier representation of complex-valued interferometric image data, which is based on the geometrical optics information of the LPSI system. We introduce the off-axis LPSI system configuration, the digital refocusing algorithm and demonstrate the effectiveness of our method for refocusing volumetric images of technical and biological samples. An increase of effective in-focus depth range from 255 μm to 4.7 mm is achieved. The recovery of the full in-focus depth range might be especially valuable for future high-speed and high-resolution diagnostic applications of LPSI in ophthalmology.
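The general form of such a Fourier-domain defocus correction is shown below: a complex en-face field is multiplied by a quadratic phase in the spatial-frequency domain. The paraxial Fresnel kernel used here is a generic stand-in; the LPSI-specific phase term is derived from the system's geometrical-optics model in the paper.

```python
# Generic Fourier-domain refocusing: multiply the 2-D spectrum of a
# complex en-face field by a quadratic (paraxial Fresnel) defocus phase.
# The LPSI-specific phase term in the paper comes from the system's
# geometrical-optics model; this kernel is a stand-in.
import numpy as np

def refocus(field: np.ndarray, dx: float, wavelength: float, dz: float) -> np.ndarray:
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)                   # spatial frequencies [1/m]
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    phase = np.exp(-1j * np.pi * wavelength * dz * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * phase)

# Usage: shift the focal plane of a (synthetic) field by dz = 1 mm
field = np.random.default_rng(0).standard_normal((256, 256)) + 0j
in_focus = refocus(field, dx=5e-6, wavelength=1.06e-6, dz=1e-3)
```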
Integrated test system of infrared and laser data based on USB 3.0
NASA Astrophysics Data System (ADS)
Fu, Hui Quan; Tang, Lin Bo; Zhang, Chao; Zhao, Bao Jun; Li, Mao Wen
2017-07-01
Based on USB 3.0, this paper presents the design method of an integrated test system for both an infrared image data and a laser signal data processing module. The core of the design is FPGA logic control. The design uses dual-chip DDR3 SDRAM to achieve a high-speed laser data cache, receives parallel LVDS image data through a serial-to-parallel conversion chip, and achieves high-speed data communication between the system and the host computer through the USB 3.0 bus. The experimental results show that the developed PC software realizes real-time display of the 14-bit LVDS original image after 14-to-8-bit conversion and of the JPEG2000-compressed image after software decompression, and can realize real-time display of the acquired laser signal data. The correctness of the test system design is verified, indicating that the interface link is normal.
Performance of a 300 Mbps 1:16 serial/parallel optoelectronic receiver module
NASA Technical Reports Server (NTRS)
Richard, M. A.; Claspy, P. C.; Bhasin, K. B.; Bendett, M. B.
1990-01-01
Optical interconnects are being considered for the high-speed distribution of multiplexed control signals in GaAs monolithic microwave integrated circuit (MMIC) based phased array antennas. The performance of a hybrid GaAs optoelectronic integrated circuit (OEIC) is described, as well as its design and fabrication. The OEIC converts a 16-bit serial optical input to a 16-line parallel electrical output using an on-board 1:16 demultiplexer and operates at data rates as high as 300 Mbps. The performance characteristics and potential applications of the device are presented.
Megavolt parallel potentials arising from double-layer streams in the Earth's outer radiation belt.
Mozer, F S; Bale, S D; Bonnell, J W; Chaston, C C; Roth, I; Wygant, J
2013-12-06
Huge numbers of double layers carrying electric fields parallel to the local magnetic field line have been observed on the Van Allen Probes in connection with in situ relativistic electron acceleration in the Earth's outer radiation belt. For one case with adequate high-time-resolution data, 7000 double layers were observed in an interval of 1 min to produce a 230,000 V net parallel potential drop crossing the spacecraft. Lower resolution data show that this event lasted for 6 min and that more than 1,000,000 volts of net parallel potential crossed the spacecraft during this time. A double layer traverses the length of a magnetic field line in about 15 s, and the orbital motion of the spacecraft perpendicular to the magnetic field was about 700 km during this 6 min interval. Thus, the instantaneous parallel potential along a single magnetic field line was on the order of tens of kilovolts. Electrons on the field line might experience many such potential steps in their lifetimes, accelerating them to energies where they serve as the seed population for relativistic acceleration by coherent, large-amplitude whistler-mode waves. Because the double-layer speed of 3100 km/s is on the order of the electron acoustic speed (and not the ion acoustic speed) of a 25 eV plasma, the double layers may result from a new electron acoustic mode. Acceleration mechanisms involving double layers may also be important in planetary radiation belts such as those of Jupiter, Saturn, Uranus, and Neptune, in the solar corona during flares, and in astrophysical objects.
2016-11-09
software, and their networking to augment optical diagnostics employed in supersonic reacting and non-reacting flow experiments. A high-speed... facility at Caltech. Experiments to date have made use of this equipment, extending previous capabilities to high-speed schlieren quantitative flow... visualization and image correlation velocimetry, with further experiments currently in progress.
High-performance mass storage system for workstations
NASA Technical Reports Server (NTRS)
Chiang, T.; Tang, Y.; Gupta, L.; Cooperman, S.
1993-01-01
Reduced Instruction Set Computer (RISC) workstations and Personal Computers (PCs) are very popular tools for office automation, command and control, scientific analysis, database management, and many other applications. However, when running Input/Output (I/O) intensive applications, RISC workstations and PCs are often overburdened with the tasks of collecting, staging, storing, and distributing data. Even with standard high-performance peripherals and storage devices, the I/O function can still be a common bottleneck. Therefore, the high-performance mass storage system, developed by Loral AeroSys' Independent Research and Development (IR&D) engineers, can offload a RISC workstation of I/O-related functions and provide high-performance I/O functions and external interfaces. The high-performance mass storage system has the capability to ingest high-speed real-time data, perform signal or image processing, and stage, archive, and distribute the data. This mass storage system uses a hierarchical storage structure, thus reducing the total data storage cost while maintaining high I/O performance. The high-performance mass storage system is a network of low-cost parallel processors and storage devices. The nodes in the network have special I/O functions such as SCSI controller, Ethernet controller, gateway controller, RS232 controller, IEEE488 controller, and digital/analog converter. The nodes are interconnected through high-speed direct memory access links to form a network. The topology of the network is easily reconfigurable to maximize system throughput for various applications. This high-performance mass storage system takes advantage of a 'busless' architecture for maximum expandability. The mass storage system consists of magnetic disks, a WORM optical disk jukebox, and an 8mm helical scan tape to form a hierarchical storage structure. Commonly used files are kept on magnetic disk for fast retrieval; the optical disks are used as archive media, and the tapes are used as backup media. The storage system is managed by the IEEE mass storage reference model-based UniTree software package. UniTree keeps track of all files in the system, automatically migrates lesser-used files to archive media, and stages files when needed by the system. The user can access files without knowledge of their physical location. The high-performance mass storage system developed by Loral AeroSys significantly boosts system I/O performance and reduces the overall data storage cost. This storage system provides a highly flexible and cost-effective architecture for a variety of applications (e.g., real-time data acquisition with signal and image processing requirements, long-term data archiving and distribution, and image analysis and enhancement).
Gated integrator with signal baseline subtraction
Wang, X.
1996-12-17
An ultrafast, high precision gated integrator includes an opamp having differential inputs. A signal to be integrated is applied to one of the differential inputs through a first input network, and a signal indicative of the DC offset component of the signal to be integrated is applied to the other of the differential inputs through a second input network. A pair of electronic switches in the first and second input networks define an integrating period when they are closed. The first and second input networks are substantially symmetrically constructed of matched components so that error components introduced by the electronic switches appear symmetrically in both input circuits and, hence, are nullified by the common mode rejection of the integrating opamp. The signal indicative of the DC offset component is provided by a sample and hold circuit actuated as the integrating period begins. The symmetrical configuration of the integrating circuit improves accuracy and speed by balancing out common mode errors, by permitting the use of high speed switching elements and high speed opamps and by permitting the use of a small integrating time constant. The sample and hold circuit substantially eliminates the error caused by the input signal baseline offset during a single integrating window. 5 figs.
Gated integrator with signal baseline subtraction
Wang, Xucheng
1996-01-01
An ultrafast, high precision gated integrator includes an opamp having differential inputs. A signal to be integrated is applied to one of the differential inputs through a first input network, and a signal indicative of the DC offset component of the signal to be integrated is applied to the other of the differential inputs through a second input network. A pair of electronic switches in the first and second input networks define an integrating period when they are closed. The first and second input networks are substantially symmetrically constructed of matched components so that error components introduced by the electronic switches appear symmetrically in both input circuits and, hence, are nullified by the common mode rejection of the integrating opamp. The signal indicative of the DC offset component is provided by a sample and hold circuit actuated as the integrating period begins. The symmetrical configuration of the integrating circuit improves accuracy and speed by balancing out common mode errors, by permitting the use of high speed switching elements and high speed opamps and by permitting the use of a small integrating time constant. The sample and hold circuit substantially eliminates the error caused by the input signal baseline offset during a single integrating window.
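A simple numerical model conveys the baseline-subtraction idea: sample the DC offset just before the gate opens, hold it, then integrate the difference across the gate window. All waveform values and timings below are illustrative, not taken from the patent.

```python
# Numerical model of gated integration with baseline subtraction: hold the
# DC level sampled just before the gate opens, then integrate the
# difference across the gate window. All values are illustrative.
import numpy as np

fs = 1e9                                             # 1 GS/s, hypothetical
t = np.arange(0, 2e-6, 1 / fs)
baseline = 0.05                                      # DC offset [V]
pulse = 0.5 * np.exp(-((t - 1e-6) / 50e-9) ** 2)     # signal of interest
signal = baseline + pulse

hold = signal[(t > 0.7e-6) & (t < 0.8e-6)].mean()    # sample-and-hold of baseline
gate = (t > 0.8e-6) & (t < 1.2e-6)                   # integration window

integral = np.sum(signal[gate] - hold) / fs          # offset-free integral [V*s]
print(f"gated integral: {integral:.3e} V*s")
```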
Construction of a parallel processor for simulating manipulators and other mechanical systems
NASA Technical Reports Server (NTRS)
Hannauer, George
1991-01-01
This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆
Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-01-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
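As a sequential point of reference for the operation being parallelized, networkx's double_edge_swap performs exactly this kind of switch: it picks two edges (u,v) and (x,y) and rewires them to (u,x) and (v,y), preserving every vertex degree while keeping the graph simple. The distributed-memory parallelization is the papers' contribution and is not attempted here.

```python
# Sequential reference for the edge-switch operation: double_edge_swap
# rewires (u,v),(x,y) to (u,x),(v,y), preserving the degree sequence and
# keeping the graph simple. The papers parallelize many such switches on
# distributed memory; that part is not attempted here.
import networkx as nx

G = nx.barabasi_albert_graph(1000, 3, seed=42)
degrees_before = dict(G.degree())

nx.double_edge_swap(G, nswap=5000, max_tries=50000, seed=42)

assert dict(G.degree()) == degrees_before      # degree sequence preserved
assert nx.number_of_selfloops(G) == 0          # graph stayed simple
```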
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mercier, C.W.
The Network File System (NFS) will be the user interface to a High-Performance Data System (HPDS) being developed at Los Alamos National Laboratory (LANL). HPDS will manage high-capacity, high-performance storage systems connected directly to a high-speed network from distributed workstations. NFS will be modified to maximize performance and to manage massive amounts of data. 6 refs., 3 figs.
Network support for system initiated checkpoints
Chen, Dong; Heidelberger, Philip
2013-01-29
A system, method, and computer program product for supporting system-initiated checkpoints in parallel computing systems. The system and method generate selective control signals to perform checkpointing of system-related data in the presence of messaging activity associated with a user application running at the node. The checkpointing is initiated by the system such that checkpoint data of a plurality of network nodes may be obtained even in the presence of user applications running on highly parallel computers with ongoing user messaging activity.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan
1994-01-01
A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS, with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and the implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge-base partitions indicate that significant speed increases, including superlinear in some cases, are possible.
A parallel implementation of a multisensor feature-based range-estimation method
NASA Technical Reports Server (NTRS)
Suorsa, Raymond E.; Sridhar, Banavar
1993-01-01
There are many proposed vision-based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real-time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second, depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and a shared-memory parallel computer.
Large-Scale NASA Science Applications on the Columbia Supercluster
NASA Technical Reports Server (NTRS)
Brooks, Walter
2005-01-01
Columbia, NASA's newest 61 teraflops supercomputer that became operational late last year, is a highly integrated Altix cluster of 10,240 processors, and was named to honor the crew of the Space Shuttle lost in early 2003. Constructed in just four months, Columbia increased NASA's computing capability ten-fold, and revitalized the Agency's high-end computing efforts. Significant cutting-edge science and engineering simulations in the areas of space and Earth sciences, as well as aeronautics and space operations, are already occurring on this largest operational Linux supercomputer, demonstrating its capacity and capability to accelerate NASA's space exploration vision. The presentation will describe how an integrated environment consisting not only of next-generation systems, but also modeling and simulation, high-speed networking, parallel performance optimization, and advanced data analysis and visualization, is being used to reduce design cycle time, accelerate scientific discovery, conduct parametric analysis of multiple scenarios, and enhance safety during the life cycle of NASA missions. The talk will conclude by discussing how NAS partnered with various NASA centers, other government agencies, computer industry, and academia, to create a national resource in large-scale modeling and simulation.
Queueing models for token and slotted ring networks. Thesis
NASA Technical Reports Server (NTRS)
Peden, Jeffery H.
1990-01-01
Currently the end-to-end delay characteristics of very high speed local area networks are not well understood. The transmission speed of computer networks is increasing, and local area networks especially are finding increasing use in real-time systems. Ring network operation is generally well understood for both token rings and slotted rings. There is, however, a severe lack of queueing models for higher-layer operation. Several factors contribute to the processing delay of a packet, as opposed to its transmission delay, e.g., packet priority, packet length, the user load, the processor load, the use of priority preemption, the use of preemption at packet reception, the number of processors, the number of protocol processing layers, the speed of each processor, and queue length limitations. Currently existing medium access queueing models are extended by adding modeling techniques that handle exhaustive limited service both with and without priority traffic, and modeling capabilities are extended into the upper layers of the OSI model. Some of the models are parameterized solution methods, since it is shown that certain models exist not as parameterized solutions but rather as solution methods.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites
NASA Technical Reports Server (NTRS)
Sues, R. H.; Lua, Y. J.; Smith, M. D.
1994-01-01
The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure
Kim, Youngjae; Atchley, Scott; Vallee, Geoffroy R.; ...
2016-04-05
While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink and its interplay with the wide-area network are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating-system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering at the source using flash storage. With our evaluations, we show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation of LADS due to I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler to coordinate multiple I/O streams while sharing the PFS, to minimize I/O contention on the PFS. Finally, with our evaluations, we observe that LADS with meta-scheduling can further improve performance by up to 14 percent relative to LADS without meta-scheduling.
Optimizing End-to-End Big Data Transfers over Terabits Network Infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Youngjae; Atchley, Scott; Vallee, Geoffroy R.
While future terabit networks hold the promise of significantly improving big-data motion among geographically distributed data centers, significant challenges must be overcome even on today's 100 gigabit networks to realize end-to-end performance. Multiple bottlenecks exist along the end-to-end path from source to sink; for instance, the data storage infrastructure at both the source and sink and its interplay with the wide-area network are increasingly the bottleneck to achieving high performance. In this study, we identify the issues that lead to congestion on the path of an end-to-end data transfer in the terabit network environment, and we present a new bulk data movement framework for terabit networks, called LADS. LADS exploits the underlying storage layout at each endpoint to maximize throughput without negatively impacting the performance of shared storage resources for other users. LADS also uses the Common Communication Interface (CCI) in lieu of the sockets interface to benefit from hardware-level zero-copy and operating-system bypass capabilities when available. It can further improve data transfer performance under congestion on the end systems by buffering at the source using flash storage. With our evaluations, we show that LADS can avoid congested storage elements within the shared storage resource, improving input/output bandwidth and data transfer rates across high-speed networks. We also investigate the performance degradation of LADS due to I/O contention on the parallel file system (PFS) when multiple LADS tools share the PFS. We design and evaluate a meta-scheduler to coordinate multiple I/O streams while sharing the PFS, to minimize I/O contention on the PFS. Finally, with our evaluations, we observe that LADS with meta-scheduling can further improve performance by up to 14 percent relative to LADS without meta-scheduling.
Parallel optoelectronic trinary signed-digit division
NASA Astrophysics Data System (ADS)
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of operands of arbitrary length in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator-based architecture is suggested for implementing the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of the space-bandwidth product of the spatial light modulators used in the optoelectronic implementation.
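The abstract does not spell out the division algorithm, but its operation count (one subtraction and two multiplications per step) matches Newton-Raphson reciprocal iteration, x ← x(2 − dx), so that conventional-arithmetic analogue is sketched below; the trinary signed-digit realization and its optoelectronic mapping are not reproduced.

```python
# Conventional-arithmetic analogue with the same per-step operation mix
# (one subtraction, two multiplications): Newton-Raphson reciprocal
# iteration x <- x * (2 - d*x), followed by quotient = n * (1/d).
# The paper's trinary signed-digit realization is not reproduced here.
def divide(n: float, d: float, iters: int = 6) -> float:
    assert d > 0
    while d >= 1.0:                      # scale d into [0.5, 1); n/d is unchanged
        d /= 2.0; n /= 2.0
    while d < 0.5:
        d *= 2.0; n *= 2.0
    x = 48.0 / 17.0 - 32.0 / 17.0 * d    # classic initial estimate of 1/d
    for _ in range(iters):
        x = x * (2.0 - d * x)            # one subtract, two multiplies
    return n * x

print(divide(355.0, 113.0))              # ~3.14159292
```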
2007-09-26
From left: Data Parallel Line Relaxation (DPLR) software team members Kerry Trumble, Deepak Bose and David Hash analyze and predict the extreme environments NASA's space shuttle experiences during its super high-speed reentry into Earth’s atmosphere.
The crew activity planning system bus interface unit
NASA Technical Reports Server (NTRS)
Allen, M. A.
1979-01-01
The hardware and software designs used to implement a high speed parallel communications interface to the MITRE 307.2 kilobit/second serial bus communications system are described. The primary topic is the development of the bus interface unit.
Learning, memory, and the role of neural network architecture.
Hermundstad, Ann M; Brown, Kevin S; Bassett, Danielle S; Carlson, Jean M
2011-06-01
The performance of information processing systems, from artificial neural networks to natural neuronal ensembles, depends heavily on the underlying system architecture. In this study, we compare the performance of parallel and layered network architectures during sequential tasks that require both acquisition and retention of information, thereby identifying tradeoffs between learning and memory processes. During the task of supervised, sequential function approximation, networks produce and adapt representations of external information. Performance is evaluated by statistically analyzing the error in these representations while varying the initial network state, the structure of the external information, and the time given to learn the information. We link performance to complexity in network architecture by characterizing local error landscape curvature. We find that variations in error landscape structure give rise to tradeoffs in performance; these include the ability of the network to maximize accuracy versus minimize inaccuracy and produce specific versus generalizable representations of information. Parallel networks generate smooth error landscapes with deep, narrow minima, enabling them to find highly specific representations given sufficient time. While accurate, however, these representations are difficult to generalize. In contrast, layered networks generate rough error landscapes with a variety of local minima, allowing them to quickly find coarse representations. Although less accurate, these representations are easily adaptable. The presence of measurable performance tradeoffs in both layered and parallel networks has implications for understanding the behavior of a wide variety of natural and artificial learning systems.
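A very crude conventional analogue of the comparison can be run with off-the-shelf tools, under an assumed mapping ("parallel" as one wide hidden layer, "layered" as several narrow layers) that simplifies the paper's architectures considerably:

```python
# Crude analogue under an assumed mapping: "parallel" = one wide hidden
# layer, "layered" = several narrow layers. This reproduces only the
# flavour of the comparison, not the paper's sequential-task protocol or
# error-landscape analysis.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (400, 1))
y = np.sin(2 * X).ravel() + 0.05 * rng.standard_normal(400)

for name, hidden in {"parallel (64,)": (64,), "layered (8,8,8)": (8, 8, 8)}.items():
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=4000, random_state=0)
    net.fit(X, y)
    print(name, "train R^2 =", round(net.score(X, y), 3))
```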
ISDN Application in the Army Environment
1992-02-01
Signalling System Number 7 (SS7). SS7 is a packet-switched signalling network operating in parallel with the traffic-bearing network. The current, in... for example, require SS7. Further into the future, broadband ISDN (B-ISDN) is expected to provide high-quality, full-motion video, High Definition... smaller business offices, ISDN could be a viable alternative to private networks, especially when switches are connected through SS7. ISDN, in combination
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bonachea, Dan; Hargrove, P.
GASNet is a language-independent, low-level networking layer that provides network-independent, high-performance communication primitives tailored for implementing parallel global address space SPMD languages and libraries such as UPC, UPC++, Co-Array Fortran, Legion, Chapel, and many others. The interface is primarily intended as a compilation target and for use by runtime library writers (as opposed to end users), and the primary goals are high performance, interface portability, and expressiveness. GASNet stands for "Global-Address Space Networking".
Holographic memory for high-density data storage and high-speed pattern recognition
NASA Astrophysics Data System (ADS)
Gu, Claire
2002-09-01
As computers and the internet become faster and faster, more and more information is transmitted, received, and stored everyday. The demand for high density and fast access time data storage is pushing scientists and engineers to explore all possible approaches including magnetic, mechanical, optical, etc. Optical data storage has already demonstrated its potential in the competition against other storage technologies. CD and DVD are showing their advantages in the computer and entertainment market. What motivated the use of optical waves to store and access information is the same as the motivation for optical communication. Light or an optical wave has an enormous capacity (or bandwidth) to carry information because of its short wavelength and parallel nature. In optical storage, there are two types of mechanism, namely localized and holographic memories. What gives the holographic data storage an advantage over localized bit storage is the natural ability to read the stored information in parallel, therefore, meeting the demand for fast access. Another unique feature that makes the holographic data storage attractive is that it is capable of performing associative recall at an incomparable speed. Therefore, volume holographic memory is particularly suitable for high-density data storage and high-speed pattern recognition. In this paper, we review previous works on volume holographic memories and discuss the challenges for this technology to become a reality.
Sodickson, Daniel K.
2010-01-01
Cardiovascular magnetic resonance imaging (CVMRI) is of proven clinical value in the non-invasive imaging of cardiovascular diseases. CVMRI requires rapid image acquisition, but acquisition speed is fundamentally limited in conventional MRI. Parallel imaging provides a means for increasing acquisition speed and efficiency. However, signal-to-noise (SNR) limitations and the limited number of receiver channels available on most MR systems have in the past imposed practical constraints, which dictated the use of moderate accelerations in CVMRI. High levels of acceleration, which were unattainable previously, have become possible with many-receiver MR systems and many-element, cardiac-optimized RF-coil arrays. The resulting imaging speed improvements can be exploited in a number of ways, ranging from enhancement of spatial and temporal resolution to efficient whole heart coverage to streamlining of CVMRI work flow. In this review, examples of these strategies are provided, following an outline of the fundamentals of the highly accelerated imaging approaches employed in CVMRI. Topics discussed include basic principles of parallel imaging; key requirements for MR systems and RF-coil design; practical considerations of SNR management, supported by multi-dimensional accelerations, 3D noise averaging and high field imaging; highly accelerated clinical state-of-the art cardiovascular imaging applications spanning the range from SNR-rich to SNR-limited; and current trends and future directions. PMID:17562047
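The unaliasing step at the heart of parallel imaging can be shown in miniature. With R = 2 undersampling, each aliased pixel is a coil-sensitivity-weighted sum of two true pixels, and a per-pixel least-squares solve (SENSE-style unfolding) separates them; the sensitivities and data below are synthetic.

```python
# Miniature SENSE-style unfolding for R = 2: each aliased pixel is a
# coil-sensitivity-weighted sum of two true pixels; a per-pixel
# least-squares solve separates them. Sensitivities and data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
v_true = np.array([1.0, 0.3])                  # the two pixels that alias together
S = rng.random((4, 2)) + 0.1                   # 4 coils x 2 pixel sensitivities
aliased = S @ v_true + 0.01 * rng.standard_normal(4)   # one aliased value per coil

v_hat, *_ = np.linalg.lstsq(S, aliased, rcond=None)    # unfold
print("recovered pixel pair:", v_hat)          # ~ [1.0, 0.3]
```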
NASA Astrophysics Data System (ADS)
Abramov, E. Y.; Sopov, V. I.
2017-10-01
Using the example of a traction network area with highly asymmetric power supply parameters, this study shows the sequence of a comparative assessment of power losses in a DC traction network under parallel and traditional separated operating modes of the traction substation feeders. Experimental measurements were carried out under both modes of operation. Calculation results based on statistical processing showed that power losses decrease in the contact network and increase in the feeders. The changes proved to be critical, which demonstrates the significance of the potential effects when converting traction network areas to parallel feeder operation. An analytical method was developed for calculating the average power losses for different feed schemes of the traction network. On its basis, the dependences of the relative losses were obtained by varying the difference in feeder voltages. The calculation results showed that a transition to a two-sided feed scheme is not justified for the considered traction network area. A larger reduction in the total power loss can be obtained with a smaller difference in the feeders' resistance and/or a more symmetrical sectioning scheme of the contact network.
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Rao, Hariprasad Nannapaneni
1989-01-01
The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Yokohama, Noriya
2003-09-01
The author constructed a medical image network system using open-source software that took security into consideration. The system supported search and browsing with a WWW browser, and images were stored in a DICOM server. To realize this function, software was developed in the PHP language to bridge the gap between the DICOM protocol and HTTP. The transmission speed was evaluated with respect to the difference between the DICOM and HTTP protocols. Furthermore, an attempt was made to evaluate the convenience of medical image access with a personal information terminal via the Internet through a high-speed mobile communication terminal. The results suggested the feasibility of remote diagnosis and application to emergency care.
Data Handling and Communication
NASA Astrophysics Data System (ADS)
Hemmer, Frédéric; Innocenti, Pier Giorgio
The following sections are included:
* Introduction
* Computing Clusters and Data Storage: The New Factory and Warehouse
* Local Area Networks: Organizing Interconnection
* High-Speed Worldwide Networking: Accelerating Protocols
* Detector Simulation: Events Before the Event
* Data Analysis and Programming Environment: Distilling Information
* World Wide Web: Global Networking
* References
Application of the ANNA neural network chip to high-speed character recognition.
Sackinger, E; Boser, B E; Bromley, J; Lecun, Y; Jackel, L D
1992-01-01
A neural network with 136,000 connections for recognition of handwritten digits has been implemented using a mixed analog/digital neural network chip. The neural network chip is capable of processing 1000 characters/s. The recognition system has essentially the same error rate (5%) as a simulation of the network with 32-b floating-point precision.
Pushbroom Stereo for High-Speed Navigation in Cluttered Environments
2014-09-01
inertial measurement sensors such as Achtelik et al.'s implementation of PTAM (parallel tracking and mapping) [15] with a barometric altimeter, stable flights... in indoor and outdoor environments are possible [1]. With a full vision-aided inertial navigation system (VINS), Li et al. have shown remarkable... avoidance on small UAVs. Stereo systems suffer from a similar speed issue, with most modern systems running at or below 30 Hz [8], [27]. Honegger et
Men, Zhongxian; Yee, Eugene; Lien, Fue-Sang; Yang, Zhiling; Liu, Yongqian
2014-01-01
Short-term wind speed and wind power forecasts (for a 72 h period) are obtained using a nonlinear autoregressive exogenous artificial neural network (ANN) methodology which incorporates either numerical weather prediction or high-resolution computational fluid dynamics wind field information as an exogenous input. An ensemble approach is used to combine the predictions from many candidate ANNs in order to provide improved forecasts for wind speed and power, along with the associated uncertainties in these forecasts. More specifically, the ensemble ANN is used to quantify the uncertainties arising from the network weight initialization and from the unknown structure of the ANN. All members forming the ensemble of neural networks were trained using an efficient particle swarm optimization algorithm. The results of the proposed methodology are validated using wind speed and wind power data obtained from an operational wind farm located in Northern China. The assessment demonstrates that this methodology for wind speed and power forecasting generally provides an improvement in predictive skills when compared to the practice of using an "optimal" weight vector from a single ANN while providing additional information in the form of prediction uncertainty bounds.
Lien, Fue-Sang; Yang, Zhiling; Liu, Yongqian
2014-01-01
Short-term wind speed and wind power forecasts (for a 72 h period) are obtained using a nonlinear autoregressive exogenous artificial neural network (ANN) methodology which incorporates either numerical weather prediction or high-resolution computational fluid dynamics wind field information as an exogenous input. An ensemble approach is used to combine the predictions from many candidate ANNs in order to provide improved forecasts for wind speed and power, along with the associated uncertainties in these forecasts. More specifically, the ensemble ANN is used to quantify the uncertainties arising from the network weight initialization and from the unknown structure of the ANN. All members forming the ensemble of neural networks were trained using an efficient particle swarm optimization algorithm. The results of the proposed methodology are validated using wind speed and wind power data obtained from an operational wind farm located in Northern China. The assessment demonstrates that this methodology for wind speed and power forecasting generally provides an improvement in predictive skills when compared to the practice of using an “optimal” weight vector from a single ANN while providing additional information in the form of prediction uncertainty bounds. PMID:27382627
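The ensemble mechanics described above reduce to a short recipe: train many networks from different random initializations, forecast with the ensemble mean, and report the spread as an uncertainty bound. In the sketch below, synthetic data stand in for the wind farm measurements, and scikit-learn's default optimizer stands in for the particle swarm training used in the paper.

```python
# Ensemble-of-ANNs sketch: train many networks from different random
# initializations, forecast with the ensemble mean, report the spread as
# an uncertainty bound. Synthetic data stand in for the wind farm
# measurements; scikit-learn's default optimizer stands in for the
# particle swarm training used in the paper.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 72, (300, 1))                             # lead time [h], hypothetical
y = 8 + 3 * np.sin(X.ravel() / 6) + rng.normal(0, 0.7, 300)  # wind speed [m/s]

ensemble = [MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                         random_state=s).fit(X, y) for s in range(10)]

X_new = np.array([[24.0], [48.0]])
preds = np.array([m.predict(X_new) for m in ensemble])
print("forecast [m/s]:", preds.mean(axis=0))
print("uncertainty sd:", preds.std(axis=0))
```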
Receiver-Assisted Congestion Control to Achieve High Throughput in Lossy Wireless Networks
NASA Astrophysics Data System (ADS)
Shi, Kai; Shu, Yantai; Yang, Oliver; Luo, Jiarong
2010-04-01
Many applications today require fast data transfer in high-speed wireless networks. However, due to its conservative congestion control algorithm, the Transmission Control Protocol (TCP) cannot effectively utilize the network capacity in lossy wireless networks. In this paper, we propose a receiver-assisted congestion control mechanism (RACC) in which the sender performs loss-based control while the receiver performs delay-based control. The receiver measures the network bandwidth based on the packet interarrival interval and uses it to compute a congestion window size deemed appropriate for the sender. After receiving this advertised value as feedback from the receiver, the sender uses the additive increase and multiplicative decrease (AIMD) mechanism to compute the congestion window size to be used. By integrating loss-based and delay-based congestion control, our mechanism can mitigate the effect of wireless losses, alleviate the timeout effect, and therefore make better use of network bandwidth. Simulation and experiment results in various scenarios show that our mechanism outperforms conventional TCP in high-speed and lossy wireless environments.
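The split control loop can be summarized in a few lines: the receiver converts a packet-interarrival bandwidth estimate into an advertised window via the bandwidth-delay product, and the sender runs ordinary AIMD capped by that advertisement. All constants below are illustrative, not values from the paper.

```python
# Sketch of the split control loop: the receiver turns a packet-interarrival
# bandwidth estimate into an advertised window (bandwidth-delay product);
# the sender runs plain AIMD capped by that advertisement. Constants are
# illustrative, not values from the paper.
SEGMENT_BITS = 1500 * 8                     # hypothetical packet size

def receiver_window(interarrivals_s, rtt_s):
    mean_gap = sum(interarrivals_s) / len(interarrivals_s)
    bandwidth = SEGMENT_BITS / mean_gap                    # bits/s from interarrival gaps
    return max(1, int(bandwidth * rtt_s / SEGMENT_BITS))   # window in packets

def aimd_step(cwnd, advertised, loss, a=1.0, b=0.5):
    cwnd = cwnd * b if loss else cwnd + a   # multiplicative decrease / additive increase
    return min(cwnd, advertised)            # receiver-assisted cap

adv = receiver_window([0.4e-3] * 8, rtt_s=0.05)   # ~30 Mbps link, 50 ms RTT
cwnd = 10.0
for loss in [False, False, True, False]:
    cwnd = aimd_step(cwnd, adv, loss)
    print(f"cwnd = {cwnd:.1f} packets (advertised cap {adv})")
```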
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission, and image data compression. These operations occur in almost all nodes of the department's diagnostic imaging communications network. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability, and upgradeability. The architectural design is based on the principles of hierarchical functionality and distributed, parallel processing, and aims at real-time response. Parallel processing and real-time response are facilitated in part by a dual bus system: a VME control bus and a high-speed image data bus consisting of 8 independent parallel 16-bit busses, capable of handling a combined rate of up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video-rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic workstation. Several hardware modules are described in detail. To illustrate the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.
Yokohama, Noriya; Tsuchimoto, Tadashi; Oishi, Masamichi; Itou, Katsuya
2007-01-20
It has been noted that the downtime of medical informatics systems is often long. Many systems encounter downtimes of hours or even days, which can have a critical effect on daily operations. Such systems remain especially weak in the areas of database and medical imaging data. The scheme design shows the three-layer architecture of the system: application, database, and storage layers. The application layer uses the DICOM protocol (Digital Imaging and Communications in Medicine) and HTTP (Hypertext Transfer Protocol) with AJAX (Asynchronous JavaScript + XML). The database is designed to be decentralized in parallel using cluster technology; consequently, the database can be restored not only with ease but also with improved retrieval speed. In the storage layer, a network RAID (Redundant Array of Independent Disks) system makes it possible to construct exabyte-scale parallel file systems that exploit distributed storage. Development and evaluation of the test-bed has been successful in medical information data backup and recovery in a network environment. This paper presents a schematic design of the new medical informatics system, covering recovery and a dynamic Web application for medical imaging distribution using AJAX.
Placati, Silvio; Guermandi, Marco; Samore, Andrea; Scarselli, Eleonora Franchi; Guerrieri, Roberto
2016-09-01
Diffuse optical tomography is an imaging technique based on evaluating how light propagates within the human head to obtain functional information about the brain. Precision in reconstructing such an optical-property map is highly affected by the accuracy of the light propagation model implemented, which needs to take into account the presence of clear and scattering tissues. We present a numerical solver based on the radiosity-diffusion model, integrating the anatomical information provided by a structural MRI. The solver is designed to run on parallel heterogeneous platforms based on multiple GPUs and CPUs. We demonstrate how the solver provides a 7-fold speed-up over an isotropic-scattering parallel Monte Carlo engine based on the radiative transport equation for a domain composed of 2 million voxels, along with a significant improvement in accuracy. The speed-up greatly increases for larger domains, allowing us to compute the light distribution of a full human head (≈3 million voxels) in 116 s for the platform used.
Precision Parameter Estimation and Machine Learning
NASA Astrophysics Data System (ADS)
Wandelt, Benjamin D.
2008-12-01
I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe), which can vastly accelerate parameter estimation in high-dimensional parameter spaces with costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov-Chain Monte Carlo techniques to efficiently explore a likelihood function, posterior distribution, or χ²-surface. It is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data, using PICo; and the detailed calculation of cosmological helium and hydrogen recombination, using RICO. Since the APPLe approach is designed to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the Cosmology@Home project.
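The strategy reduces to three stages, sketched below with stand-ins: trivially parallel precomputation of an expensive χ² surface (a toy quadratic here, in place of PICo/RICO-scale physics codes), a fast learned surrogate of that surface, and a Metropolis chain run on the cheap surrogate.

```python
# APPLe-style sketch: (1) trivially parallel precomputation of an expensive
# chi^2 surface, (2) a fast learned surrogate, (3) Metropolis sampling on
# the surrogate. The toy chi^2 below stands in for PICo/RICO-scale codes.
import numpy as np
from concurrent.futures import ProcessPoolExecutor
from sklearn.ensemble import RandomForestRegressor

def expensive_chi2(theta):
    return float(np.sum((theta - 0.3) ** 2) / 0.01)      # toy surface

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0, 1, (2000, 2))
    with ProcessPoolExecutor() as pool:                  # parallel precompute
        chi2 = list(pool.map(expensive_chi2, thetas))

    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(thetas, chi2)                          # learn the surface

    theta = np.array([0.5, 0.5])                         # Metropolis on surrogate
    for _ in range(500):
        prop = theta + rng.normal(0, 0.05, 2)
        if not np.all((prop >= 0) & (prop <= 1)):
            continue
        cur_chi2, prop_chi2 = surrogate.predict([theta, prop])
        # Accept with probability min(1, exp(-(chi2' - chi2)/2))
        if rng.random() < np.exp(min(0.0, 0.5 * (cur_chi2 - prop_chi2))):
            theta = prop
    print("chain ended near:", theta)
```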
Power-Aware Compiler Controllable Chip Multiprocessor
NASA Astrophysics Data System (ADS)
Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori
A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.
GSRP/David Marshall: Fully Automated Cartesian Grid CFD Application for MDO in High Speed Flows
NASA Technical Reports Server (NTRS)
2003-01-01
With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grids solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers, viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.
The structure of the electron diffusion region during asymmetric anti-parallel magnetic reconnection
NASA Astrophysics Data System (ADS)
Swisdak, M.; Drake, J. F.; Price, L.; Burch, J. L.; Cassak, P.
2017-12-01
The structure of the electron diffusion region during asymmetric magnetic reconnection is explored with high-resolution particle-in-cell simulations that focus on a magnetopause event observed by the Magnetospheric Multiscale Mission (MMS). A major surprise is the development of a standing, oblique whistler-like structure with regions of intense positive and negative dissipation. This structure arises from high-speed electrons that flow along the magnetosheath magnetic separatrices, converge in the dissipation region and jet across the x-line into the magnetosphere. The jet produces a region of negative charge and generates intense parallel electric fields that eject the electrons downstream along the magnetospheric separatrices. The ejected electrons produce the parallel velocity-space crescents documented by MMS.
Proteus: a reconfigurable computational network for computer vision
NASA Astrophysics Data System (ADS)
Haralick, Robert M.; Somani, Arun K.; Wittenbrink, Craig M.; Johnson, Robert; Cooper, Kenneth; Shapiro, Linda G.; Phillips, Ihsin T.; Hwang, Jenq N.; Cheung, William; Yao, Yung H.; Chen, Chung-Ho; Yang, Larry; Daugherty, Brian; Lorbeski, Bob; Loving, Kent; Miller, Tom; Parkins, Larye; Soos, Steven L.
1992-04-01
The Proteus architecture is a highly parallel MIMD (multiple-instruction, multiple-data) machine, optimized for large-granularity tasks such as machine vision and image processing. The system can achieve 20 Giga-flops (80 Giga-flops peak). It accepts data via multiple serial links at a rate of up to 640 megabytes/second. The system employs a hierarchical reconfigurable interconnection network, the highest level being a circuit-switched Enhanced Hypercube serial interconnection network for internal data transfers. The system is designed to use 256 to 1,024 RISC processors. The processors use one-megabyte external Read/Write Allocating Caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low- and high-level simulators, and a message-passing system for all control needs. Image processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray-scale dilation, erosion, opening, and closing.
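For concreteness, the morphology operations named above can be sketched with SciPy (illustrative only; Proteus ran its own INSIGHT-based software stack):

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(512, 512)            # gray-scale test image
se = np.ones((3, 3))                      # structuring element

dilated = ndimage.grey_dilation(img, footprint=se)
eroded  = ndimage.grey_erosion(img, footprint=se)
opened  = ndimage.grey_opening(img, footprint=se)   # erosion followed by dilation
closed  = ndimage.grey_closing(img, footprint=se)   # dilation followed by erosion

mask = img > 0.5                          # binary counterparts
bin_open  = ndimage.binary_opening(mask, structure=np.ones((3, 3), bool))
bin_close = ndimage.binary_closing(mask, structure=np.ones((3, 3), bool))
```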
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon
2012-01-01
We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To remove this computational bottleneck we converted our optimization algorithm to run on a graphics processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphics computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80-node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (ordinary differential equations) on the GPU itself, so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data-intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
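A minimal sketch of the parallelization point being made, assuming CuPy as the GPU array library: integrate one toy ODE per candidate parameter set entirely on the device and copy back only the final states, so the host-device transfer happens once rather than every step. The model and sizes are placeholders, not the paper's kinetic schemes.

```python
import cupy as cp

rates = cp.linspace(0.1, 10.0, 1_000_000)  # one rate constant per candidate model
x = cp.ones_like(rates)                    # initial conditions, resident on the device
dt, steps = 1e-3, 1_000

def f(x):                                  # toy gating kinetics dx/dt = -rate * x
    return -rates * x

for _ in range(steps):                     # classic RK4; every array stays on the GPU
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    x += (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

result = cp.asnumpy(x)                     # single device-to-host transfer at the end
```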
NASA Astrophysics Data System (ADS)
Kakue, T.; Endo, Y.; Shimobaba, T.; Ito, T.
2014-11-01
We report frequency estimation of a loudspeaker diaphragm vibrating at high speed by parallel phase-shifting digital holography, a single-shot phase-shifting interferometry technique. This technique records the multiple phase-shifted holograms required for phase-shifting interferometry by space-division multiplexing. We constructed a parallel phase-shifting digital holography system based on a high-speed polarization-imaging camera, whose micro-polarizer array selects four linear polarization axes within each 2 × 2 block of pixels. Using a loudspeaker as the object, we recorded the vibration of its diaphragm with the constructed system and demonstrated observation of the vibration displacement. In this paper, we aim to estimate the vibration frequency of the loudspeaker diaphragm by applying frequency analysis to the experimental results. Holograms of 128 × 128 pixels were recorded at a frame rate of 262,500 frames per second. A sinusoidal wave was input to the loudspeaker via a phone connector. We observed the displacement of the vibrating diaphragm and succeeded in estimating its vibration frequency by frequency analysis of the experimental results.
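A sketch of the space-division demultiplexing described above, in NumPy. The 2 × 2 pixel-to-phase mapping and the use of the standard four-step phase-shifting formula are assumptions for illustration; the actual mapping depends on the micro-polarizer array.

```python
import numpy as np

frame = np.random.rand(128, 128)          # one raw polarization-camera frame

I0   = frame[0::2, 0::2]                  # assumed layout of the 2x2 superpixel:
I90  = frame[0::2, 1::2]                  # phase shifts 0, pi/2, pi, 3pi/2
I180 = frame[1::2, 0::2]
I270 = frame[1::2, 1::2]

# Four-step phase-shifting reconstruction of the object phase (64x64 map).
phase = np.arctan2(I270 - I90, I0 - I180)
```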
Matching pursuit parallel decomposition of seismic data
NASA Astrophysics Data System (ADS)
Li, Chuanhui; Zhang, Fanchang
2017-07-01
To improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. In every iteration, we pick a fixed number of envelope peaks from the current signal, according to the number of compute nodes, and distribute them evenly across the compute nodes, which then search for the optimal Morlet wavelets in parallel. With the help of parallel computer systems and the Message Passing Interface, the parallel algorithm fully exploits the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition, and it also scales well. Moreover, having each compute node search for only one optimal Morlet wavelet per iteration is the most efficient implementation.
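A hedged sketch of this division of labor, assuming mpi4py: the root rank picks as many peaks as there are ranks (here simplified to the largest-magnitude samples rather than true envelope peaks), scatters one per rank, and each rank searches its own best-fitting Morlet wavelet. The file name and search grids are placeholders.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

signal = np.load("trace.npy") if rank == 0 else None   # hypothetical input trace
signal = comm.bcast(signal, root=0)
t = np.arange(signal.size)

# Root picks one "peak" per rank (simplified to the largest-magnitude samples).
peak = comm.scatter(
    np.argsort(np.abs(signal))[-size:].tolist() if rank == 0 else None, root=0)

def morlet(t0, f, s):                                  # real-valued Morlet atom
    return np.exp(-((t - t0) / s) ** 2) * np.cos(2 * np.pi * f * (t - t0))

# Each rank searches a (frequency, scale) grid around its assigned peak.
best = max((abs(signal @ morlet(peak, f, s)), f, s)
           for f in np.linspace(0.01, 0.2, 40)
           for s in np.linspace(5.0, 50.0, 20))
results = comm.gather(best, root=0)                    # root keeps the global optimum
```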
Energy challenges in optical access and aggregation networks.
Kilper, Daniel C; Rastegarfar, Houman
2016-03-06
Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems, and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as density and speed increase, the total system energy, thermal density and energy per bit are moving into regimes that are impractical to support, for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider the implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).
Distributing stand inventory data and maps over a wide area network
Thomas E. Burk
2000-01-01
High-speed networks connecting multiple levels of management are becoming commonplace among forest resources organizations. Such networks can be used to deliver timely spatial and aspatial data relevant to the management of stands to field personnel. A network infrastructure allows maintenance of cost-effective, centralized databases with the potential for updating by...
Transputer parallel processing at NASA Lewis Research Center
NASA Technical Reports Server (NTRS)
Ellis, Graham K.
1989-01-01
The transputer parallel processing lab at NASA Lewis Research Center (LeRC) consists of 69 processors (transputers) that can be connected into various networks for use in general purpose concurrent processing applications. The main goal of the lab is to develop concurrent scientific and engineering application programs that will take advantage of the computational speed increases available on a parallel processor over the traditional sequential processor. Current research involves the development of basic programming tools. These tools will help standardize program interfaces to specific hardware by providing a set of common libraries for applications programmers. The thrust of the current effort is in developing a set of tools for graphics rendering/animation. The applications programmer currently has two options for on-screen plotting. One option can be used for static graphics displays and the other can be used for animated motion. The option for static display involves the use of 2-D graphics primitives that can be called from within an application program. These routines perform the standard 2-D geometric graphics operations in real-coordinate space as well as allowing multiple windows on a single screen.
Modeling evolution of crosstalk in noisy signal transduction networks
NASA Astrophysics Data System (ADS)
Tareen, Ammar; Wingreen, Ned S.; Mukhopadhyay, Ranjan
2018-02-01
Signal transduction networks can form highly interconnected systems within cells due to crosstalk between constituent pathways. To better understand the evolutionary design principles underlying such networks, we study the evolution of crosstalk for two parallel signaling pathways that arise via gene duplication. We use a sequence-based evolutionary algorithm and evolve the network based on two physically motivated fitness functions related to information transmission. We find that one fitness function leads to a high degree of crosstalk while the other leads to pathway specificity. Our results offer insights on the relationship between network architecture and information transmission for noisy biomolecular networks.
A Comparative Propulsion System Analysis for the High-Speed Civil Transport
NASA Technical Reports Server (NTRS)
Berton, Jeffrey J.; Haller, William J.; Senick, Paul F.; Jones, Scott M.; Seidel, Jonathan A.
2005-01-01
Six of the candidate propulsion systems for the High-Speed Civil Transport are the turbojet, turbine bypass engine, mixed flow turbofan, variable cycle engine, Flade engine, and the inverting flow valve engine. A comparison of these propulsion systems by NASA's Glenn Research Center, paralleling studies within the aircraft industry, is presented. This report describes the Glenn Aeropropulsion Analysis Office's contribution to the High-Speed Research Program's 1993 and 1994 propulsion system selections. A parametric investigation of each propulsion cycle's primary design variables is analytically performed. Performance, weight, and geometric data are calculated for each engine. The resulting engines are then evaluated on two airframer-derived supersonic commercial aircraft for a 5000 nautical mile, Mach 2.4 cruise design mission. The effects of takeoff noise, cruise emissions, and cycle design rules are examined.
Yousefzadeh, Amirreza; Jablonski, Miroslaw; Iakymchuk, Taras; Linares-Barranco, Alejandro; Rosado, Alfredo; Plana, Luis A; Temple, Steve; Serrano-Gotarredona, Teresa; Furber, Steve B; Linares-Barranco, Bernabe
2017-10-01
Address event representation (AER) is a widely employed asynchronous technique for interchanging "neural spikes" between different hardware elements in neuromorphic systems. Each neuron or cell in a chip or a system is assigned an address (or ID), which is typically communicated through a high-speed digital bus, thus time-multiplexing a high number of neural connections. Conventional AER links use parallel physical wires together with a pair of handshaking signals (request and acknowledge). In this paper, we present a fully serial implementation using bidirectional SATA connectors with a pair of low-voltage differential signaling (LVDS) wires for each direction. The proposed implementation can multiplex a number of conventional parallel AER links for each physical LVDS connection. It uses flow control, clock correction, and byte alignment techniques to transmit 32-bit address events reliably over multiplexed serial connections. The setup has been tested using commercial Spartan6 FPGAs attaining a maximum event transmission speed of 75 Meps (Mega events per second) for 32-bit events at a line rate of 3.0 Gbps. Full HDL codes (vhdl/verilog) and example demonstration codes for the SpiNNaker platform will be made available.
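The quoted figures are mutually consistent under 8b/10b line coding, as this back-of-the-envelope check shows (the coding assumption is ours, and flow-control and clock-correction overhead is ignored):

```python
# 8b/10b coding spends 10 line bits per payload byte; one address event is 32 bits.
line_rate_bps = 3.0e9
payload_Bps = line_rate_bps / 10          # 300 MB/s of payload after 8b/10b
events_per_s = payload_Bps / 4            # 4 bytes per 32-bit event
print(events_per_s)                       # 75000000.0, i.e. the reported 75 Meps
```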
DOT National Transportation Integrated Search
2012-03-01
California's High Speed Rail (HSR) offers opportunities for positive urban transformation in station cities, but the economic, urban design, real estate, and municipal behavior variables that may influence such change are understudied. This researc...
High speed multiwire photon camera
NASA Technical Reports Server (NTRS)
Lacy, Jeffrey L. (Inventor)
1991-01-01
An improved multiwire proportional counter camera having particular utility in the field of clinical nuclear medicine imaging. The detector utilizes direct coupled, low impedance, high speed delay lines, the segments of which are capacitor-inductor networks. A pile-up rejection test is provided to reject confused events otherwise caused by multiple ionization events occurring during the readout window.
High speed multiwire photon camera
NASA Technical Reports Server (NTRS)
Lacy, Jeffrey L. (Inventor)
1989-01-01
An improved multiwire proportional counter camera having particular utility in the field of clinical nuclear medicine imaging. The detector utilizes direct coupled, low impedance, high speed delay lines, the segments of which are capacitor-inductor networks. A pile-up rejection test is provided to reject confused events otherwise caused by multiple ionization events occurring during the readout window.
75 FR 36071 - Framework for Broadband Internet Service
Federal Register 2010, 2011, 2012, 2013, 2014
2010-06-24
... contractor, Best Copy and Printing, Inc. (BCPI), Portals II, 445 12th Street, SW., Room CY-B402, Washington...-leading high-speed broadband networks--both wired and wireless--lies at the very core of the FCC's mission... subscribers with high-speed Internet access, as well as many applications or functions that can be used with...
Computer architecture evaluation for structural dynamics computations: Project summary
NASA Technical Reports Server (NTRS)
Standley, Hilda M.
1989-01-01
The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
Atoche, Alejandro Castillo; Castillo, Javier Vázquez
2012-01-01
A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging from radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed into their parallel representation and then mapped into an efficient high performance embedded computing (HPEC) architecture on reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was integrated into a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such a dual SSA core drastically reduces the computational load of complex RS regularization techniques, achieving the required real-time operational mode. PMID:22736964
Videoconferencing Comes of Age.
ERIC Educational Resources Information Center
Bosak, Steve
2000-01-01
Hundreds of districts are using high-speed videoconferencing for distance learning and resource sharing, inservice training, and districtwide meetings. Speed matters. Districts will need either Ethernet or ATM (asynchronous transfer mode) forms of wide-area networks to connect schools and offices. (MLH)
Forecasting the Short-Term Passenger Flow on High-Speed Railway with Neural Networks
Xie, Mei-Quan; Li, Xia-Miao; Zhou, Wen-Liang; Fu, Yan-Bing
2014-01-01
Short-term passenger flow forecasting is an important component of transportation systems. The forecasting results can be applied to support transportation system operation and management such as operation planning and revenue management. In this paper, a divide-and-conquer method based on neural networks and origin-destination (OD) matrix estimation is developed to forecast short-term passenger flow in a high-speed railway system. There are three steps in the forecasting method. First, the numbers of passengers who arrive at or depart from each station are obtained from historical passenger flow data, which take the form of OD matrices in this paper. Second, short-term forecasting of the numbers of passengers arriving at or departing from each station is carried out with neural networks. Finally, the short-term OD matrices are obtained with an OD matrix estimation method. The experimental results indicate that the proposed divide-and-conquer method performs well in forecasting short-term passenger flow on a high-speed railway. PMID:25544838
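A compact sketch of the second step, assuming scikit-learn: one small network per station forecasts next-day departures from a week of lagged counts. The data file, window length, and network size are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

history = np.load("departures.npy")        # hypothetical array, shape (days, stations)
X = np.stack([history[i:i + 7].ravel() for i in range(len(history) - 7)])

models = []
for s in range(history.shape[1]):          # one forecaster per station
    y = history[7:, s]                     # next-day count following each 7-day window
    models.append(MLPRegressor(hidden_layer_sizes=(32,), max_iter=500).fit(X, y))

next_day = [m.predict(history[-7:].ravel()[None])[0] for m in models]
# Step 3 then estimates an OD matrix consistent with these station totals.
```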
Parallel performance investigations of an unstructured mesh Navier-Stokes solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
2000-01-01
A Reynolds-averaged Navier-Stokes solver based on unstructured mesh techniques for analysis of high-lift configurations is described. The method makes use of an agglomeration multigrid solver for convergence acceleration. Implicit line-smoothing is employed to relieve the stiffness associated with highly stretched meshes. A GMRES technique is also implemented to speed convergence at the expense of additional memory usage. The solver is cache efficient and fully vectorizable, and is parallelized using a two-level hybrid MPI-OpenMP implementation suitable for shared and/or distributed memory architectures, as well as clusters of shared memory machines. Convergence and scalability results are illustrated for various high-lift cases.
Experimental and Computational Sonic Boom Assessment of Lockheed-Martin N+2 Low Boom Models
NASA Technical Reports Server (NTRS)
Cliff, Susan E.; Durston, Donald A.; Elmiligui, Alaa A.; Walker, Eric L.; Carter, Melissa B.
2015-01-01
Flight at speeds greater than the speed of sound is not permitted over land, primarily because of the noise and structural damage caused by sonic boom pressure waves of supersonic aircraft. Mitigation of sonic boom is a key focus area of the High Speed Project under NASA's Fundamental Aeronautics Program. The project is focusing on technologies to enable future civilian aircraft to fly efficiently with reduced sonic boom, engine and aircraft noise, and emissions. A major objective of the project is to improve both computational and experimental capabilities for design of low-boom, high-efficiency aircraft. NASA and industry partners are developing improved wind tunnel testing techniques and new pressure instrumentation to measure the weak sonic boom pressure signatures of modern vehicle concepts. In parallel, computational methods are being developed to provide rapid design and analysis of supersonic aircraft with improved meshing techniques that provide efficient, robust, and accurate on- and off-body pressures at several body lengths from vehicles with very low sonic boom overpressures. The maturity of these critical parallel efforts is necessary before low-boom flight can be demonstrated and commercial supersonic flight can be realized.
NASA Technical Reports Server (NTRS)
Gibson, Jim; Jordan, Joe; Grant, Terry
1990-01-01
Local Area Network Extensible Simulator (LANES) computer program provides method for simulating performance of high-speed local-area-network (LAN) technology. Developed as design and analysis software tool for networking computers on board proposed Space Station. Load, network, link, and physical layers of layered network architecture all modeled. Mathematically models according to different lower-layer protocols: Fiber Distributed Data Interface (FDDI) and Star*Bus. Written in FORTRAN 77.
Hyttinen, Mika M; Holopainen, Jaakko; René van Weeren, P; Firth, Elwyn C; Helminen, Heikki J; Brama, Pieter A J
2009-01-01
The aim of this study was to record growth-related changes in collagen network organization and proteoglycan distribution in intermittently peak-loaded and continuously lower-level-loaded articular cartilage. Cartilage from the proximal phalangeal bone of the equine metacarpophalangeal joint at birth, at 5, 11 and 18 months, and at 6–10 years of age was collected from two sites. Site 1, at the joint margin, is unloaded at slow gaits but is subjected to high-intensity loading during athletic activity; site 2 is a continuously but less intensively loaded site in the centre of the joint. The degree of collagen parallelism was determined with quantitative polarized light microscopy and the parallelism index for collagen fibrils was computed from the cartilage surface to the osteochondral junction. Concurrent changes in the proteoglycan distribution were quantified with digital densitometry. We found that the parallelism index increased significantly with age (up to 90%). At birth, site 2 exhibited a more organized collagen network than site 1. In adult horses this situation was reversed. The superficial and intermediate zones exhibited the greatest reorganization of collagen. Site 1 had a higher proteoglycan content than site 2 at birth but here too the situation was reversed in adult horses. We conclude that large changes in joint loading during growth and maturation in the period from birth to adulthood profoundly affect the architecture of the collagen network in equine cartilage. In addition, the distribution and content of proteoglycans are modified significantly by altered joint use. Intermittent peak-loading with shear seems to induce higher collagen parallelism and a lower proteoglycan content in cartilage than more constant weight-bearing. Therefore, we hypothesize that the formation of mature articular cartilage with a highly parallel collagen network and relatively low proteoglycan content in the peak-loaded area of a joint is needed to withstand intermittent stress and shear, whereas a constantly weight-bearing joint area benefits from lower collagen parallelism and a higher proteoglycan content. PMID:19732210
Directly measuring of thermal pulse transfer in one-dimensional highly aligned carbon nanotubes.
Zhang, Guang; Liu, Changhong; Fan, Shoushan
2013-01-01
Using a simple and precise instrument system, we directly measured the thermo-physical properties of one-dimensional highly aligned carbon nanotubes (CNTs). A CNT-based macroscopic material, super-aligned carbon nanotube (SACNT) buckypaper, was measured in our experiment. We defined a new one-dimensional parameter, the "thermal transfer speed," to characterize the thermal damping mechanisms in the SACNT buckypapers. Our results indicated that SACNT buckypapers with different densities have clearly different thermal transfer speeds. Furthermore, we found that the thermal transfer speed of high-density SACNT buckypapers may have an obvious damping factor along the CNT-aligned direction. The anisotropic thermal diffusivities of SACNT buckypapers could be calculated from the thermal transfer speeds. The thermal diffusivities clearly increase as the buckypaper density increases. For parallel SACNT buckypapers, the thermal diffusivity could be as high as 562.2 ± 55.4 mm²/s. The thermal conductivities of these SACNT buckypapers were also calculated from the equation k = C_p α ρ, where C_p is the specific heat capacity, α the thermal diffusivity, and ρ the density.
On adaptive learning rate that guarantees convergence in feedforward networks.
Behera, Laxmidhar; Kumar, Swagat; Patnaik, Awhan
2006-09-01
This paper investigates new learning algorithms (LF I and LF II) based on Lyapunov functions for the training of feedforward neural networks. Such algorithms have an interesting parallel with the popular backpropagation (BP) algorithm, except that the fixed learning rate is replaced by an adaptive learning rate computed using a convergence theorem based on Lyapunov stability theory. LF II, a modified version of LF I, has been introduced with the aim of avoiding local minima; this modification also helps improve the convergence speed in some cases. Conditions for achieving the global minimum with this kind of algorithm have been studied in detail. The performance of the proposed algorithms is compared with the BP algorithm and extended Kalman filtering (EKF) on three benchmark function approximation problems: XOR, 3-bit parity, and the 8-3 encoder. The comparisons are made in terms of the number of learning iterations and the computational time required for convergence. The proposed algorithms (LF I and II) are found to converge much faster than the other two algorithms for the same accuracy. Finally, the comparison is made on a complex two-dimensional (2-D) Gabor function, and the effect of the adaptive learning rate on faster convergence is verified. In a nutshell, the investigations made in this paper help us better understand the learning procedure of feedforward neural networks in terms of adaptive learning rate, convergence speed, and local minima.
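As a one-cell illustration of the idea, here is BP on XOR with the fixed rate replaced by an error-normalized adaptive rate. This rule is in the spirit of the Lyapunov-function approach, not the paper's exact LF I/LF II update:

```python
import numpy as np

# XOR training set, a 2-4-1 sigmoid network, batch gradient descent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((2, 4)), rng.standard_normal(4)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    h = sig(X @ W1)                        # hidden activations
    out = sig(h @ W2)                      # network output
    e = y - out                            # output error
    d_out = e * out * (1 - out)
    gW2 = h.T @ d_out                      # negative gradients of 0.5*||e||^2
    gW1 = X.T @ (np.outer(d_out, W2) * h * (1 - h))
    gnorm2 = (gW1 ** 2).sum() + (gW2 ** 2).sum()
    eta = 0.5 * (e @ e) / (gnorm2 + 1e-8)  # adaptive rate: shrinks as the error shrinks
    W1 += eta * gW1
    W2 += eta * gW2
```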
Parallel processing of genomics data
NASA Astrophysics Data System (ADS)
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face these issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze the data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that copes with high data dimensionality and achieves good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. These strategies cover different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelization across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, the programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed, loop-level parallelism implemented with OpenMP.
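A sketch of the domain-decomposition strategy, assuming OpenCV and a process pool: split the image into horizontal strips with overlap so edges near strip borders survive, run Canny per strip in parallel, then stitch the interiors. Strip count, padding, and thresholds are arbitrary.

```python
import numpy as np
import cv2
from concurrent.futures import ProcessPoolExecutor

def canny_strip(args):
    strip, lo, hi = args
    return cv2.Canny(strip, lo, hi)

def parallel_canny(img, workers=4, pad=16, lo=50, hi=150):
    """img: 8-bit grayscale image; strips overlap by `pad` rows on each side."""
    h = img.shape[0]
    bounds = [(max(0, i * h // workers - pad), min(h, (i + 1) * h // workers + pad))
              for i in range(workers)]
    with ProcessPoolExecutor(workers) as ex:
        edges = list(ex.map(canny_strip, [(img[a:b], lo, hi) for a, b in bounds]))
    out = np.zeros(img.shape[:2], np.uint8)
    for (a, b), e in zip(bounds, edges):
        top = pad if a > 0 else 0                 # trim the halo when stitching
        bot = e.shape[0] - (pad if b < h else 0)
        out[a + top : a + bot] = e[top:bot]
    return out
```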
Wavelet-space correlation imaging for high-speed MRI without motion monitoring or data segmentation.
Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles
2015-12-01
This study aims to (i) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and (ii) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called "wavelet-space correlation imaging", is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI, and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. © 2014 Wiley Periodicals, Inc.
Wavelet-space Correlation Imaging for High-speed MRI without Motion Monitoring or Data Segmentation
Li, Yu; Wang, Hui; Tkach, Jean; Roach, David; Woods, Jason; Dumoulin, Charles
2014-01-01
Purpose This study aims to 1) develop a new high-speed MRI approach by implementing correlation imaging in wavelet-space, and 2) demonstrate the ability of wavelet-space correlation imaging to image human anatomy with involuntary or physiological motion. Methods Correlation imaging is a high-speed MRI framework in which image reconstruction relies on quantification of data correlation. The presented work integrates correlation imaging with a wavelet transform technique developed originally in the field of signal and image processing. This provides a new high-speed MRI approach to motion-free data collection without motion monitoring or data segmentation. The new approach, called “wavelet-space correlation imaging”, is investigated in brain imaging with involuntary motion and chest imaging with free-breathing. Results Wavelet-space correlation imaging can exceed the speed limit of conventional parallel imaging methods. Using this approach with high acceleration factors (6 for brain MRI, 16 for cardiac MRI and 8 for lung MRI), motion-free images can be generated in static brain MRI with involuntary motion and nonsegmented dynamic cardiac/lung MRI with free-breathing. Conclusion Wavelet-space correlation imaging enables high-speed MRI in the presence of involuntary motion or physiological dynamics without motion monitoring or data segmentation. PMID:25470230
Parallel peak pruning for scalable SMP contour tree computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.
As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speed-up in OpenMP and up to 50x speed-up in NVIDIA Thrust.
Coarse cluster enhancing collaborative recommendation for social network systems
NASA Astrophysics Data System (ADS)
Zhao, Yao-Dong; Cai, Shi-Min; Tang, Ming; Shang, Min-Sheng
2017-10-01
Traditional collaborative-filtering-based recommender systems for social network systems impose very high demands on time complexity, because they compute similarities over all pairs of users via resource usages and annotation actions, which severely limits recommendation speed. In this paper, to overcome this drawback, we propose a novel approach, namely coarse clustering, which partitions similar users and associated items at high speed to enhance user-based collaborative filtering, and then develop a fast collaborative user model for social tagging systems. The experimental results based on the Delicious dataset show that the proposed model dramatically reduces the processing time cost, by more than 90%, and relatively improves the accuracy in comparison with ordinary user-based collaborative filtering, and that it is robust to the initial parameter. Most importantly, the proposed model can be conveniently extended by introducing more user information (e.g., profiles) and can be practically applied to large-scale social network systems to enhance recommendation speed without loss of accuracy.
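A minimal sketch of the coarse-cluster idea, assuming scikit-learn: partition users first, then restrict the expensive user-user similarity computation to each partition. The file name and cluster count are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

R = np.load("user_item.npy")                 # hypothetical user-item interaction matrix
labels = KMeans(n_clusters=50, n_init=10).fit_predict(R)

def recommend(u, top_k=10):
    peers = np.where(labels == labels[u])[0]          # similarities only within the cluster
    sims = cosine_similarity(R[u:u + 1], R[peers])[0]
    scores = sims @ R[peers]                          # similarity-weighted item scores
    scores[R[u] > 0] = -np.inf                        # drop items the user already has
    return np.argsort(scores)[::-1][:top_k]
```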
Advanced digital signal processing for short-haul and access network
NASA Astrophysics Data System (ADS)
Zhang, Junwen; Yu, Jianjun; Chi, Nan
2016-02-01
Digital signal processing (DSP) has recently proved to be a successful technology for high-speed, high-spectrum-efficiency optical short-haul and access networks, enabling high performance through digital equalization and compensation. In this paper, we investigate advanced DSP at the transmitter and receiver sides for signal pre-equalization and post-equalization in an optical access network. A novel DSP-based digital and optical pre-equalization scheme is proposed for bandwidth-limited high-speed short-distance communication systems, based on feedback from receiver-side adaptive equalizers such as the least-mean-squares (LMS) algorithm and the constant- or multi-modulus algorithms (CMA, MMA). Based on this scheme, we experimentally demonstrate 400GE on a single optical carrier using the highest ETDM 120-GBaud PDM-PAM-4 signal, one external modulator and coherent detection. A line rate of 480 Gb/s is achieved, which enables 20% forward-error correction (FEC) overhead while keeping the 400-Gb/s net information rate. The performance after fiber transmission shows a large margin for both short-range and metro/regional networks. We also extend the advanced DSP to short-haul optical access networks using high-order QAM. We propose and demonstrate a high-speed multi-band CAP-WDM-PON system based on intensity modulation, direct detection and digital equalization. A hybrid modified cascaded MMA post-equalization scheme is used to equalize the multi-band CAP-mQAM signals. Using this scheme, we successfully demonstrate a 550-Gb/s high-capacity WDM-PON system with 11 WDM channels, 55 sub-bands, and 10 Gb/s per user in the downstream over 40-km SMF.
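As a small illustration of the receiver-side adaptation mentioned above, here is a generic data-aided LMS feed-forward equalizer in NumPy (tap count and step size are arbitrary; this is not the paper's cascaded MMA scheme):

```python
import numpy as np

def lms_equalize(rx, train, ntaps=15, mu=1e-3):
    """rx: received samples; train: known transmitted symbols for the data-aided phase."""
    w = np.zeros(ntaps)
    out = np.zeros(len(rx))
    for n in range(ntaps, len(rx)):
        x = rx[n - ntaps:n][::-1]            # tapped delay line, newest sample first
        out[n] = w @ x                       # equalizer output
        if n < len(train):
            e = train[n] - out[n]            # error against the known training symbol
            w += mu * e * x                  # LMS weight update
    return out
```

After the training window the taps are simply frozen here; a practical receiver would switch to decision-directed updates instead.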
Analog Processor To Solve Optimization Problems
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Eberhardt, Silvio P.; Thakoor, Anil P.
1993-01-01
Proposed analog processor solves "traveling-salesman" problem, considered paradigm of global-optimization problems involving routing or allocation of resources. Includes electronic neural network and auxiliary circuitry based partly on concepts described in "Neural-Network Processor Would Allocate Resources" (NPO-17781) and "Neural Network Solves 'Traveling-Salesman' Problem" (NPO-17807). Processor based on highly parallel computing solves problem in significantly less time.
Image sensor with high dynamic range linear output
NASA Technical Reports Server (NTRS)
Yadid-Pecht, Orly (Inventor); Fossum, Eric R. (Inventor)
2007-01-01
Designs and operational methods to increase the dynamic range of image sensors, and APS devices in particular, by achieving more than one integration time for each pixel. An APS system with more than one column-parallel signal chain for readout is described for maintaining a high frame rate during readout. Each active pixel is sampled multiple times during a single frame readout, thus resulting in multiple integration times. The operational methods can also be used to obtain multiple integration times for each pixel with an APS design having a single column-parallel signal chain for readout. Furthermore, analog-to-digital conversion of high speed and high resolution can be implemented.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.
Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele
2015-01-01
Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousand electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
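A sketch of the kind of channel-parallel pre-processing such tools accelerate, in plain NumPy/SciPy (the cited tool targets multi-core and GPU back ends; the band edges and threshold rule here are common choices, not its actual defaults):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def detect_spikes(data, fs=20000.0):
    """data: (channels, samples) raw MEA recording."""
    sos = butter(4, [300, 3000], btype="bandpass", fs=fs, output="sos")
    filt = sosfiltfilt(sos, data, axis=1)            # filter all channels at once
    # Robust per-channel noise estimate (median/0.6745), 5-sigma negative threshold.
    thr = -5 * np.median(np.abs(filt), axis=1, keepdims=True) / 0.6745
    # Downward threshold crossings (wrap-around at index 0 ignored for brevity).
    crossings = (filt < thr) & (np.roll(filt, 1, axis=1) >= thr)
    return [np.flatnonzero(c) for c in crossings]    # spike sample indices per channel
```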
Fiber-channel audio video standard for military and commercial aircraft product lines
NASA Astrophysics Data System (ADS)
Keller, Jack E.
2002-08-01
Fibre channel is an emerging high-speed digital network technology that continues to make inroads into the avionics arena. The suitability of fibre channel for such applications is largely due to its flexibility in these key areas: network topologies can be configured in point-to-point, arbitrated-loop or switched-fabric connections; the physical layer supports either copper or fiber optic implementations with a bit error rate of less than 10^-12; multiple classes of service are available; multiple upper-level protocols are supported; and multiple high-speed data rates offer open-ended growth paths, with speed negotiation within a single network. Current speeds supported by commercially available hardware are 1 and 2 Gbps, providing effective data rates of 100 and 200 MBps respectively. Such networks lend themselves well to the transport of digital video and audio data. This paper summarizes an ANSI standard currently in the final approval cycle of the InterNational Committee for Information Technology Standardization (INCITS). This standard defines a flexible mechanism whereby digital video, audio and ancillary data are systematically packaged for transport over a fibre channel network. The basic mechanism, called a container, houses audio and video content functionally grouped as elements of the container called objects. Featured in this paper is a specific container mapping called Simple Parametric Digital Video (SPDV), developed particularly to address digital video in avionics systems. SPDV provides pixel-based video with associated ancillary data, typically sourced by various sensors, to be processed and/or distributed in the cockpit for presentation via high-resolution displays. Also highlighted in this paper is a streamlined upper-level protocol (ULP) called Frame Header Control Procedure (FHCP), targeted at avionics systems where the functionality of a more complex ULP is not required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahern, S D
2003-06-10
We describe Merlot, a system for delivery of digital imagery over high speed networks. We describe various use cases, the client/server interaction, and the image and network codecs. We also describe some possible applications using Merlot and future work.
High speed all-optical networks
NASA Technical Reports Server (NTRS)
Chlamtac, Imrich
1993-01-01
An inherent problem of conventional point-to-point WAN architectures is that they cannot translate optical transmission bandwidth into comparable user available throughput due to the limiting electronic processing speed of the switching nodes. This report presents the first solution to WDM based WAN networks that overcomes this limitation. The proposed Lightnet architecture takes into account the idiosyncrasies of WDM switching/transmission leading to an efficient and pragmatic solution. The Lightnet architecture trades the ample WDM bandwidth for a reduction in the number of processing stages and a simplification of each switching stage, leading to drastically increased effective network throughputs.
High-speed railway real-time localization auxiliary method based on deep neural network
NASA Astrophysics Data System (ADS)
Chen, Dongjie; Zhang, Wensheng; Yang, Yang
2017-11-01
High-speed railway intelligent monitoring and management systems integrate schedule data, geographic information, location services, and data mining technology to fuse time and space data. Auxiliary localization is a significant submodule of the intelligent monitoring system. In practical applications, the general approach is to capture image sequences of the components with a high-definition camera and apply digital image processing and target detection, tracking, and even behavior analysis methods. In this paper, we present an end-to-end character recognition method, based on a deep CNN called YOLO-toc, for high-speed railway pillar plate numbers. Different from other deep CNNs, YOLO-toc is an end-to-end multi-target detection framework; furthermore, it exhibits state-of-the-art real-time detection performance, achieving nearly 50 fps on a GPU (GTX 960). Finally, we realize a real-time, high-accuracy pillar plate number recognition system and integrate natural-scene OCR into a dedicated classification YOLO-toc model.
NASA Astrophysics Data System (ADS)
Wang, Liping; Jiang, Yao; Li, Tiemin
2014-09-01
Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. The kinematic errors of the machine are then derived with consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which helps increase the calculation speed. The accuracy of this machine when following a selected circular path is simulated. The influences of the magnitude of the maximum acceleration and of external loads on the running accuracy of the machine are investigated. The results show that external loads deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the machine's worst stiffness. The proposed method provides a solution for predicting the running accuracy of parallel kinematic machines and can also be used in their design optimization as well as in the selection of suitable running parameters.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
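A sketch of the ensemble construction, assuming scikit-learn and binary labels: 15 small BP networks trained as weak learners on weighted resamples (MLPClassifier lacks sample_weight support, so resampling stands in for reweighting) and combined with AdaBoost weights. The Map/Reduce distribution across a Hadoop cluster is elided here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def adaboost_bp(X, y, rounds=15):
    """y must contain binary labels in {0, 1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                         # example weights
    nets, alphas = [], []
    rng = np.random.default_rng(0)
    for _ in range(rounds):
        idx = rng.choice(n, n, p=w)                 # weighted resample
        net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X[idx], y[idx])
        pred = net.predict(X)
        err = max(w[pred != y].sum(), 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # weak-learner weight
        w *= np.exp(alpha * np.where(pred != y, 1.0, -1.0))
        w /= w.sum()                                # renormalize the distribution
        nets.append(net)
        alphas.append(alpha)
    return nets, np.array(alphas)

def predict(nets, alphas, X):                       # weighted vote of the weak nets
    votes = np.stack([a * (2 * net.predict(X) - 1) for net, a in zip(nets, alphas)])
    return (votes.sum(axis=0) > 0).astype(int)
```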
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification.
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-01-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
High speed all optical networks
NASA Technical Reports Server (NTRS)
Chlamtac, Imrich; Ganz, Aura
1990-01-01
An inherent problem of conventional point-to-point wide area network (WAN) architectures is that they cannot translate optical transmission bandwidth into comparable user available throughput due to the limiting electronic processing speed of the switching nodes. The first solution to wavelength division multiplexing (WDM) based WAN networks that overcomes this limitation is presented. The proposed Lightnet architecture takes into account the idiosyncrasies of WDM switching/transmission leading to an efficient and pragmatic solution. The Lightnet architecture trades the ample WDM bandwidth for a reduction in the number of processing stages and a simplification of each switching stage, leading to drastically increased effective network throughputs. The principle of the Lightnet architecture is the construction and use of virtual topology networks, embedded in the original network in the wavelength domain. For this construction Lightnets utilize the new concept of lightpaths which constitute the links of the virtual topology. Lightpaths are all-optical, multihop, paths in the network that allow data to be switched through intermediate nodes using high throughput passive optical switches. The use of the virtual topologies and the associated switching design introduce a number of new ideas, which are discussed in detail.
ERIC Educational Resources Information Center
Congress of the U.S., Washington, DC. Senate Committee on Energy and Natural Resources.
The purpose of the bill (S. 343), as reported by the Senate Committee on Energy and Natural Resources, is to establish a federal commitment to the advancement of high-performance computing, improve interagency planning and coordination of federal high-performance computing and networking activities, authorize a national high-speed computer…
Dense wavelength division multiplexing devices for metropolitan-area datacom and telecom networks
NASA Astrophysics Data System (ADS)
DeCusatis, Casimer M.; Priest, David G.
2000-12-01
Large data processing environments in use today can require multi-gigabyte or terabyte capacity in the data communication infrastructure; these requirements are being driven by storage area networks with access to petabyte databases, new architectures for parallel processing that require high-bandwidth optical links, and rapidly growing network applications such as electronic commerce over the Internet and virtual private networks. These datacom applications require high availability, fault tolerance, security, and the capacity to recover from any single point of failure without relying on traditional SONET-based networking. These requirements, coupled with fiber exhaust in metropolitan areas, are driving the introduction of dense optical wavelength division multiplexing (DWDM) in data communication systems, particularly for large enterprise servers or mainframes. In this paper, we examine the technical requirements for emerging next-generation DWDM systems. Protocols for storage area networks and computer architectures such as Parallel Sysplex are presented, including their fiber bandwidth requirements. We then describe two commercially available DWDM solutions: a first-generation 10-channel system and a recently announced next-generation 32-channel system. Technical requirements, network management and security, fault-tolerant network designs, new network topologies enabled by DWDM, and the role of time division multiplexing in the network are all discussed. Finally, we present a description of testing conducted on these networks and future directions for this technology.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one-to-one mapping of grids to systolic-style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be no regular global structure to the grids constructed, there will still be parallelism at this global level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Wireless Local Area Networks: The Next Evolutionary Step.
ERIC Educational Resources Information Center
Wodarz, Nan
2001-01-01
The Institute of Electrical and Electronics Engineers recently approved a high-speed wireless standard that enables devices from different manufacturers to communicate through a common backbone, making wireless local area networks more feasible in schools. Schools can now use wireless access points and network cards to provide flexible…
DOT National Transportation Integrated Search
2012-03-01
The coming of California High-Speed Rail (HSR) offers opportunities for positive urban transformations in both first-tier and second-tier cities. The research in this report explores the different but complementary roles that first-tier and second-ti...
Yokohama, Noriya
2013-07-01
This report describes the design of an architecture and performance measurements for a parallel computing environment running Monte Carlo simulations for particle therapy on a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed approximately 28 times the speed of a single-threaded architecture, combined with improved stability. A study of methods for optimizing the system operations also indicated lower cost.
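A minimal sketch of the embarrassingly parallel layout such a study implies (a toy surrogate, not the actual particle-therapy Monte Carlo): independent batches per worker, merged in a final reduce step.

```python
import numpy as np
from multiprocessing import Pool

def dose_batch(seed, n=1_000_000):
    rng = np.random.default_rng(seed)            # toy surrogate for particle transport
    depth = rng.exponential(scale=5.0, size=n)   # sampled interaction depths, in cm
    return np.histogram(depth, bins=50, range=(0, 25))[0]

if __name__ == "__main__":
    with Pool(28) as pool:                       # ~28 workers echoes the reported speed-up
        hists = pool.map(dose_batch, range(28))  # independent, seeded batches
    total = np.sum(hists, axis=0)                # reduce: combine the per-worker tallies
```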
Joint Services Electronics Program
1991-03-05
Parallel Computing Network and Program. Professor Abhiram Ranade, with M.T. Raghunath and Robert Boothe. The goal of our research is to develop high... References/Publications: [1] M. T. Raghunath and A. G. Ranade, "A Simulation-Based Comparison of Interconnection Networks," Proceedings of the 2nd IEEE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan
This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaptation of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high-performance linear equation solvers developed for the Fujitsu VPP500, a distributed-memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability to an arbitrary number of processors, up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
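The abstract names the blocked LU decomposition method; a minimal serial NumPy sketch of right-looking blocked LU (without pivoting, for simplicity) shows the block structure whose trailing-matrix update dominates the FLOPs and is what a machine like the VPP500 distributes and vectorizes. The block size and test matrix are illustrative, not the paper's.

```python
# Serial sketch of right-looking blocked LU decomposition (no
# pivoting). Distribution across processors is omitted; only the
# block structure of the algorithm is shown.
import numpy as np

def blocked_lu(A, nb=64):
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # unblocked LU of the diagonal block A[k:e, k:e]
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        if e < n:
            # U12: solve L11 * U12 = A12
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
            # L21: solve L21 * U11 = A21
            U11 = np.triu(A[k:e, k:e])
            A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T
            # trailing-matrix update: the bulk of the FLOPs
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # L (unit lower) and U packed in place

# quick check on a diagonally dominant matrix (safe without pivoting)
rng = np.random.default_rng(0)
M = rng.random((256, 256)) + 256 * np.eye(256)
LU = blocked_lu(M)
L = np.tril(LU, -1) + np.eye(256)
U = np.triu(LU)
assert np.allclose(L @ U, M)
```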
Parallel computing using a Lagrangian formulation
NASA Technical Reports Server (NTRS)
Liou, May-Fun; Loh, Ching Yuen
1991-01-01
A new Lagrangian formulation of the Euler equation is adopted for the calculation of 2-D supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, a better than six times speed-up was achieved on an 8192-processor CM-2 over a single processor of a CRAY-2.
Parallel computing using a Lagrangian formulation
NASA Technical Reports Server (NTRS)
Liou, May-Fun; Loh, Ching-Yuen
1992-01-01
This paper adopts a new Lagrangian formulation of the Euler equation for the calculation of two-dimensional supersonic steady flow. The Lagrangian formulation represents the inherent parallelism of the flow field better than the common Eulerian formulation and offers a competitive alternative on parallel computers. The implementation of the Lagrangian formulation on the Thinking Machines Corporation CM-2 Computer is described. The program uses a finite volume, first-order Godunov scheme and exhibits high accuracy in dealing with multidimensional discontinuities (slip-line and shock). By using this formulation, we have achieved better than six times speed-up on an 8192-processor CM-2 over a single processor of a CRAY-2.
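For readers unfamiliar with the scheme the two records above rely on, a minimal 1-D sketch of a finite volume, first-order Godunov update, here for linear advection rather than the papers' 2-D Lagrangian Euler equations, shows the per-cell flux-difference structure that parallelizes so naturally across processors:

```python
# 1-D finite volume, first-order Godunov sketch for linear advection
# (u_t + a*u_x = 0). For a > 0 the Godunov flux reduces to upwind.
# Grid size, CFL number, and initial pulse are illustrative.
import numpy as np

a, nx, cfl = 1.0, 200, 0.9
dx = 1.0 / nx
dt = cfl * dx / abs(a)
x = (np.arange(nx) + 0.5) * dx
u = np.where((x > 0.1) & (x < 0.3), 1.0, 0.0)   # square pulse

for _ in range(100):
    flux = a * np.roll(u, 1)            # flux at each cell's left face
    u += dt / dx * (flux - np.roll(flux, -1))   # periodic via np.roll

print(f"discontinuity advected; total mass {u.sum() * dx:.4f}")
```

Every cell updates from its own face fluxes, so all cells can be advanced simultaneously, which is the property the CM-2 implementation exploits.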
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration
NASA Astrophysics Data System (ADS)
Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves
2011-07-01
An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented in standard CMOS logic, available in commercial libraries, to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. Characterization of the on-chip generator establishes its performance and reveals promising specifications.
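A PRBS of period 2^n - 1 is conventionally produced by a linear feedback shift register; the software sketch below pairs a PRBS7 register (feedback polynomial x^7 + x^6 + 1, an illustrative choice) with the decimation into parallel sub-sequences that the paper exploits for high-speed on-chip generation:

```python
# PRBS7 generation with a linear feedback shift register, plus
# decimation into parallel lanes. Register width, taps, and lane
# count are illustrative choices, not the chip's parameters.
from itertools import islice

def prbs7():
    state = 0x7F                                  # any non-zero seed
    while True:
        bit = ((state >> 6) ^ (state >> 5)) & 1   # taps at bits 7 and 6
        state = ((state << 1) | bit) & 0x7F
        yield bit

seq = list(islice(prbs7(), 127))                  # one full period, 2**7 - 1

# decimate into 4 parallel lanes: lane j carries bits j, j+4, j+8, ...
# so four slower streams reproduce the fast sequence when interleaved
lanes = [seq[j::4] for j in range(4)]
interleaved = [lanes[i % 4][i // 4] for i in range(len(seq))]
assert interleaved == seq
print("period:", len(seq), "lane lengths:", [len(lane) for lane in lanes])
```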
NASA and Industry Benefits of ACTS High Speed Network Interoperability Experiments
NASA Technical Reports Server (NTRS)
Zernic, M. J.; Beering, D. R.; Brooks, D. E.
2000-01-01
This paper provides synopses of the design, implementation, and results of key high-data-rate communications experiments utilizing the technologies of NASA's Advanced Communications Technology Satellite (ACTS). Specifically, the network protocol and interoperability performance aspects will be highlighted. The objectives of these key experiments will be discussed in the context of NASA missions as well as the broader communications industry. Discussion of the experiment implementation will highlight the technical aspects of hybrid network connectivity, a variety of high-speed interoperability architectures, a variety of network node platforms, protocol layers, internet-based applications, and new work focused on distinguishing between link errors and congestion. In addition, this paper describes the impact of leveraging government-industry partnerships to achieve technical progress and forge synergistic relationships. These relationships will be key to success as NASA seeks to combine commercially available technology with its own internal technology developments to realize more robust and cost-effective communications for space operations.
Cao, Jianfang; Cui, Hongyan; Shi, Hao; Jiao, Lijuan
2016-01-01
A back-propagation (BP) neural network can solve complicated random nonlinear mapping problems; therefore, it can be applied to a wide range of problems. However, as the sample size increases, the time required to train BP neural networks becomes lengthy, and the classification accuracy decreases as well. To improve the classification accuracy and runtime efficiency of the BP neural network algorithm, we proposed a parallel design and realization method for a particle swarm optimization (PSO)-optimized BP neural network based on MapReduce on the Hadoop platform. The PSO algorithm was used to optimize the BP neural network's initial weights and thresholds and improve the accuracy of the classification algorithm. The MapReduce parallel programming model was utilized to achieve parallel processing of the BP algorithm, thereby solving the problems of hardware and communication overhead when the BP neural network addresses big data. Datasets on 5 different scales were constructed using the scene image library from the SUN Database. The classification accuracy of the parallel PSO-BP neural network algorithm is approximately 92%, and the system efficiency is approximately 0.85, which presents obvious advantages when processing big data. The algorithm proposed in this study demonstrated both higher classification accuracy and improved time efficiency, a significant improvement obtained from applying parallel processing to an intelligent algorithm on big data.
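As a single-machine illustration of the PSO half of that design, the sketch below uses a particle swarm to search the flat weight vector of a small network before any gradient training. The Hadoop/MapReduce distribution, the SUN Database data, and the network dimensions are omitted or hypothetical.

```python
# Sketch of PSO searching for good initial weights of a small
# network, the seed step of the PSO-BP method above. Dataset,
# network size, and swarm constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 4))                         # toy inputs
y = (X.sum(axis=1) > 2.0).astype(float)          # toy labels

def loss(w):
    # 4-8-1 network, weights packed in a flat 49-element vector
    W1, b1 = w[:32].reshape(4, 8), w[32:40]
    W2, b2 = w[40:48], w[48]
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    return np.mean((p - y) ** 2)

n, dim = 30, 49                                  # particles, weight count
pos = rng.normal(0, 1, (n, dim))
vel = np.zeros((n, dim))
best_p = pos.copy()
best_f = np.array([loss(w) for w in pos])
g = best_p[best_f.argmin()].copy()               # global best

for _ in range(100):
    r1, r2 = rng.random((n, 1)), rng.random((n, 1))
    vel = 0.7 * vel + 1.5 * r1 * (best_p - pos) + 1.5 * r2 * (g - pos)
    pos += vel
    f = np.array([loss(w) for w in pos])
    improved = f < best_f
    best_p[improved], best_f[improved] = pos[improved], f[improved]
    g = best_p[best_f.argmin()].copy()

print(f"best MSE found by PSO: {best_f.min():.4f}")
# g would now seed ordinary back-propagation training
```

In the paper's setting, the per-particle fitness evaluations are what get farmed out to mappers, since each evaluation is independent.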
Network-linked long-time recording high-speed video camera system
NASA Astrophysics Data System (ADS)
Kimura, Seiji; Tsuji, Masataka
2001-04-01
This paper describes a network-oriented, long-recording-time, high-speed digital video camera system that utilizes an HDD (hard disk drive) as a recording medium. Semiconductor memories (DRAM, etc.) are the most common image data recording media in existing high-speed digital video cameras. They are extensively used because of their advantage of high-speed writing and reading of picture data. The drawback is that their recording time is limited to only several seconds because the data amount is very large. A recording time of several seconds is sufficient for many applications. However, a much longer recording time is required in some applications where an exact prediction of trigger timing is hard to make. In recent years, the recording density of the HDD has been dramatically improved, which has attracted more attention to its value as a long-recording-time medium. We conceived the idea that a compact system capable of long-time recording could be built if the HDD were used as the memory unit for high-speed digital image recording. However, the data rate of such a system, capable of recording 640 x 480 pixel resolution pictures at 500 frames per second (fps) with 8-bit grayscale, is 153.6 Mbyte/sec, far beyond the writing speed of the commonly used HDD. We therefore developed a dedicated image compression system and verified its capability to lower the data rate from the digital camera to match the HDD writing rate.
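The quoted data rate follows directly from the frame parameters, and the compression ratio the dedicated compressor must deliver can be estimated once an HDD write speed is assumed; the 60 Mbyte/s figure below is an assumption for illustration, not from the paper:

```python
# Sanity check of the 153.6 Mbyte/s figure and the implied
# compression ratio. The HDD write speed is an assumed value.
width, height, fps, bytes_per_px = 640, 480, 500, 1
rate = width * height * fps * bytes_per_px      # bytes per second
print(rate / 1e6, "Mbyte/s")                    # -> 153.6

hdd_write = 60e6                                # assumed sustained rate
print(f"compression ratio needed: {rate / hdd_write:.1f}x")
```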
A Software Suite for Testing SpaceWire Devices and Networks
NASA Astrophysics Data System (ADS)
Mills, Stuart; Parkes, Steve
2015-09-01
SpaceWire is a data-handling network for use on board spacecraft, which connects together instruments, mass memory, processors, downlink telemetry, and other on-board sub-systems. SpaceWire is simple to implement and has specific characteristics that help it support data-handling applications in space: high speed, low power, simplicity, relatively low implementation cost, and architectural flexibility, making it ideal for many space missions. SpaceWire provides high-speed (2 Mbits/s to 200 Mbits/s), bi-directional, full-duplex data links, which connect together SpaceWire-enabled equipment. Data-handling networks can be built to suit particular applications using point-to-point data links and routing switches. STAR-Dundee's STAR-System software stack has been designed to meet the needs of engineers designing and developing SpaceWire networks and devices. This paper describes the aims of the software and how those needs were met.
Displacement and deformation measurement for large structures by camera network
NASA Astrophysics Data System (ADS)
Shang, Yang; Yu, Qifeng; Yang, Zhen; Xu, Zhiqiang; Zhang, Xiaohu
2014-03-01
A displacement and deformation measurement method for large structures based on a series-parallel connection camera network is presented. Taking the dynamic monitoring of a large-scale crane during lifting operations as an example, a series-parallel connection camera network is designed, and the displacement and deformation measurement method using this network is studied. The movement range of the crane body is small, while that of the crane arm is large. The displacement of the crane body, the displacement of the crane arm relative to the body, and the deformation of the arm are measured. Compared with a pure series or pure parallel camera network, the designed series-parallel connection camera network can measure not only the movement and displacement of a large structure but also the relative movement and deformation of parts of interest, using a relatively simple optical measurement system.
NASA Technical Reports Server (NTRS)
Eriksson, S.; Wilder, F. D.; Ergun, R. E.; Schwartz, S. J.; Cassak, P. A.; Burch, J. L.; Chen, Li-Jen; Torbert, R. B.; Phan, T. D.; Lavraud, B.;
2016-01-01
We report observations from the Magnetospheric Multiscale (MMS) satellites of a large guide field magnetic reconnection event. The observations suggest that two of the four MMS spacecraft sampled the electron diffusion region, whereas the other two spacecraft detected the exhaust jet from the event. The guide magnetic field amplitude is approximately 4 times that of the reconnecting field. The event is accompanied by a significant parallel electric field (E∥) that is larger than predicted by simulations. The high-speed (approximately 300 km/s) crossing of the electron diffusion region limited the data set to one complete electron distribution inside of the electron diffusion region, which shows significant parallel heating. The data suggest that E∥ is balanced by a combination of electron inertia and a parallel gradient of the gyrotropic electron pressure.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
High-Speed Optical Wide-Area Data-Communication Network
NASA Technical Reports Server (NTRS)
Monacos, Steve P.
1994-01-01
Proposed fiber-optic wide-area network (WAN) for digital communication balances input and output flows of data with its internal capacity by routing traffic via dynamically interconnected routing planes. Data transmitted optically through network by wavelength-division multiplexing in synchronous or asynchronous packets. WAN implemented with currently available technology. Network is multiple-ring cyclic shuffle exchange network ensuring traffic reaches its destination with minimum number of hops.
Computer hardware fault administration
Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.
2010-09-14
Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.
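The patent abstract describes the mechanism only at this high level; as an illustrative sketch under that description, routing around a defective link amounts to a path search over the union of the two networks with the bad edge excluded. The topology and node labels below are hypothetical.

```python
# Illustrative sketch of the fault-administration idea: when a link
# in the first network is defective, route traffic through the
# second, independent network. Topology and nodes are hypothetical.
from collections import deque

def shortest_path(links, src, dst):
    # breadth-first search over an undirected link set
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

primary = {(0, 1), (1, 2), (2, 3)}
secondary = {(0, 4), (4, 2), (2, 3)}    # independent second network
defective = (1, 2)                       # identified defective link

route = shortest_path((primary - {defective}) | secondary, 0, 3)
print("route around defective link:", route)   # e.g. [0, 4, 2, 3]
```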
Evaluation of barriers for very high speed roadways.
DOT National Transportation Integrated Search
2010-03-01
As TxDOT plans for future expansion of the states highway network, interest in higher design : speeds has been expressed as a means of promoting faster and more efficient travel and movement of goods : within the state. TxDOT funded project 0-6071...
Multiplexed Oversampling Digitizer in 65 nm CMOS for Column-Parallel CCD Readout
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grace, Carl; Walder, Jean-Pierre; von der Lippe, Henrik
2012-04-10
A digitizer designed to read out column-parallel charge-coupled devices (CCDs) used for high-speed X-ray imaging is presented. The digitizer is included as part of the High-Speed Image Preprocessor with Oversampling (HIPPO) integrated circuit. The digitizer module comprises a multiplexed, oversampling, 12-bit, 80 MS/s pipelined analog-to-digital converter (ADC) and a bank of four fast-settling sample-and-hold amplifiers to instrument four analog channels. The ADC multiplexes and oversamples to reduce its area to allow integration that is pitch-matched to the columns of the CCD. Novel design techniques are used to enable oversampling and multiplexing with a reduced power penalty. The ADC exhibits 188 μV-rms noise, which is less than 1 LSB at a 12-bit level. The prototype is implemented in a commercially available 65 nm CMOS process. The digitizer will lead to a proof-of-principle 2D 10 Gigapixel/s X-ray detector.
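The noise claim can be sanity-checked against the 12-bit LSB size, although the abstract does not state the converter's full-scale range; assuming 1 V for illustration:

```python
# Check of "188 uV-rms is below 1 LSB at 12 bits".
# The 1 V full-scale range is an assumption, not from the abstract.
full_scale = 1.0                  # volts (assumed)
lsb = full_scale / 2**12          # ~244 uV for a 1 V range
print(f"1 LSB = {lsb * 1e6:.0f} uV; 188 uV below 1 LSB: {188e-6 < lsb}")
```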
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K
2010-01-01
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on a system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular Java simulator. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
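The mechanism itself is not spelled out in the abstract; a generic double-buffering sketch conveys the computation/communication overlap such schemes exploit, with Python threads standing in for CUDA streams or MPI and sleeps standing in for real work:

```python
# Generic latency-hiding sketch: overlap "communication" (fetching
# the next block of agent state) with computation on the current
# block via double buffering. A stand-in for the paper's CUDA/MPI
# overlap; the workloads are simulated with sleeps.
import threading, time

def fetch(block_id, out):
    time.sleep(0.01)                 # simulated transfer latency
    out[0] = [block_id] * 1000       # the "received" agent block

def compute(block):
    time.sleep(0.01)                 # simulated agent updates
    return sum(block)

results, nxt = [], [None]
fetch(0, nxt)                        # prime the pipeline
for i in range(1, 5):
    cur = nxt[0]
    nxt = [None]
    t = threading.Thread(target=fetch, args=(i, nxt))
    t.start()                        # fetch block i while computing block i-1
    results.append(compute(cur))
    t.join()
results.append(compute(nxt[0]))
print("processed blocks:", len(results))
```

When transfer and compute times are comparable, the transfer cost is almost entirely hidden behind useful work, which is the effect the paper measures at much larger scale.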
The numerical simulation of a high-speed axial flow compressor
NASA Technical Reports Server (NTRS)
Mulac, Richard A.; Adamczyk, John J.
1991-01-01
The advancement of high-speed axial-flow multistage compressors is impeded by a lack of detailed flow-field information. Recent developments in compressor flow modeling and numerical simulation have the potential to provide the needed information in a timely manner. The development of a computer program to solve the viscous form of the average-passage equation system for multistage turbomachinery is described. Programming issues such as in-core versus out-of-core data storage and CPU utilization (parallelization, vectorization, and chaining) are addressed. Code performance is evaluated through the simulation of the first four stages of a five-stage, high-speed, axial-flow compressor. The second part addresses the flow physics which can be obtained from the numerical simulation. In particular, an examination of the endwall flow structure is made, and its impact on blockage distribution is assessed.
High-speed massively parallel scanning
Decker, Derek E [Byron, CA
2010-07-06
A new technique for recording a series of images of a high-speed event (such as, but not limited to, ballistics, explosives, or laser-induced changes in materials) is presented. The technique makes use of a lenslet array to take image picture elements (pixels) and concentrate light from each pixel into a spot that is much smaller than the pixel. This array of spots illuminates a detector region (e.g., film, in one embodiment) which is scanned transverse to the light, creating tracks of exposed regions. Each track is a time history of the light intensity for a single pixel. By appropriately configuring the array of concentrated spots with respect to the scanning direction of the detection material, different tracks fit between pixels, and track lengths sufficient for several high-speed imaging applications are possible.
Design of a high-speed digital processing element for parallel simulation
NASA Technical Reports Server (NTRS)
Milner, E. J.; Cwynar, D. S.
1983-01-01
A prototype of a custom-designed computer to be used as a processing element in a multiprocessor-based jet engine simulator is described. The purpose of the custom design was to give the computer the speed and versatility required to simulate a jet engine in real time. Real-time simulations are needed for closed-loop testing of digital electronic engine controls. The prototype computer has a microcycle time of 133 nanoseconds. This speed was achieved by prefetching the next instruction while the current one is executing, transporting data using high-speed data busses, and using state-of-the-art components such as a very large scale integration (VLSI) multiplier. Included are discussions of processing element requirements, design philosophy, the architecture of the custom-designed processing element, the comprehensive instruction set, the diagnostic support software, and the development status of the custom design.
GPU Particle Tracking and MHD Simulations with Greatly Enhanced Computational Speed
NASA Astrophysics Data System (ADS)
Ziemba, T.; O'Donnell, D.; Carscadden, J.; Cash, M.; Winglee, R.; Harnett, E.
2008-12-01
GPUs are intrinsically highly parallelized systems that provide more than an order of magnitude of computing speed over CPU-based systems, at less cost than a high-end workstation. Recent advancements in GPU technologies allow for full IEEE float specifications with performance up to several hundred GFLOPs per GPU, and new software architectures have recently become available to ease the transition from graphics-based to scientific applications. This offers a cheap alternative to standard supercomputing methods and should shorten the time to discovery. 3-D particle tracking and MHD codes have been developed using NVIDIA's CUDA and have demonstrated a speed-up of nearly a factor of 20 over equivalent CPU versions of the codes. Such a speed-up enables new applications, including real-time running of radiation belt simulations and real-time running of global magnetospheric simulations, both of which could provide important space weather prediction tools.
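As a small illustration of why particle tracking maps so well onto GPUs, the sketch below performs a vectorized Boris push of many independent particles in a uniform magnetic field: every particle executes the same instructions on its own data. The constants and field model are illustrative, and NumPy stands in for the authors' CUDA kernels.

```python
# Vectorized Boris push (E = 0 case) of many independent particles
# in a uniform magnetic field. Each particle updates independently,
# the data-parallel structure that GPU threads exploit. Constants
# and field model are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n, dt, qm = 100_000, 1e-3, 1.0            # particles, time step, q/m
x = rng.random((n, 3))                    # positions
v = rng.normal(0.0, 1.0, (n, 3))          # velocities
B = np.array([0.0, 0.0, 1.0])             # uniform field along z

# Boris rotation vectors (constant here because B and dt are constant)
t = 0.5 * qm * dt * B
s = 2.0 * t / (1.0 + t @ t)

for _ in range(100):
    # energy-conserving gyration about B
    v_prime = v + np.cross(v, t)
    v = v + np.cross(v_prime, s)
    x += v * dt

speed = np.linalg.norm(v, axis=1)
print(f"mean speed after 100 steps: {speed.mean():.4f} (gyration conserves it)")
```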
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real-world systems. Applying neural network simulations to real-world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine-grain SIMD computers such as the CM-2 Connection Machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000-processor CM-2 Connection Machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 Connection Machine. Our mapping has virtually no communications overhead, with the exception of the communications required for a global summation across the processors (which has sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent-processor interprocessor communications. This paper considers the simulation of only feed-forward neural networks, although the method is extendable to recurrent networks.
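The O(log(number of processors)) global summation mentioned above is the classic tree reduction; a small sketch (processor count hypothetical) makes the logarithmic round count concrete:

```python
# Tree reduction: P partial sums combine in ceil(log2(P)) rounds,
# the O(log P) global summation noted in the abstract. The round
# loop is sequential here but each pairing is parallel in hardware.
import math

def tree_reduce(partials):
    vals = list(partials)
    rounds = 0
    while len(vals) > 1:
        # each pair of neighbours combines in parallel in one round
        vals = [vals[i] + vals[i + 1] if i + 1 < len(vals) else vals[i]
                for i in range(0, len(vals), 2)]
        rounds += 1
    return vals[0], rounds

total, rounds = tree_reduce(range(64))        # 64 hypothetical processors
assert rounds == math.ceil(math.log2(64))
print(f"sum = {total} in {rounds} rounds")    # 2016 in 6 rounds
```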
A Comparison of Lifting-Line and CFD Methods with Flight Test Data from a Research Puma Helicopter
NASA Technical Reports Server (NTRS)
Bousman, William G.; Young, Colin; Toulmay, Francois; Gilbert, Neil E.; Strawn, Roger C.; Miller, Judith V.; Maier, Thomas H.; Costes, Michel; Beaumier, Philippe
1996-01-01
Four lifting-line methods were compared with flight test data from a research Puma helicopter and the accuracy assessed over a wide range of flight speeds. Hybrid Computational Fluid Dynamics (CFD) methods were also examined for two high-speed conditions. A parallel analytical effort was performed with the lifting-line methods to assess the effects of modeling assumptions and this provided insight into the adequacy of these methods for load predictions.