A high-speed linear algebra library with automatic parallelism
NASA Technical Reports Server (NTRS)
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to achieving the highest performance in workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would benefit significantly from parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel offerings in the market.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms, and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to another, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool, which enables application programmers to specify, at a high level of abstraction, the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP makes it possible to combine efficiently parallel storage-access routines with sequential image-processing operations. This paper shows how processing- and I/O-intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP-specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP-specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
NASA Technical Reports Server (NTRS)
1988-01-01
Final report to NASA LeRC on the development of gallium arsenide (GaAs) high-speed, low-power serial/parallel interface modules. The report discusses the development and test of a family of 16-, 32-, and 64-bit parallel-to-serial and serial-to-parallel integrated circuits using a self-aligned-gate MESFET technology developed at the Honeywell Sensors and Signal Processing Laboratory. Lab testing demonstrated 1.3 GHz clock rates at a power of 300 mW. This work was accomplished under contract number NAS3-24676.
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High-quality compression features needed for some applications, such as 10-bit sample depth or 4:2:2 chroma format, often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real-time clocks, GPS data, mission/ESD/user data, or software-defined radio in a low-power, field-upgradable implementation. Low-power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes, including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data-parallel and task-parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.
Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering place high demands on the computing power of the hardware platform, and parallel computing is a common way to meet this demand. The General-Purpose Graphics Processing Unit (GPGPU) is a promising parallel computing device for this purpose. Nevertheless, incremental clustering algorithms face a dilemma between clustering accuracy and parallelism when powered by a GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering, such as evolving granularity. Second, we formally proved two theorems. The first theorem establishes the relation between clustering accuracy and evolving granularity and analyzes the upper and lower bounds of different-to-same mis-affiliation; fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity; smaller work-depth means superior parallelism. Through the proofs, we conclude that the accuracy of an incremental clustering algorithm is negatively related to evolving granularity, while parallelism is positively related to the granularity; these contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm, and experimental results verified the theoretical conclusions. PMID:29123546
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve with a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-Processing (OpenMP) on a shared-memory platform and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.
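A hedged sketch, not the paper's code, of the contrast it draws: one simulation sub-step for many independent generator states, parallelized in the two styles compared above. Python's multiprocessing stands in for OpenMP's shared-memory threads; mpi4py is real MPI. The dynamics function and problem sizes are invented for illustration.

```python
from multiprocessing import Pool  # shared-memory stand-in for OpenMP

def step_generator(x, dt=0.01):
    # Placeholder for integrating one generator's differential equations.
    return x + dt * (-0.5 * x)

def shared_memory_step(states):
    # OpenMP-style: workers on one host, no explicit communication.
    with Pool() as pool:
        return pool.map(step_generator, states)

def distributed_memory_step():
    # MPI-style: each rank owns a slice of generators; results are gathered
    # on rank 0. Run with: mpiexec -n 4 python dynsim_sketch.py
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    states = [float(i) for i in range(1000)] if rank == 0 else None
    chunks = [states[i::size] for i in range(size)] if rank == 0 else None
    local = comm.scatter(chunks, root=0)
    return comm.gather([step_generator(x) for x in local], root=0)

if __name__ == "__main__":
    print(shared_memory_step([float(i) for i in range(8)])[:4])
```

The trade-off mirrors the paper's comparison: the pooled version shares the host's memory and needs no explicit messages, while the MPI version scales across cluster nodes at the cost of scatter/gather traffic.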
Achieving High Performance in Parallel Applications via Kernel-Application Interaction
1996-04-01
time systems include airplane autopilot or nuclear power plant control. New complex, parallel soft real-time applications have been generating...to keep as many sheep on the table as possible, and the more powerful the sheep behavior-models and look-ahead, the better the results. General...fact that it provides considerable flexibility when considering the amount of processing power to allocate to a planner. In this experiment we again
NASA Astrophysics Data System (ADS)
Han, Jiang-An; Kong, Zhi-Hui; Ma, Kaixue; Yeo, Kiat Seng; Lim, Wei Meng
2016-11-01
This paper presents a novel balun for a millimeter-wave power amplifier (PA) design that achieves high power density in a 65-nm low-power (LP) CMOS process. By using a concentric winding technique, the proposed parallel combining balun with compact size accomplishes power combining and unbalance-balance conversion concurrently. To calculate its power combination efficiency under wave components of various amplitudes and phases, a method based on S-parameters is derived. Based on the proposed parallel combining balun, a fabricated 60-GHz industrial, scientific, and medical (ISM) band PA with single-ended I/O achieves an 18.9-dB gain, an 8.8-dBm output power at 1-dB compression, and a 14.3-dBm saturated output power (Psat) at 62 GHz. This PA, occupying only a 0.10-mm2 core area, demonstrates a high power density of 269.15 mW/mm2 in 65-nm LP CMOS.
A parallel implementation of an off-lattice individual-based model of multicellular populations
NASA Astrophysics Data System (ADS)
Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe
2015-07-01
As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
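A hedged sketch of the divide-and-communicate pattern described in the abstract, using mpi4py with invented details (a 1D strip decomposition, a fixed halo width, a placeholder cell update); the authors' actual implementation and data structures are not reproduced here.

```python
# Run with: mpiexec -n 4 python halo_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns the cells whose x-position falls inside its strip.
x_lo, x_hi = rank / size, (rank + 1) / size
cells = [(x_lo + (x_hi - x_lo) * i / 10.0, 0.0) for i in range(10)]
HALO = 0.05  # interaction radius: how far boundary cells must be shared

for step in range(100):
    # Exchange boundary cells ("halos") with neighbouring strips.
    halo = []
    if rank > 0:
        mine = [c for c in cells if c[0] < x_lo + HALO]
        halo += comm.sendrecv(mine, dest=rank - 1, source=rank - 1)
    if rank < size - 1:
        mine = [c for c in cells if c[0] > x_hi - HALO]
        halo += comm.sendrecv(mine, dest=rank + 1, source=rank + 1)
    # Update owned cells using cells + halo (placeholder dynamics).
    cells = [(x, s + 0.01) for (x, s) in cells]
    # Cells that cross a strip boundary would migrate between ranks here.
```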
Solar Cell Modules with Parallel Oriented Interconnections
NASA Technical Reports Server (NTRS)
1979-01-01
Twenty-four solar modules were provided, half of which were 48 cells in an all-series electrical configuration and half in a six-parallel-cell by eight-series-cell configuration. Upon delivery of environmentally tested modules, low power outputs were discovered. These low-power modules were determined to have cracked cells, which were thought to cause the low output power. The cracks tended to be linear or circular and were caused by different stressing mechanisms, which were fully explored. Efforts were undertaken to determine the causes of cell fracture. This resulted in module design and process modifications, which were subsequently implemented in production.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called the 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full-duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel, so sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated in 0.5-micrometer CMOS technology and comprises about 200 K gates.
Improved CDMA Performance Using Parallel Interference Cancellation
NASA Technical Reports Server (NTRS)
Simon, Marvin; Divsalar, Dariush
1995-01-01
This report considers a general parallel interference cancellation scheme that significantly reduces the degradation effect of user interference but with lower implementation complexity than the maximum-likelihood technique. The scheme exploits the fact that parallel processing simultaneously removes from each user the interference produced by the remaining users accessing the channel, in an amount proportional to their reliability. The parallel processing can be done in multiple stages. The proposed scheme uses tentative decision devices with different optimum thresholds at the multiple stages to produce the most reliably received data for generation and cancellation of user interference. The 1-stage interference cancellation is analyzed for three types of tentative decision devices, namely hard, null-zone, and soft decision, and two types of user power distribution, namely equal and unequal powers. Simulation results are given for a multitude of different situations, in particular those cases for which the analysis is too complex.
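The cancellation step lends itself to a small numerical sketch (illustrative signal model and parameters, not the report's code): every user's decision statistic is recomputed after subtracting the interference regenerated from the other users' tentative hard decisions. Swapping np.sign for a null-zone or soft device gives the report's other variants.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 64                      # users, chips per bit
S = rng.choice([-1.0, 1.0], size=(K, N)) / np.sqrt(N)  # spreading codes
A = np.array([1.0, 1.0, 2.0, 2.0])                     # unequal user powers
b = rng.choice([-1.0, 1.0], size=K)                    # transmitted bits

r = (A * b) @ S + 0.1 * rng.standard_normal(N)         # received signal

y = S @ r                          # matched-filter (conventional) statistics
b_tent = np.sign(y)                # hard tentative decisions

# Each user subtracts, in parallel, the regenerated interference of all others.
b_pic = np.empty(K)
for k in range(K):
    others = [j for j in range(K) if j != k]
    interference = (A[others] * b_tent[others]) @ S[others]
    b_pic[k] = np.sign(S[k] @ (r - interference))

print("true bits:", b, "PIC bits:", b_pic)
```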
Gooding, Owen W
2004-06-01
The use of parallel synthesis techniques with statistical design of experiment (DoE) methods is a powerful combination for the optimization of chemical processes. Advances in parallel synthesis equipment and easy to use software for statistical DoE have fueled a growing acceptance of these techniques in the pharmaceutical industry. As drug candidate structures become more complex at the same time that development timelines are compressed, these enabling technologies promise to become more important in the future.
Options for Parallelizing a Planning and Scheduling Algorithm
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Parallel, distributed and GPU computing technologies in single-particle electron microscopy
Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger
2009-01-01
Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined. PMID:19564686
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high-resolution, high-quality video compression technologies such as H.264. Such solutions provide not only exceptional quality but also efficiency, low power, and low latency, previously unattainable in software-based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low-latency, low-power, real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in an H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs, and the ability to scale up to higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder can be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit depths and better color subsampling patterns like YUV 4:2:2 or 4:4:4. Low-power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology, and this software programming model for massively parallel processors offers a flexible implementation with power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264-compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub-blocks, and pixel rows are examined, as illustrated by the wavefront sketch below. The deblocking architecture consists of a basic cell called the deblocking filter unit (DFU) and a dependent data buffer manager (DFM). The DFU can be instantiated several times, catering to different performance needs; the DFM serves the data required by the different numbers of DFUs and also manages all the neighboring data required for future DFU processing. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
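The macroblock-level parallelism mentioned above is commonly organized as a wavefront: deblocking a macroblock needs its left and top neighbours, so all macroblocks on one anti-diagonal are mutually independent. The sketch below is a hedged, generic Python illustration of that schedule, not the HyperX DFU/DFM implementation.

```python
from multiprocessing import Pool

def filter_macroblock(mb):
    row, col = mb
    # Placeholder for edge filtering that reads the left and top neighbours.
    return (row, col)

def deblock_frame(rows, cols):
    with Pool() as pool:
        for d in range(rows + cols - 1):          # one anti-diagonal at a time
            diagonal = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
            pool.map(filter_macroblock, diagonal)  # independent within a diagonal

if __name__ == "__main__":
    deblock_frame(rows=8, cols=8)
```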
Multi-Kilowatt Power Module for High-Power Hall Thrusters
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Bowers, Glen E.
2005-01-01
Future NASA missions will require high-performance electric propulsion systems. Hall thrusters are being developed at NASA Glenn for high-power, high-specific-impulse operation. These thrusters operate at power levels up to 50 kW and discharge voltages in excess of 600 V. A parallel effort is being conducted to develop power electronics for these thrusters that push the technology beyond the 5-kW state-of-the-art power level. A 10-kW power module was designed to produce an output of 500 V and 20 A from a nominal 100-V input. Resistive load tests revealed efficiencies in excess of 96 percent. Load current share and phase synchronization circuits were designed and tested that will allow connecting multiple modules in parallel to process higher power.
NASA Technical Reports Server (NTRS)
Schoenfeld, A. D.; Yu, Y.
1973-01-01
Versatile standardized pulse modulation nondissipatively regulated control signal processing circuits were applied to three most commonly used dc to dc power converter configurations: (1) the series switching buck-regulator, (2) the pulse modulated parallel inverter, and (3) the buck-boost converter. The unique control concept and the commonality of control functions for all switching regulators have resulted in improved static and dynamic performance and control circuit standardization. New power-circuit technology was also applied to enhance reliability and to achieve optimum weight and efficiency.
Orthorectification by Using GPGPU Method
NASA Astrophysics Data System (ADS)
Sahin, H.; Kulur, S.
2012-07-01
Thanks to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of teraflops. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and high memory bandwidth compared to central processing units (CPUs). Data-parallel computation can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capability attracted the attention of researchers dealing with complex problems that require heavy computation. This interest gave rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful hardware that is cheap and affordable, so they have become an alternative to conventional processors: graphics chips that were once fixed-function hardware have been transformed into modern, powerful, and programmable processors that meet general computing needs. The biggest problem is that graphics processing units use programming models unlike current programming methods; event-procedure programming cannot be used, and an efficient GPU program requires re-coding the algorithm with the limitations and structure of the graphics hardware in mind. GPUs are especially effective when the same computing steps are repeated over many data elements and high accuracy is needed, carrying out the computation quickly and accurately; by comparison, CPUs, which perform one computation at a time according to the flow control, are slower. This study covers how general-purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm was coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language, and the results were compared with a traditional CPU implementation. In a second application, projective rectification was coded using the GPGPU method and CUDA; sample images of various sizes were processed and the results evaluated. The GPGPU method is especially useful for repeating the same computations on highly dense data, finding the solution quickly.
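A hedged sketch of the data-parallel core of the projective rectification experiment: every output pixel evaluates the same homography independently, which is exactly the per-element map a CUDA kernel spreads across GPU threads. numpy stands in for the GPU here, and the matrix H is a made-up example.

```python
import numpy as np

H = np.array([[1.0,  0.02,  5.0],
              [0.01, 1.0,  -3.0],
              [1e-5, 2e-5,  1.0]])   # hypothetical homography

h, w = 480, 640
ys, xs = np.mgrid[0:h, 0:w]
pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1).astype(float)

src = H @ pix                        # identical arithmetic for every pixel
src_x = (src[0] / src[2]).reshape(h, w)
src_y = (src[1] / src[2]).reshape(h, w)

image = np.random.rand(h, w)
# Nearest-neighbour resampling; clipping handles out-of-frame coordinates.
out = image[np.clip(src_y.round().astype(int), 0, h - 1),
            np.clip(src_x.round().astype(int), 0, w - 1)]
```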
Parallel approach in RDF query processing
NASA Astrophysics Data System (ADS)
Vajgl, Marek; Parenica, Jan
2017-07-01
The parallel approach is nowadays a very cheap way to increase computational power, thanks to the availability of multithreaded computational units. Such hardware has become a typical part of today's personal computers and notebooks and is widely available. This contribution deals with experiments on how the evaluation of a computationally complex inference algorithm over RDF data can be parallelized on graphics cards to decrease computation time.
Application of parallelized software architecture to an autonomous ground vehicle
NASA Astrophysics Data System (ADS)
Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam
2011-01-01
This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were also made to the power system, reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities, including sensor interfacing, data processing, path planning, navigation algorithms, and motor control. This inefficient approach led to poor software performance and made the system difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks (motor control, navigation, sensor data collection, etc.) into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used the previous year, two frames can be acquired and processed in 70 ms. With all these improvements, Q placed 2nd in the autonomous challenge.
Contingency Analysis Post-Processing With Advanced Computing and Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Glaesemann, Kurt; Fitzhenry, Erin
Contingency analysis is a critical function widely used in energy management systems to assess the impact of power system component failures. Its outputs are important for power system operation, for improved situational awareness, for power system planning studies, and for power market operations. With the increased complexity of power system modeling and simulation caused by increased energy production and demand, the penetration of renewable energy, the fast deployment of smart grid devices, and the trend of operating grids closer to their capacity for better efficiency, more and more contingencies must be executed and analyzed quickly in order to ensure grid reliability and accuracy for the power market. Currently, many researchers have proposed different techniques to accelerate the computational speed of contingency analysis, but not much work has been published on how to post-process the large volume of contingency outputs quickly. This paper proposes a parallel post-processing function that can analyze contingency analysis outputs faster and display them in a web-based visualization tool to help power engineers improve their work efficiency through faster information digestion. Case studies using an ESCA-60 bus system and a WECC planning system are presented to demonstrate the functionality of the parallel post-processing technique and the web-based visualization tool.
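Because each contingency's output can be summarized independently of the others, the post-processing maps naturally onto a pool of workers. The sketch below is illustrative only: the file names, JSON layout, and 0.95-1.05 per-unit voltage limits are assumptions, not details from the paper.

```python
from multiprocessing import Pool
import json

def summarize(path):
    # Placeholder parse: expects {"contingency": ..., "bus_voltages": {...}}.
    with open(path) as f:
        result = json.load(f)
    violations = {bus: v for bus, v in result["bus_voltages"].items()
                  if v < 0.95 or v > 1.05}            # per-unit limits
    return result["contingency"], violations

if __name__ == "__main__":
    # Write a few sample outputs so the sketch runs end to end.
    for i in range(4):
        with open(f"ca_output_{i}.json", "w") as f:
            json.dump({"contingency": f"line_{i}",
                       "bus_voltages": {"bus1": 0.93 + 0.02 * i,
                                        "bus2": 1.00}}, f)
    paths = [f"ca_output_{i}.json" for i in range(4)]
    with Pool() as pool:
        summaries = dict(pool.map(summarize, paths))   # parallel map over files
    print(summaries)   # feeds the web-based visualization layer
```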
NASA Technical Reports Server (NTRS)
Morris, Robert A.
1990-01-01
The emphasis is on defining a set of communicating processes for intelligent spacecraft secondary power distribution and control. The computer hardware and software implementation platform for this work is that of the ADEPTS project at the Johnson Space Center (JSC). The electrical power system design used as the basis for this research is that of Space Station Freedom, although the functionality of the processes defined here generalizes to any permanently manned space power control application. First, the Space Station Electrical Power Subsystem (EPS) hardware to be monitored is described, followed by a set of scenarios describing typical monitor and control activity. Then, the parallel distributed problem solving approach to knowledge engineering is introduced. There follows a two-step presentation of the intelligent software design for secondary power control. The first step decomposes the problem of monitoring and control into three primary functions, each of which is described in detail. Suggestions for refinements and embellishments in the design specifications are given.
B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.
Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang
2016-03-01
Sequence alignment is the central process of sequence analysis, in which raw sequencing data are mapped to a reference genome. The amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools, so sequence alignment becomes the bottleneck of sequence analysis, and intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power; the Tianhe-2, currently the world's fastest supercomputer, is equipped with three MIC coprocessors on each compute node. A key feature of sequence alignment is that different reads are independent. Exploiting this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner, B-MIC. B-MIC contains three levels of parallelization: first, parallelization of data I/O and read alignment through a three-stage parallel pipeline; second, parallelization enabled by MIC coprocessor technology; third, inter-node parallelization implemented with MPI. In this paper, we demonstrate that B-MIC outperforms BWA through a combination of these techniques, using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on the Intel MIC, and it achieves more than fivefold speedup over the original BWA while maintaining the alignment precision.
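The first of B-MIC's three levels can be sketched as a classic three-stage pipeline: reader, aligner, and writer stages connected by bounded queues so that I/O overlaps computation. This is a hedged Python illustration with placeholder stage bodies, not BWA's or B-MIC's code.

```python
import threading, queue

def reader(chunks, q_in):
    for chunk in chunks:
        q_in.put(chunk)
    q_in.put(None)                      # poison pill ends the pipeline

def aligner(q_in, q_out):
    while (chunk := q_in.get()) is not None:
        q_out.put([read.upper() for read in chunk])  # stand-in for alignment
    q_out.put(None)

def writer(q_out, results):
    while (batch := q_out.get()) is not None:
        results.extend(batch)

if __name__ == "__main__":
    chunks = [["acgt", "ttga"], ["ccat"]]            # batches of reads
    q_in, q_out, results = queue.Queue(2), queue.Queue(2), []
    stages = [threading.Thread(target=reader, args=(chunks, q_in)),
              threading.Thread(target=aligner, args=(q_in, q_out)),
              threading.Thread(target=writer, args=(q_out, results))]
    for t in stages: t.start()
    for t in stages: t.join()
    print(results)
```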
Developing software to use parallel processing effectively. Final report, June-December 1987
DOE Office of Scientific and Technical Information (OSTI.GOV)
Center, J.
1988-10-01
This report describes the difficulties involved in writing efficient parallel programs and the hardware and software support currently available for generating software that uses parallel processing effectively. Historically, the processing rate of single-processor computers has increased by one order of magnitude every five years, but this pace is slowing as electronic circuitry comes up against physical barriers. Unfortunately, the complexity of engineering and research problems continues to require ever more processing power, far in excess of the maximum estimated 3 Gflops achievable by single-processor computers. For this reason, parallel-processing architectures are receiving considerable interest, since they offer high performance more cheaply than a single-processor supercomputer, such as the Cray.
Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas
2016-04-01
Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer program, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within a parallel supercomputing environment. The Message Passing Interface is used to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can be used to launch conventional Linux applications without modifying their original source code and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper
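A hedged reconstruction of the manager/worker pattern the abstract describes, using mpi4py: rank 0 hands out command lines on request, and worker ranks launch the unmodified serial programs via subprocess. The task list and message tags are hypothetical, and the real mpiWrapper additionally handles node failure and resubmission.

```python
# Run with: mpiexec -n 8 python wrapper_sketch.py
from mpi4py import MPI
import subprocess

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
TASKS = [["echo", f"subtask {i}"] for i in range(100)]  # placeholder commands

if rank == 0:                      # manager
    status = MPI.Status()
    next_task, active = 0, comm.Get_size() - 1
    while active:
        comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)  # "idle" ping
        if next_task < len(TASKS):
            comm.send(TASKS[next_task], dest=status.Get_source(), tag=2)
            next_task += 1
        else:
            comm.send(None, dest=status.Get_source(), tag=2)    # shut down
            active -= 1
else:                              # worker
    while True:
        comm.send(rank, dest=0, tag=1)
        cmd = comm.recv(source=0, tag=2)
        if cmd is None:
            break
        subprocess.run(cmd, check=False)   # launch the serial program
```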
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels.
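The parallel-compact idea behind the toolbox's peak detector can be shown in a few lines: a predicate is evaluated on all samples at once and the indices where it holds are compacted into a dense array. numpy stands in for the GPU in this hedged sketch of the idea, not NPE's implementation.

```python
import numpy as np

def detect_peaks(x, threshold):
    # A sample is a peak if it exceeds the threshold and both neighbours.
    mid = x[1:-1]
    is_peak = (mid > threshold) & (mid > x[:-2]) & (mid > x[2:])
    return np.flatnonzero(is_peak) + 1     # compact: keep only peak indices

rng = np.random.default_rng(1)
trace = rng.standard_normal(10_000)
print(detect_peaks(trace, threshold=3.0))
```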
Planning and Resource Management in an Intelligent Automated Power Management System
NASA Technical Reports Server (NTRS)
Morris, Robert A.
1991-01-01
Power system management is a process of guiding a power system towards the objective of continuous supply of electrical power to a set of loads. Spacecraft power system management requires planning and scheduling, since electrical power is a scarce resource in space. The automation of power system management for future spacecraft has been recognized as an important R&D goal. Several automation technologies have emerged, including the use of expert systems for automating human problem-solving capabilities, such as rule-based expert systems for fault diagnosis and load scheduling. It is questionable whether current-generation expert system technology is applicable to power system management in space. The objective of ADEPTS (ADvanced Electrical Power management Techniques for Space systems) is to study new techniques for power management automation. These techniques involve integrating current expert system technology with parallel and distributed computing, as well as a distributed, object-oriented approach to software design. The focus of the current study is the integration of new procedures for automatically planning and scheduling loads with procedures for performing fault diagnosis and control. The objective is the concurrent execution of both sets of tasks on separate transputer processors, thus adding parallelism to the overall management process.
Balancing computation and communication power in power constrained clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piga, Leonardo; Paul, Indrani; Huang, Wei
Systems, apparatuses, and methods for balancing computation and communication power in power constrained environments. A data processing cluster with a plurality of compute nodes may perform parallel processing of a workload in a power constrained environment. Nodes that finish tasks early may be power-gated based on one or more conditions. In some scenarios, a node may predict a wait duration and go into a reduced power consumption state if the wait duration is predicted to be greater than a threshold. The power saved by power-gating one or more nodes may be reassigned for use by other nodes. A cluster agent may be configured to reassign the unused power to the active nodes to expedite workload processing.
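A minimal sketch of the claimed policy, with every class and method name invented for illustration: a node that finishes early predicts its wait, gates itself if the prediction exceeds a threshold, and returns its power budget to a cluster agent for redistribution.

```python
from dataclasses import dataclass

WAIT_THRESHOLD_S = 2.0   # hypothetical tuning knob

@dataclass
class Node:
    idle_history_s: list
    gated: bool = False
    def predict_wait_duration(self):
        # Toy predictor: mean of recent idle periods between tasks.
        return sum(self.idle_history_s) / len(self.idle_history_s)
    def enter_low_power_state(self):
        self.gated = True
        return 35.0       # watts returned to the shared budget (made up)

@dataclass
class ClusterAgent:
    spare_watts: float = 0.0
    def redistribute_power(self, watts):
        # Reassign freed budget so active nodes can raise their power caps.
        self.spare_watts += watts

def on_task_finished(node: Node, agent: ClusterAgent):
    if node.predict_wait_duration() > WAIT_THRESHOLD_S:
        agent.redistribute_power(node.enter_low_power_state())

agent = ClusterAgent()
on_task_finished(Node(idle_history_s=[3.1, 2.8]), agent)
print(agent.spare_watts)
```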
Document Image Parsing and Understanding using Neuromorphic Architecture
2015-03-01
...developed to reduce the processing time at different layers. In the pattern matching layer, the computing power of multicore processors is explored to reduce the processing...cortex where the complex data is reduced to abstract representations. The abstract representation is compared to stored patterns in massively parallel
Characterization of wastewater treatment by two microbial fuel cells in continuous flow operation.
Kubota, Keiichi; Watanabe, Tomohide; Yamaguchi, Takashi; Syutsubo, Kazuaki
2016-01-01
Two serially connected single-chamber microbial fuel cells (MFCs) were applied to the treatment of diluted molasses wastewater in continuous operation mode. In addition, the effect of series and parallel connection between the anodes and the cathode on power generation was investigated experimentally. The two serially connected MFC process achieved 79.8% chemical oxygen demand removal and 11.6% Coulombic efficiency when the hydraulic retention time of the whole process was 26 h. The power densities were 0.54, 0.34 and 0.40 W m(-3) when the electrodes were in individual, serial, and parallel connection modes, respectively. A high open-circuit voltage was obtained in the serial connection. Power density decreased at low organic loading rates (OLRs) due to the shortage of organic matter, while power generation efficiency tended to decrease as a result of enhanced methane fermentation at high OLRs. Therefore, high power density and efficiency can be achieved by using a suitable OLR range.
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng
Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time-consuming with a single processor. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP). These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.
Fast Computation and Assessment Methods in Power System Analysis
NASA Astrophysics Data System (ADS)
Nagata, Masaki
Power system analysis is essential for efficient and reliable power system operation and control. Recently, online security assessment systems have become important, as more efficient use of power networks is eagerly sought. In this article, fast power system analysis techniques such as contingency screening, parallel processing, and intelligent systems application are briefly surveyed from the viewpoint of their application to online dynamic security assessment.
2012-02-17
...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26 GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of
Parallel processing methods for space based power systems
NASA Technical Reports Server (NTRS)
Berry, F. C.
1993-01-01
This report presents a method for performing load-flow analysis of a power system using a decomposition approach. The power system for the Space Shuttle is used as the basis for building a model for the load-flow analysis. To test the decomposition method, simulations were performed on power systems of 16, 25, 34, 43, 52, 61, 70, and 79 nodes. Each of the power systems was divided into subsystems and simulated under steady-state conditions. The results from these tests have been found to be as accurate as tests performed using a standard serial simulator. The division of the power systems into different subsystems was done by assigning a processor to each area. Thirteen transputers were available; therefore, up to 13 different subsystems could be simulated at the same time. This report presents preliminary results for a load-flow analysis using a decomposition principle and shows that the decomposition algorithm for load-flow analysis is well suited to parallel processing and increases the speed of execution.
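A hedged sketch of the decomposition idea: the network equations are split into subsystem blocks that are solved exactly in parallel while coupling to the other subsystems is held fixed, iterating until the boundary values settle (a block-Jacobi scheme). A made-up, diagonally dominant linear system stands in for the load-flow equations, and a process pool stands in for the transputers.

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(2)
n, k = 16, 4                          # nodes, subsystems
A = rng.random((n, n)) * 0.1
A[np.diag_indices(n)] += n            # diagonal dominance => convergence
b = rng.random(n)
blocks = np.array_split(np.arange(n), k)

def solve_block(args):
    idx, x = args                     # subsystem nodes + current global state
    other = np.setdiff1d(np.arange(n), idx)
    # Solve this subsystem exactly, holding boundary coupling fixed.
    rhs = b[idx] - A[np.ix_(idx, other)] @ x[other]
    return np.linalg.solve(A[np.ix_(idx, idx)], rhs)

if __name__ == "__main__":
    x = np.zeros(n)
    with Pool(k) as pool:
        for _ in range(50):           # iterate until boundary values settle
            parts = pool.map(solve_block, [(idx, x) for idx in blocks])
            x = np.concatenate(parts)
    print(np.allclose(A @ x, b, atol=1e-6))
```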
Integrated Silicon Carbide Power Electronic Block
DOE Office of Scientific and Technical Information (OSTI.GOV)
Radhakrishnan, Rahul
2017-11-07
Research in this project is aimed at monolithically integrating an anti-parallel diode with the SiC MOSFET switch, so as to avoid having to use an external anti-parallel diode in power circuit applications. SiC MOSFETs are replacing Si MOSFETs and IGBTs in many applications, yet the high bandgap of the body diode in the SiC MOSFET and the consequent need for an external anti-parallel diode increases costs and discourages circuit designers from adopting this technology. Successful demonstration and subsequent commercialization of this technology would reduce SiC MOSFET cost and additionally reduce component count as well as other costs at the power circuit level. In this Phase I project, we created multiple device designs, set up a process for device fabrication at the 150-mm SiC foundry XFAB Texas, demonstrated unit processes for device fabrication in short loops, and started full-flow device fabrication. Key findings of the development activity were: (1) the limits of coverage of photoresist over the topology of thick polysilicon structures covered with oxide, which required larger feature dimensions to overcome; and (2) the insufficient process margin for removing oxide spacers from polysilicon field ring features, which could result in loss of some features without further process development. No fundamental obstacles were uncovered during the process development. Given sufficient time for additional development, it is likely that the processes could be tuned to realize the monolithically integrated SiC JBS diode and MOSFET. Sufficient funds were not available in this program to resolve the processing difficulties and fabricate the devices.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.
Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele
2015-01-01
Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousand electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
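The per-channel independence such a tool exploits can be sketched directly: each electrode trace is band-pass filtered into the spike band and thresholded against a robust noise estimate, so channels map one-to-one onto parallel workers. scipy/numpy and all parameter choices below are illustrative stand-ins for the tool's optimized kernels.

```python
import numpy as np
from scipy.signal import butter, lfilter
from multiprocessing import Pool

FS = 30_000                                   # sampling rate, Hz
B, A = butter(4, [300 / (FS / 2), 3000 / (FS / 2)], btype="band")

def process_channel(trace):
    filtered = lfilter(B, A, trace)           # spike-band filtering
    thr = 5 * np.median(np.abs(filtered)) / 0.6745   # robust noise estimate
    return np.flatnonzero(filtered < -thr)    # negative threshold crossings

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    channels = rng.standard_normal((64, FS))  # 64 electrodes, 1 s of data
    with Pool() as pool:
        spike_indices = pool.map(process_channel, channels)
    print(sum(len(s) for s in spike_indices), "threshold crossings")
```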
NASA Astrophysics Data System (ADS)
Zhilenkov, A. A.; Kapitonov, A. A.
2017-10-01
It is known that many of today's ships and vessels have a shaft generator as part of their power plants. Modern automatic control systems used in the world's fleet do not enable their shaft generators to operate in parallel with the main diesel generators for long-term sustenance of the total load of the ship network. On the other hand, according to our calculations and experiments, a shaft generator operated in parallel with the main power plant helps save at least 10% of fuel while making the power system of the ship more efficient, reliable, and eco-friendly. Fouling and corrosion of the propeller, as well as the weather conditions of navigation, affect its modulus of resistance; this changes the free component of the transient process of the shaft generator's frequency during transients. While the shaft generator and the diesel generator of the ship power plant are paralleled, an angle emerges between their EMFs, which results in equalizing currents between them. The alternating torque in the drive shaft line-propeller system causes torsional fluctuations of the ship shaft line. To compensate for the effect of destabilizing factors and torsional fluctuations of the shaft line on the dynamic characteristics of the transient process that alters the RPM of the main engine, sliding-mode controls can be used. To synthesize such a control, one has to evaluate the effect of these destabilizing factors.
Accelerated Adaptive MGS Phase Retrieval
NASA Technical Reports Server (NTRS)
Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang
2011-01-01
The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm that allows parallel processing of applications which apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited to this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time, performing the matrix calculations on nVidia graphics cards. The graphics processing unit (GPU) is hardware specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies to accelerate optical phase error characterization. With a single PC containing four nVidia GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
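For orientation, the classic Gerchberg-Saxton iteration that MGS modifies alternates FFTs between pupil and focal planes, re-imposing the measured amplitudes on each pass; these FFTs dominate the cost, which is why stream-processing hardware pays off. The sketch below is the textbook algorithm in numpy, not JPL's MGS code.

```python
import numpy as np

def gerchberg_saxton(pupil_amp, focal_amp, n_iter=200):
    # Start from the pupil amplitude with zero phase.
    field = pupil_amp.astype(complex)
    for _ in range(n_iter):
        focal = np.fft.fft2(field)
        focal = focal_amp * np.exp(1j * np.angle(focal))   # impose focal amplitude
        field = np.fft.ifft2(focal)
        field = pupil_amp * np.exp(1j * np.angle(field))   # impose pupil amplitude
    return np.angle(field)              # recovered pupil-plane phase estimate

n = 128
pupil = (np.hypot(*np.mgrid[-1:1:n*1j, -1:1:n*1j]) < 0.9).astype(float)
measured_psf_amp = np.abs(np.fft.fft2(pupil))   # stand-in for a star image
phase = gerchberg_saxton(pupil, measured_psf_amp)
```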
Parallel processing via a dual olfactory pathway in the honeybee.
Brill, Martin F; Rosenbaum, Tobias; Reus, Isabelle; Kleineidam, Christoph J; Nawrot, Martin P; Rössler, Wolfgang
2013-02-06
In their natural environment, animals face complex and highly dynamic olfactory input. Thus vertebrates as well as invertebrates require fast and reliable processing of olfactory information. Parallel processing has been shown to improve processing speed and power in other sensory systems and is characterized by extraction of different stimulus parameters along parallel sensory information streams. Honeybees possess an elaborate olfactory system with unique neuronal architecture: a dual olfactory pathway comprising a medial projection-neuron (PN) antennal lobe (AL) protocerebral output tract (m-APT) and a lateral PN AL output tract (l-APT) connecting the olfactory lobes with higher-order brain centers. We asked whether this neuronal architecture serves parallel processing and employed a novel technique for simultaneous multiunit recordings from both tracts. The results revealed response profiles from a high number of PNs of both tracts to floral, pheromonal, and biologically relevant odor mixtures tested over multiple trials. PNs from both tracts responded to all tested odors, but with different characteristics indicating parallel processing of similar odors. Both PN tracts were activated by widely overlapping response profiles, which is a requirement for parallel processing. The l-APT PNs had broad response profiles suggesting generalized coding properties, whereas the responses of m-APT PNs were comparatively weaker and less frequent, indicating higher odor specificity. Comparison of response latencies within and across tracts revealed odor-dependent latencies. We suggest that parallel processing via the honeybee dual olfactory pathway provides enhanced odor processing capabilities serving sophisticated odor perception and olfactory demands associated with a complex olfactory world of this social insect.
NASA Astrophysics Data System (ADS)
Weigel, Martin
2011-09-01
Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that of current CPUs by large factors, results from the relative simplicity of the GPU architectures as compared to CPUs, combined with a large number of parallel processing units on a single chip. To benefit from this setup for general computing purposes, the problems at hand need to be prepared in a way to profit from the inherent parallelism and hierarchical structure of memory accesses. In this contribution I discuss the performance potential for simulating spin models, such as the Ising model, on GPU as compared to conventional simulations on CPU.
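The parallelism alluded to above typically comes from a checkerboard decomposition: in the 2D Ising model every site's neighbours carry the opposite colour, so all sites of one colour can take a Metropolis update simultaneously. The vectorized numpy sweep below is a hedged stand-in for the corresponding GPU kernels.

```python
import numpy as np

rng = np.random.default_rng(4)
L, beta = 64, 0.4
spins = rng.choice([-1, 1], size=(L, L))
checker = (np.add.outer(np.arange(L), np.arange(L)) % 2).astype(bool)

def half_sweep(spins, colour):
    # Sum of the four nearest neighbours with periodic boundaries.
    nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
          + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
    dE = 2 * spins * nn                       # energy cost of flipping
    accept = rng.random((L, L)) < np.exp(-beta * dE)
    spins[colour & accept] *= -1              # flip one colour in parallel
    return spins

for _ in range(1000):
    spins = half_sweep(spins, checker)
    spins = half_sweep(spins, ~checker)
print("magnetization:", spins.mean())
```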
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Automated design of spacecraft power subsystems
NASA Technical Reports Server (NTRS)
Terrile, Richard J.; Kordon, Mark; Mandutianu, Dan; Salcedo, Jose; Wood, Eric; Hashemi, Mona
2006-01-01
This paper discusses the application of evolutionary computing to a dynamic space vehicle power subsystem resource and performance simulation in a parallel processing environment. Our objective is to demonstrate the feasibility, application and advantage of using evolutionary computation techniques for the early design search and optimization of space systems.
NASA Astrophysics Data System (ADS)
Zibner, F.; Fornaroli, C.; Holtkamp, J.; Shachaf, Lior; Kaplan, Natan; Gillner, A.
2017-08-01
High-precision laser micromachining gains importance in industrial applications every month. Optical systems like the helical optics offer the highest quality together with a controllable and adjustable drilling geometry, such as taper angle, aspect ratio, and heat-affected zone. The helical optics is based on a rotating Dove prism mounted in a hollow-shaft motor together with other optical elements like wedge prisms and plane plates. Although the achievable quality is extremely high, the low process efficiency is a main reason that this manufacturing technology has only limited demand within the industrial market. The objective of the research presented in this paper is to dramatically increase process efficiency as well as process flexibility. During the last years, the average power of commercial ultra-short-pulsed laser sources has increased significantly. The efficient utilization of this high average laser power in material processing requires an effective distribution of the laser power onto the workpiece. One approach to increase efficiency is the application of beam-splitting devices to enable parallel processing. Multi-beam processing is used to parallelize the fabrication of periodic structures, as most applications only require a fraction of the emitted ultra-short-pulsed laser power. To achieve the highest flexibility in multi-beam processing, the single beams are diverted and re-guided in a way that makes it possible to process spatially separated samples or semi-finished parts with each partial beam.
Extended Logic Intelligent Processing System for a Sensor Fusion Processor Hardware
NASA Technical Reports Server (NTRS)
Stoica, Adrian; Thomas, Tyson; Li, Wei-Te; Daud, Taher; Fabunmi, James
2000-01-01
The paper presents the hardware implementation and initial tests of a low-power, high-speed reconfigurable sensor fusion processor. The Extended Logic Intelligent Processing System (ELIPS) is described, which combines rule-based systems, fuzzy logic, and neural networks to achieve parallel fusion of sensor signals in compact, low-power VLSI. The ELIPS concept is being developed to demonstrate interceptor functionality, which particularly underlines the high-speed and low-power requirements. The hardware programmability allows the processor to reconfigure into different machines, taking the most efficient hardware implementation during each phase of information processing. Processing speeds of microseconds have been demonstrated using our test hardware.
Using parallel computing for the display and simulation of the space debris environment
NASA Astrophysics Data System (ADS)
Möckel, M.; Wiedemann, C.; Flegel, S.; Gelhaus, J.; Vörsmann, P.; Klinkrad, H.; Krag, H.
2011-07-01
Parallelism is becoming the leading paradigm in today's computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating and displaying three-dimensional computer generated imagery in real time requires complex numerical operations to be performed at high speed on a large number of objects. Since most of these objects can be processed independently, parallel computing is applicable in this field. Modern graphics processing units (GPUs) have become capable of performing millions of matrix and vector operations per second on multiple objects simultaneously. As a side project, a software tool is currently being developed at the Institute of Aerospace Systems that provides an animated, three-dimensional visualization of both actual and simulated space debris objects. Due to the nature of these objects it is possible to process them individually and independently from each other. Therefore, an analytical orbit propagation algorithm has been implemented to run on a GPU. By taking advantage of all its processing power a huge performance increase, compared to its CPU-based counterpart, could be achieved. For several years efforts have been made to harness this computing power for applications other than computer graphics. Software tools for the simulation of space debris are among those that could profit from embracing parallelism. With recently emerged software development tools such as OpenCL it is possible to transfer the new algorithms used in the visualization outside the field of computer graphics and implement them, for example, into the space debris simulation environment. This way they can make use of parallel hardware such as GPUs and Multi-Core-CPUs for faster computation. In this paper the visualization software will be introduced, including a comparison between the serial and the parallel method of orbit propagation. Ways of how to use the benefits of the latter method for space debris simulation will be discussed. An introduction to OpenCL will be given as well as an exemplary algorithm from the field of space debris simulation.
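The paper's propagator targets OpenCL; purely as an illustration of the same one-thread-per-object structure, here is a CUDA sketch that advances each object's mean anomaly and solves Kepler's equation by Newton iteration. The element layout, the fixed eight Newton steps, and stopping at the eccentric anomaly (rather than converting to position and velocity) are simplifying assumptions.

```cuda
#include <cuda_runtime.h>
#include <math.h>

struct Elements { float a, e, M0; };   // semi-major axis [km], eccentricity,
                                       // mean anomaly at epoch [rad]

// One thread per debris object: advance the mean anomaly by t seconds and
// solve Kepler's equation E - e*sin(E) = M with Newton's method.
__global__ void propagate(const Elements *el, float *E_out, int n, float t)
{
    const float mu = 398600.4418f;           // Earth's GM [km^3 s^-2]
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float a = el[i].a;
    float nmean = sqrtf(mu / (a * a * a));   // mean motion [rad/s]
    float M = el[i].M0 + nmean * t;
    float E = M;                             // initial guess
    for (int k = 0; k < 8; ++k)              // fixed iteration count for brevity
        E -= (E - el[i].e * sinf(E) - M) / (1.0f - el[i].e * cosf(E));
    E_out[i] = E;
}
```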
Advanced miniature processing hardware for ATR applications
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin (Inventor); Daud, Taher (Inventor); Thakoor, Anikumar (Inventor)
2003-01-01
A Hybrid Optoelectronic Neural Object Recognition System (HONORS) is disclosed, comprising two major building blocks: (1) an advanced grayscale optical correlator (OC) and (2) a massively parallel, three-dimensional neural processor. The optical correlator, with its inherent advantages in parallel processing and shift invariance, is used for target-of-interest (TOI) detection and segmentation. The three-dimensional neural processor, with its robust neural learning capability, is used for target classification and identification. The hybrid optoelectronic neural object recognition system, with its powerful combination of optical processing and neural networks, enables real-time, large-frame automatic target recognition (ATR).
SPROC: A multiple-processor DSP IC
NASA Technical Reports Server (NTRS)
Davis, R.
1991-01-01
A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.
NASA Astrophysics Data System (ADS)
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.
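For reference, the nominal peak value of this metric is just a product of hardware parameters, which is one reason it can mislead for parallel CEM codes: it says nothing about RAM, disk, or network behavior.

```latex
\mathrm{FLOPS}_{\mathrm{peak}}
  = N_{\mathrm{nodes}}
    \times \frac{\mathrm{cores}}{\mathrm{node}}
    \times \frac{\mathrm{cycles}}{\mathrm{second}}
    \times \frac{\mathrm{FLOPs}}{\mathrm{cycle}}
```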
Parallel computing method for simulating hydrological processes of large rivers under climate change
NASA Astrophysics Data System (ADS)
Wang, H.; Chen, Y.
2016-12-01
Climate change is one of the most widely known global environmental problems. It has altered the temporal and spatial distribution of watershed hydrological processes, especially in the world's large rivers. Watershed hydrological process simulation based on physically based distributed hydrological models can give better results than lumped models. However, such simulation involves a large amount of calculation, especially for large rivers, and thus needs huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Current parallel methods mostly parallelize the computation in the space and time dimensions: they calculate the natural features of the distributed hydrological model (unit, sub-basin, or grid) in order from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method is highly adaptable and extensible: it can make full use of computing and storage resources under the condition of limited computing resources, and computing efficiency improves linearly as computing resources increase. This method can satisfy the parallel computing requirements of hydrological process simulation for small, medium, and large rivers.
Parallel and Distributed Systems for Probabilistic Reasoning
2012-12-01
work at CMU I had the opportunity to work with Andreas Krause on Gaussian process models for signal quality estimation in wireless sensor networks ... we reviewed the natural parallelization of the belief propagation algorithm using the synchronous schedule and demonstrated both theoretically and ... problem is that the power-law sparsity structure, commonly found in graphs derived from natural phenomena (e.g., social networks and the web
NASA Astrophysics Data System (ADS)
Farahabadi, Payam Masoumi; Basaligheh, Ali; Saffari, Parvaneh; Moez, Kambiz
2017-06-01
This paper presents a compact 60-GHz power amplifier utilizing a four-way on-chip parallel power combiner and splitter. The proposed topology provides the capability of combining the output power of four individual power-amplifier cores in a compact die area. Each power-amplifier core consists of a three-stage common-source amplifier with transformer-coupled impedance-matching networks. Fabricated in a 65-nm CMOS process, the measured gain of the 0.19-mm2 power amplifier at 60 GHz is 18.8 and 15 dB with 1.4- and 1.0-V supplies, respectively. A 3-dB bandwidth of 4 GHz and a P1dB of 16.9 dBm are measured while consuming 424 mW from a 1.4-V supply. A maximum saturated output power of 18.3 dBm is measured with a 15.9% peak power-added efficiency at 60 GHz. The measured insertion loss is 1.9 dB at 60 GHz. The proposed power amplifier achieves the highest power density (power/area) among reported 60-GHz CMOS power amplifiers in 65-nm or older CMOS technologies.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967, Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedups on a 1024-node hypercube of over 500 for three fixed-size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
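For reference, with N processors and a parallelizable fraction p of a fixed-size workload, Amdahl's law bounds the speedup, while the Sandia results correspond to scaled speedup (Gustafson's law), in which the problem grows with the machine and s' = 1 - p' is the serial fraction of time measured on the parallel system:

```latex
S_{\mathrm{Amdahl}}(N) = \frac{1}{(1-p) + p/N} \le \frac{1}{1-p},
\qquad
S_{\mathrm{scaled}}(N) = s' + p'N = N - (N-1)\,s'.
```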
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
Associative Pattern Recognition In Analog VLSI Circuits
NASA Technical Reports Server (NTRS)
Tawel, Raoul
1995-01-01
Winner-take-all circuit selects best-match stored pattern. Prototype cascadable very-large-scale integrated (VLSI) circuit chips built and tested to demonstrate concept of electronic associative pattern recognition. Based on low-power, sub-threshold analog complementary metal-oxide-semiconductor (CMOS) VLSI circuitry, each chip can store 128 sets (vectors) of 16 analog values (vector components), vectors representing known patterns as diverse as spectra, histograms, graphs, or brightnesses of pixels in images. Chips exploit parallel nature of vector quantization architecture to implement highly parallel processing in relatively simple computational cells. Through collective action, cells classify input pattern in fraction of microsecond while consuming power of few microwatts.
A tool for simulating parallel branch-and-bound methods
NASA Astrophysics Data System (ADS)
Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail
2016-01-01
The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution; therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
Sittig, D. F.; Orr, J. A.
1991-01-01
Various methods have been proposed in an attempt to solve problems in artifact and/or alarm identification including expert systems, statistical signal processing techniques, and artificial neural networks (ANN). ANNs consist of a large number of simple processing units connected by weighted links. To develop truly robust ANNs, investigators are required to train their networks on huge training data sets, requiring enormous computing power. We implemented a parallel version of the backward error propagation neural network training algorithm in the widely portable parallel programming language C-Linda. A maximum speedup of 4.06 was obtained with six processors. This speedup represents a reduction in total run-time from approximately 6.4 hours to 1.5 hours. We conclude that use of the master-worker model of parallel computation is an excellent method for obtaining speedups in the backward error propagation neural network training algorithm. PMID:1807607
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of these processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At the other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodsmen attacking the log with chainsaws.
Liu, Gangjun; Zhang, Jun; Yu, Lingfeng; Xie, Tuqiang; Chen, Zhongping
2010-01-01
With the increase of the A-line speed of optical coherence tomography (OCT) systems, real-time processing of acquired data has become a bottleneck. The shared-memory parallel computing technique is used to process OCT data in real time. The real-time processing power of a quad-core personal computer (PC) is analyzed. It is shown that the quad-core PC could provide real-time OCT data processing ability of more than 80K A-lines per second. A real-time, fiber-based, swept source polarization-sensitive OCT system with 20K A-line speed is demonstrated with this technique. The real-time 2D and 3D polarization-sensitive imaging of chicken muscle and pig tendon is also demonstrated. PMID:19904337
Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki
2014-12-01
As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
A learnable parallel processing architecture towards unity of memory and computing
NASA Astrophysics Data System (ADS)
Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.
2015-08-01
Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve speed by 76.8% and power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook
NASA Astrophysics Data System (ADS)
Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.
2012-12-01
The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as matlab, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain-specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focused on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser, maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
Evolutionary computing for the design search and optimization of space vehicle power subsystems
NASA Technical Reports Server (NTRS)
Kordon, Mark; Klimeck, Gerhard; Hanks, David; Hua, Hook
2004-01-01
Evolutionary computing has proven to be a straightforward and robust approach for optimizing a wide range of difficult analysis and design problems. This paper discusses the application of these techniques to an existing space vehicle power subsystem resource and performance analysis simulation in a parallel processing environment. Our preliminary results demonstrate that this approach has the potential to improve the space system trade study process by allowing engineers to statistically weight subsystem goals of mass, cost and performance and then automatically size power elements based on anticipated performance of the subsystem rather than on worst-case estimates.
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and the necessary memory interface functions. This architecture has been simulated for several benchmark programs including a complex fast Fourier transform (FFT), a complex inner product, trigonometric functions, and a sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT in 112 microseconds (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 cubic feet.
Peker, Musa; Şen, Baha; Gürüler, Hüseyin
2015-02-01
The effect of anesthesia on the patient is referred to as the depth of anesthesia. Rapid classification of the appropriate depth level of anesthesia is a matter of great importance in surgical operations. Similarly, accelerating classification algorithms is important for the rapid solution of problems in the field of biomedical signal processing. However, numerous time-consuming mathematical operations are required during the training and testing stages of classification algorithms, especially in neural networks. In this study, to accelerate the process, the Nvidia CUDA parallel programming and computing platform, which facilitates dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU), was utilized. The system was employed to detect the anesthetic depth level on a related electroencephalogram (EEG) data set. This dataset is rather complex and large. Moreover, achieving more anesthetic levels with rapid response is critical in anesthesia. The proposed parallelization method yielded highly accurate classification results in a faster time.
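The heavy arithmetic in such classifiers is dominated by dense matrix-vector products in the network layers, which parallelize naturally on a GPU. The sketch below is a generic CUDA forward pass for one fully connected layer with a logistic activation, one thread per output neuron; it illustrates the kind of kernel involved and is not the authors' implementation.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// One thread per output neuron:
//   y[j] = sigmoid(sum_i W[j*nin + i] * x[i] + b[j])
// Launch as, e.g., dense_forward<<<(nout + 255) / 256, 256>>>(...).
__global__ void dense_forward(const float *W, const float *x, const float *b,
                              float *y, int nin, int nout)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= nout) return;
    float acc = b[j];
    for (int i = 0; i < nin; ++i)
        acc += W[j * nin + i] * x[i];
    y[j] = 1.0f / (1.0f + expf(-acc));   // logistic activation
}
```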
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor
2010-05-01
The Multiple Signal Classification (MUSIC) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals ... Broadband Engine Processor (Cell BE). The process of adapting the serial-based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and ... using the Multiple Signal Classification (MUSIC) algorithm [4] • Computation of focus matrix • Computation of number of sources • Separation of signal
Operation of high power converters in parallel
NASA Technical Reports Server (NTRS)
Decker, D. K.; Inouye, L. Y.
1993-01-01
High power converters that are used in space power subsystems are limited in power handling capability due to component and thermal limitations. For applications, such as Space Station Freedom, where multi-kilowatts of power must be delivered to user loads, parallel operation of converters becomes an attractive option when considering overall power subsystem topologies. TRW developed three different unequal power sharing approaches for parallel operation of converters. These approaches, known as droop, master-slave, and proportional adjustment, are discussed and test results are presented.
Adaptive-optics optical coherence tomography processing using a graphics processing unit.
Shafer, Brandon A; Kriske, Jeffery E; Kocaoglu, Omer P; Turner, Timothy L; Liu, Zhuolin; Lee, John Jaehwan; Miller, Donald T
2014-01-01
Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities and moderate price compared to supercomputers and computing grids. In this paper we have used a general-purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving this super-high-resolution technology closer to clinical viability.
NASA Astrophysics Data System (ADS)
Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori
Heterogeneous multicores have been attracting much attention for attaining high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores make programming very difficult, and the long application development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges the gap between programmers and heterogeneous multicores. In particular, this paper describes a compilation framework based on the OSCAR compiler. It realizes coarse-grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates the processing performance and power reduction achieved by the proposed framework on a newly developed 15-core heterogeneous multicore chip named RP-X, integrating 8 general-purpose processor cores and 3 types of accelerator cores, which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups of up to 32x for an optical flow program with eight general-purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core, and an 80% power reduction for real-time AAC encoding.
Evaluation of the Intel iWarp parallel processor for space flight applications
NASA Technical Reports Server (NTRS)
Hine, Butler P., III; Fong, Terrence W.
1993-01-01
The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.
Power and Performance Trade-offs for Space Time Adaptive Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gawande, Nitin A.; Manzano Franco, Joseph B.; Tumeo, Antonino
Computational efficiency – performance relative to power or energy – is one of the most important concerns when designing RADAR processing systems. This paper analyzes power and performance trade-offs for a typical Space Time Adaptive Processing (STAP) application. We study STAP implementations for CUDA and OpenMP on two computationally efficient architectures, Intel Haswell Core I7-4770TE and NVIDIA Kayla with a GK208 GPU. We analyze the power and performance of STAP's computationally intensive kernels across the two hardware testbeds. We also show the impact and trade-offs of GPU optimization techniques. We show that data parallelism can be exploited for efficient implementation on the Haswell CPU architecture. The GPU architecture is able to process large data sets without an increase in power requirement. The use of shared memory has a significant impact on the power requirement for the GPU. A balance between the use of shared memory and main memory access leads to improved performance in a typical STAP application.
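The shared-memory trade-off noted above usually enters through tiling: staging operand blocks in on-chip shared memory cuts DRAM traffic at the price of more on-chip activity. Below is a minimal CUDA sketch of a shared-memory tiled matrix multiply, a standard building block of the covariance and weight computations in STAP; it is illustrative, not the benchmarked code.

```cuda
#include <cuda_runtime.h>

#define TILE 16

// C = A * B for n x n row-major matrices, staged through shared memory.
// Each thread block computes one TILE x TILE tile of C.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < n; t += TILE) {
        // Cooperative loads; zero-pad when the tile runs off the matrix edge.
        As[threadIdx.y][threadIdx.x] =
            (row < n && t + threadIdx.x < n) ? A[row * n + t + threadIdx.x] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (t + threadIdx.y < n && col < n) ? B[(t + threadIdx.y) * n + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < n && col < n) C[row * n + col] = acc;
}
```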
J. G. Isebrands; G. E. Host; K. Lenz; G. Wu; H. W. Stech
2000-01-01
Process models are powerful research tools for assessing the effects of multiple environmental stresses on forest plantations. These models are driven by interacting environmental variables and often include genetic factors necessary for assessing forest plantation growth over a range of different site, climate, and silvicultural conditions. However, process models are...
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can cooperate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high-computation-cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50-fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4-GHz Intel Core i7 3770.
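The per-pixel stages parallelize directly. As an illustration of the fine-grained level of such a kernel model, the sketch below computes the wrapped phase from four pi/2-shifted fringe images with one thread per pixel, using the standard four-step formula; it is not the authors' three-layer kernel code.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Standard four-step phase shifting: with I_k = A + B*cos(phi + (k-1)*pi/2),
// the wrapped phase is phi = atan2(I4 - I2, I1 - I3). One thread per pixel.
__global__ void wrapped_phase(const float *I1, const float *I2,
                              const float *I3, const float *I4,
                              float *phi, int npix)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < npix)
        phi[i] = atan2f(I4[i] - I2[i], I1[i] - I3[i]);
}
```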
Photonic reservoir computing: a new approach to optical information processing
NASA Astrophysics Data System (ADS)
Vandoorne, Kristof; Fiers, Martin; Verstraeten, David; Schrauwen, Benjamin; Dambre, Joni; Bienstman, Peter
2010-06-01
Despite ever increasing computational power, recognition and classification problems remain challenging to solve. Recently, advances have been made by the introduction of the new concept of reservoir computing. This is a methodology coming from the field of machine learning and neural networks that has been successfully used in several pattern classification problems, like speech and image recognition. Thus far, most implementations have been in software, limiting their speed and power efficiency. Photonics could be an excellent platform for a hardware implementation of this concept because of its inherent parallelism and unique nonlinear behaviour. Moreover, a photonic implementation offers the promise of massively parallel information processing with low power and high speed. We propose using a network of coupled Semiconductor Optical Amplifiers (SOA) and show in simulation that it could be used as a reservoir by comparing it to conventional software implementations using a benchmark speech recognition task. In spite of the differences with classical reservoir models, the performance of our photonic reservoir is comparable to that of conventional implementations and sometimes slightly better. As our implementation uses coherent light for information processing, we find that phase tuning is crucial to obtain high performance. In parallel we investigate the use of a network of photonic crystal cavities. The coupled mode theory (CMT) is used to investigate these resonators. A new framework is designed to model networks of resonators and SOAs. The same network topologies are used, but feedback is added to control the internal dynamics of the system. By adjusting the readout weights of the network in a controlled manner, we can generate arbitrary periodic patterns.
NASA Astrophysics Data System (ADS)
Kim, Jin Seok; Hur, Min Young; Kim, Chang Ho; Kim, Ho Jun; Lee, Hae June
2018-03-01
A two-dimensional parallelized particle-in-cell simulation has been developed to simulate a capacitively coupled plasma reactor. Parallelization using graphics processing units is applied to resolve the heavy computational load. It is found that step-ionization plays an important role at intermediate gas pressures of a few Torr. Without step-ionization, the average electron density decreases while the effective electron temperature increases with increasing gas pressure at a fixed power. With step-ionization, however, the average electron density increases while the effective electron temperature decreases with increasing gas pressure. The cases with step-ionization agree well with the trend of experimental measurements. The electron energy distribution functions show that the population of electrons with intermediate energy from 4.2 to 12 eV is relaxed by step-ionization. Also, with step-ionization the power consumption by the electrons increases with increasing gas pressure, while the power consumption by the ions decreases.
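As an illustration of how a PIC code parallelizes on a GPU, the sketch below pushes one macro-particle per thread, gathering the electric field from the containing cell. The zeroth-order field gather and simple explicit update are deliberate simplifications; a real 2-D PIC code would use bilinear shape functions, a Boris push when a magnetic field is present, and atomic operations for charge deposition.

```cuda
#include <cuda_runtime.h>

// One thread per macro-particle: accelerate in the cell-centered field,
// then drift. pos/vel in SI units; qm is the charge-to-mass ratio.
__global__ void push_particles(float2 *pos, float2 *vel, const float2 *E,
                               int np, int nx, int ny,
                               float dx, float dy, float qm, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= np) return;
    int cx = min(max((int)(pos[i].x / dx), 0), nx - 1);  // containing cell
    int cy = min(max((int)(pos[i].y / dy), 0), ny - 1);
    float2 Ec = E[cy * nx + cx];    // zeroth-order (nearest-cell) gather
    vel[i].x += qm * Ec.x * dt;
    vel[i].y += qm * Ec.y * dt;
    pos[i].x += vel[i].x * dt;
    pos[i].y += vel[i].y * dt;
}
```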
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grain parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
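The fine-grain decomposition described here amounts to per-pixel kernels followed by reductions. Below is a minimal sketch, not the authors' code, of the squared texture-error term that drives each fitting iteration: one thread per pixel, with a shared-memory reduction to one partial sum per block. Launch with a power-of-two block size and blockDim.x * sizeof(float) bytes of dynamic shared memory; the per-block sums are then added on the host or in a second kernel.

```cuda
#include <cuda_runtime.h>

// Per-pixel squared error between the sampled image texture and the AAM
// model texture, reduced to one partial sum per thread block.
__global__ void sq_error_reduce(const float *sampled, const float *model,
                                float *block_sums, int npix)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    float d = (i < npix) ? sampled[i] - model[i] : 0.0f;
    sdata[tid] = d * d;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0) block_sums[blockIdx.x] = sdata[0];
}
```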
Cloud parallel processing of tandem mass spectrometry based proteomics data.
Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus
2012-10-05
Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. The speed and performance of mass spectral search engines are continuously improving, although not necessarily at the pace needed to face the challenges of the acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing the resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search, using our data decomposition programs, X!Tandem and SpectraST.
High Input Voltage, Silicon Carbide Power Processing Unit Performance Demonstration
NASA Technical Reports Server (NTRS)
Bozak, Karin E.; Pinero, Luis R.; Scheidegger, Robert J.; Aulisio, Michael V.; Gonzalez, Marcelo C.; Birchenough, Arthur G.
2015-01-01
A silicon carbide brassboard power processing unit has been developed by the NASA Glenn Research Center in Cleveland, Ohio. The power processing unit operates from two sources - a nominal 300-Volt high voltage input bus and a nominal 28-Volt low voltage input bus. The design of the power processing unit includes four low voltage, low power supplies that provide power to the thruster auxiliary supplies, and two parallel 7.5-kilowatt power supplies that are capable of providing up to 15 kilowatts of total power at 300 to 500 Volts to the thruster discharge supply. Additionally, the unit contains a housekeeping supply, high voltage input filter, low voltage input filter, and master control board, such that the complete brassboard unit is capable of operating a 12.5-kilowatt Hall Effect Thruster. The performance of the unit was characterized under both ambient and thermal vacuum test conditions, and the results demonstrate exceptional performance, with full-power efficiencies exceeding 97 percent. With a space-qualified silicon carbide or similar high voltage, high efficiency power device, this design could evolve into a flight design for future missions that require high power electric propulsion systems.
Hierarchical parallelization of gene differential association analysis.
Needham, Mark; Hu, Rui; Dwarkadas, Sandhya; Qiu, Xing
2011-09-21
Microarray gene differential expression analysis is a widely used technique that deals with high dimensional data and is computationally intensive for permutation-based procedures. Microarray gene differential association analysis is even more computationally demanding and must take advantage of multicore computing technology, which is the driving force behind increasing compute power in recent years. In this paper, we present a two-layer hierarchical parallel implementation of gene differential association analysis. It takes advantage of both fine- and coarse-grain (with granularity defined by the frequency of communication) parallelism in order to effectively leverage the non-uniform nature of parallel processing available in the cutting-edge systems of today. Our results show that this hierarchical strategy matches data sharing behavior to the properties of the underlying hardware, thereby reducing the memory and bandwidth needs of the application. The resulting improved efficiency reduces computation time and allows the gene differential association analysis code to scale its execution with the number of processors. The code and biological data used in this study are downloadable from http://www.urmc.rochester.edu/biostat/people/faculty/hu.cfm. The performance sweet spot occurs when using a number of threads per MPI process that allows the working sets of the corresponding MPI processes running on the multicore to fit within the machine cache. Hence, we suggest that practitioners follow this principle in selecting the appropriate number of MPI processes and threads within each MPI process for their cluster configurations. We believe that the principles of this hierarchical approach to parallelization can be utilized in the parallelization of other computationally demanding kernels.
Rubus: A compiler for seamless and extensible parallelism.
Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor
2017-01-01
Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general-purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimization. This paper proposes a new open-source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
A Low Power, Parallel Wearable Multi-Sensor System for Human Activity Evaluation.
Li, Yuecheng; Jia, Wenyan; Yu, Tianjian; Luan, Bo; Mao, Zhi-Hong; Zhang, Hong; Sun, Mingui
2015-04-01
In this paper, the design of a low power heterogeneous wearable multi-sensor system, built with Zynq System-on-Chip (SoC), for human activity evaluation is presented. The powerful data processing capability and flexibility of this SoC represent significant improvements over our previous ARM based system designs. The new system captures and compresses multiple color images and sensor data simultaneously. Several strategies are adopted to minimize power consumption. Our wearable system provides a new tool for the evaluation of human activity, including diet, physical activity and lifestyle.
Enhanced High Performance Power Compensation Methodology by IPFC Using PIGBT-IDVR
Arumugom, Subramanian; Rajaram, Marimuthu
2015-01-01
Power systems are conventionally operated without high-speed control and must be frequently re-initiated, resulting in slow response when compared with static electronic devices. Among the various power interruptions in power supply systems, voltage dips play a central role in causing disruption. The dynamic voltage restorer (DVR) is a voltage-control-based process that compensates for line transients in the distributed system. To overcome these issues and to achieve higher speed, a new methodology called the Parallel IGBT-Based Interline Dynamic Voltage Restorer (PIGBT-IDVR) is proposed, which mainly focuses on the dynamic handling of energy reloading in common dc-linked energy storage with less adaptive transition. The interline power flow controller (IPFC) scheme is employed to manage power transmission between the lines, and the restorer method controls the reactive power in the individual lines. By employing the proposed methodology, failure of the distributed system is avoided, and better performance is obtained than with existing methodologies. PMID:26613101
A Serial Bus Architecture for Parallel Processing Systems
1986-09-01
...As integrated circuits grow in computational power, more communication capacity is needed; the wider the communication path, the more pins are needed to effect the data transfer...
Lammers, Joris; Stoker, Janka I; Stapel, Diederik A
2009-12-01
How does power affect behavior? We posit that this depends on the type of power. We distinguish between social power (power over other people) and personal power (freedom from other people) and argue that these two types of power have opposite associations with independence and interdependence. We propose that when the distinction between independence and interdependence is relevant, social power and personal power will have opposite effects; however, they will have parallel effects when the distinction is irrelevant. In two studies (an experimental study and a large field study), we demonstrate this by showing that social power and personal power have opposite effects on stereotyping, but parallel effects on behavioral approach.
NASA Astrophysics Data System (ADS)
Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao
2017-01-01
Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
High Input Voltage, Silicon Carbide Power Processing Unit Performance Demonstration
NASA Technical Reports Server (NTRS)
Bozak, Karin E.; Pinero, Luis R.; Scheidegger, Robert J.; Aulisio, Michael V.; Gonzalez, Marcelo C.; Birchenough, Arthur G.
2015-01-01
A silicon carbide brassboard power processing unit has been developed by the NASA Glenn Research Center in Cleveland, Ohio. The power processing unit operates from two sources: a nominal 300 Volt high voltage input bus and a nominal 28 Volt low voltage input bus. The design of the power processing unit includes four low voltage, low power auxiliary supplies, and two parallel 7.5 kilowatt (kW) discharge power supplies that are capable of providing up to 15 kilowatts of total power at 300 to 500 Volts (V) to the thruster. Additionally, the unit contains a housekeeping supply, high voltage input filter, low voltage input filter, and master control board, such that the complete brassboard unit is capable of operating a 12.5 kilowatt Hall effect thruster. The performance of the unit was characterized under both ambient and thermal vacuum test conditions, and the results demonstrate exceptional performance with full power efficiencies exceeding 97%. The unit was also tested with a 12.5 kW Hall effect thruster to verify compatibility and output filter specifications. With space-qualified silicon carbide or similar high voltage, high efficiency power devices, this would provide a design solution to address the need for high power electric propulsion systems.
Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Liu, Shu-Guang; Nichols, Erin; Haga, Jim; Maddox, Brian; Bilderback, Chris; Feller, Mark; Homer, George
2001-01-01
The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting information science research into parallel computing systems and applications.
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers
NASA Astrophysics Data System (ADS)
Kimura, Keiji; Mase, Masayoshi; Mikami, Hiroki; Miyamoto, Takamichi; Shirako, Jun; Kasahara, Hironori
OSCAR (Optimally Scheduled Advanced Multiprocessor) API has been designed for real-time embedded low-power multicores to generate parallel programs for various multicores from different vendors by using the OSCAR parallelizing compiler. The OSCAR API has been developed by Waseda University in collaboration with Fujitsu Laboratory, Hitachi, NEC, Panasonic, Renesas Technology, and Toshiba in a METI/NEDO project entitled "Multicore Technology for Realtime Consumer Electronics." By using the OSCAR API as an interface between the OSCAR compiler and backend compilers, the OSCAR compiler enables hierarchical multigrain parallel processing with memory optimization under capacity restriction for cache memory, local memory, distributed shared memory, and on-chip/off-chip shared memory; data transfer using a DMA controller; and power reduction control using DVFS (Dynamic Voltage and Frequency Scaling), clock gating, and power gating for various embedded multicores. In addition, a parallelized program automatically generated by the OSCAR compiler with the OSCAR API can be compiled by ordinary OpenMP compilers, since the OSCAR API is designed as a subset of OpenMP. This paper describes the OSCAR API and its compatibility with the OSCAR compiler by showing code examples. Performance evaluations of the OSCAR compiler and the OSCAR API are carried out using an IBM Power5+ workstation, an IBM Power6 high-end SMP server, and a newly developed consumer electronics multicore chip RP2 by Renesas, Hitachi and Waseda. The scalability evaluation shows that, on average, the OSCAR compiler with the OSCAR API can exploit a 5.8 times speedup over sequential execution on the Power5+ workstation with eight cores and a 2.9 times speedup on RP2 with four cores, respectively. In addition, the OSCAR compiler can accelerate an IBM XL Fortran compiler up to 3.3 times on the Power6 SMP server. Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.
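The OSCAR API itself is a set of OpenMP-subset directives for C and Fortran, so it is not reproduced here; purely as a sketch of the multigrain idea (coarse-grain macro-tasks distributed across processing units, fine-grain data parallelism inside each task), a Python analogue might look like the following, where `coarse_task` is a hypothetical stand-in:

```python
# Conceptual multigrain sketch, not OSCAR API code: macro-tasks run in
# separate processes (coarse grain); each task's inner loop is expressed as
# vector operations with data-level parallelism (fine grain).
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def coarse_task(seed):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=100_000)
    return float(np.sum(x * x))   # fine-grain vectorized inner work

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(coarse_task, range(8)))
    print(f"total = {sum(results):.1f}")
```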
Lin, Tingyou; Ho, Yingchieh; Su, Chauchin
2017-06-15
This paper presents a method of thermal balancing for monolithic power integrated circuits (ICs). An on-chip temperature monitoring sensor that consists of a poly resistor strip in each of multiple parallel MOSFET banks is developed. A temperature-to-frequency converter (TFC) is proposed to quantize on-chip temperature. A pulse-width-modulation (PWM) methodology is developed to balance the channel temperature based on the quantization. The modulated PWM pulses control the hottest metal-oxide-semiconductor field-effect transistor (MOSFET) bank to reduce its power dissipation and heat generation. A test chip with eight parallel MOSFET banks is fabricated in a TSMC 0.25 μm HV BCD process, with a total area of 900 × 914 μm². The maximal temperature variation among the eight banks is reduced from 9.5 °C to 2.8 °C by the proposed thermal balancing system at 1.5 W dissipation. As a result, the proposed system improves the lifetime of a power MOSFET by 20%. PMID:28617346
2002-01-01
...wrappers to other widely used languages, namely Tcl/Tk, Java, and Python. VTK is very powerful and covers polygonal models and image processing classes. Topics include: Large Data Visualization and Rendering; Information Visualization for Beginners; Rendering and Visualization in Parallel Environments.
Design of neurophysiologically motivated structures of time-pulse coded neurons
NASA Astrophysics Data System (ADS)
Krasilenko, Vladimir G.; Nikolsky, Alexander I.; Lazarev, Alexander A.; Lobodzinska, Raisa F.
2009-04-01
A common methodology for the biologically motivated concept of building sensor processing systems with parallel input, picture-operand processing, and time-pulse coding is described in this paper. Advantages of such coding for the creation of parallel programmable 2D-array structures for next-generation digital computers, which require untraditional numerical systems for processing of analog, digital, hybrid, and neuro-fuzzy operands, are shown. Simulation results for the optoelectronic time-pulse coded intelligent neural elements (OETPCINE) and implementation results for a wide set of neuro-fuzzy logic operations are considered. The simulation results confirm the engineering advantages, intelligence, and circuit flexibility of OETPCINE for the creation of advanced 2D-structures. The developed equivalentor-nonequivalentor neural element has a power consumption of 10 mW and a processing time of about 10-100 µs.
Parallel algorithm for computation of second-order sequential best rotations
NASA Astrophysics Data System (ADS)
Redif, Soydan; Kasap, Server
2013-12-01
Algorithms for computing an approximate polynomial matrix eigenvalue decomposition of para-Hermitian systems have emerged as a powerful, generic signal processing tool. A technique that has shown much success in this regard is the sequential best rotation (SBR2) algorithm. Proposed is a scheme for parallelising SBR2 with a view to exploiting the modern architectural features and inherent parallelism of field-programmable gate array (FPGA) technology. Experiments show that the proposed scheme can achieve low execution times while requiring minimal FPGA resources.
Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus
NASA Astrophysics Data System (ADS)
Hsu, Chung-Ti
In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling the charging/discharging of different battery packs by using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistries, voltages, and states of charge (SOC) to be properly charged and discharged in parallel without causing safety problems. The physical design and cost of the energy management system are substantially reduced. Additionally, specific types of failures in maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. Periodic nonlinear load current makes MPPT realized by the conventional perturb and observe (P&O) algorithm problematic. A modified MPPT algorithm is proposed; it still requires only typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves a fast input current change rate by combining a current-adjustable converter with a few converters operating at a constant current.
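For reference, a sketch of the conventional perturb-and-observe loop that the dissertation identifies as problematic under periodic nonlinear loads; the modified algorithm is not reproduced here, and the PV curve and step size are invented toy values:

```python
# Conventional P&O MPPT baseline (toy simulation, invented PV curve whose
# power peaks near 13.5 V): perturb the voltage reference and keep moving in
# whichever direction increased the measured power.
def p_and_o_step(v, i, prev_v, prev_p, v_ref, dv=0.5):
    p = v * i
    if p > prev_p:
        v_ref += dv if v > prev_v else -dv   # last move helped: keep going
    else:
        v_ref += -dv if v > prev_v else dv   # power dropped: reverse
    return v_ref, v, p

def pv_current(v):
    # Toy panel I-V curve; P = v * i is maximized near 13.5 V.
    return max(0.0, 5.0 - 0.25 * (v - 7.0))

v_ref, prev_v, prev_p = 12.0, 12.0, 0.0
for _ in range(50):
    i = pv_current(v_ref)
    v_ref, prev_v, prev_p = p_and_o_step(v_ref, i, prev_v, prev_p, v_ref)
print(f"settled near {prev_v:.1f} V, {prev_p:.2f} W")   # oscillates about 13.5 V
```

Under a periodic nonlinear load the measured power fluctuates independently of the perturbation, which is exactly what misleads the `p > prev_p` test above.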
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short-range force calculation on hybrid high-performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
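A toy of the dynamic load-balancing idea described above (not the LAMMPS/Geryon implementation; per-atom costs and the rebalancing rule are invented for illustration): the accelerator's share of the atoms is adjusted each step, from measured times, so both sides finish together:

```python
# Toy dynamic CPU/accelerator load balancing: measure the time each side took
# and resize the accelerator's fraction so the two finish simultaneously.
import time

def fake_kernel(n_items, per_item_cost):
    # Stand-in for a force kernel; cost scales with the work assigned.
    t0 = time.perf_counter()
    s = 0.0
    for _ in range(int(n_items * per_item_cost)):
        s += 1.0
    return time.perf_counter() - t0

n_atoms, frac_acc = 100_000, 0.5
for step in range(10):
    t_acc = fake_kernel(n_atoms * frac_acc, 0.2)         # "accelerator": cheaper
    t_cpu = fake_kernel(n_atoms * (1 - frac_acc), 1.0)   # "CPU": slower per atom
    # New fraction that equalizes finish times given the measured rates.
    frac_acc = frac_acc * t_cpu / (frac_acc * t_cpu + (1 - frac_acc) * t_acc)
    print(f"step {step}: accelerator fraction = {frac_acc:.3f}")
```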
Parallel dispatch: a new paradigm of electrical power system dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
Continuous-time ΣΔ ADC with implicit variable gain amplifier for CMOS image sensor.
Tang, Fang; Bermak, Amine; Abbes, Amira; Benammar, Mohieddine Amor
2014-01-01
This paper presents a column-parallel continuous-time sigma-delta (CTSD) ADC for mega-pixel resolution CMOS image sensors (CIS). The sigma-delta modulator is implemented with a 2nd-order resistor/capacitor-based loop filter. The first integrator uses a conventional operational transconductance amplifier (OTA) for high power-supply noise rejection. The second integrator is realized with a single-ended inverter-based amplifier instead of a standard OTA. As a result, the power consumption is reduced without sacrificing noise performance. Moreover, the variable gain amplifier in the traditional column-parallel read-out circuit is merged into the front-end of the CTSD modulator. By programming the input resistance, the amplitude range of the input current can be tuned over 8 scales, which is equivalent to a traditional 2-bit preamplification function without consuming extra power or chip area. The test chip prototype is fabricated using a 0.18 μm CMOS process, and measurement results show an ADC power consumption lower than 63.5 μW under a 1.4 V power supply and 50 MHz clock frequency.
High Rate Digital Demodulator ASIC
NASA Technical Reports Server (NTRS)
Ghuman, Parminder; Sheikh, Salman; Koubek, Steve; Hoy, Scott; Gray, Andrew
1998-01-01
The architecture of a High Rate (600 Mega-bits per second) Digital Demodulator (HRDD) ASIC capable of demodulating BPSK and QPSK modulated data is presented in this paper. The advantages of all-digital processing include increased flexibility and reliability with reduced reproduction costs. Conventional serial digital processing would require high processing rates necessitating a hardware implementation in other than CMOS technology, such as Gallium Arsenide (GaAs), which has high cost and power requirements. It is more desirable to use CMOS technology with its lower power requirements and higher gate density. However, digital demodulation of high data rates in CMOS requires parallel algorithms to process the sampled data at a rate lower than the data rate. The parallel processing algorithms described here were developed jointly by NASA's Goddard Space Flight Center (GSFC) and the Jet Propulsion Laboratory (JPL). The resulting all-digital receiver has the capability to demodulate BPSK, QPSK, OQPSK, and DQPSK at data rates in excess of 300 Mega-bits per second (Mbps) per channel. This paper will provide an overview of the parallel architecture and features of the HRDD ASIC. In addition, this paper will provide an overview of the implementation of the hardware architectures used to create flexibility over conventional high rate analog or hybrid receivers. This flexibility includes a wide range of data rates, modulation schemes, and operating environments. In conclusion, it will be shown how this high rate digital demodulator can be used with an off-the-shelf A/D and a flexible analog front end, both of which are numerically computer controlled, to produce a very flexible, low cost high rate digital receiver.
Anisotropic Behaviour of Magnetic Power Spectra in Solar Wind Turbulence.
NASA Astrophysics Data System (ADS)
Banerjee, S.; Saur, J.; Gerick, F.; von Papen, M.
2017-12-01
Introduction: High-altitude fast solar wind turbulence (SWT) shows different spectral properties as a function of the angle between the flow direction and the scale-dependent mean magnetic field (Horbury et al., PRL, 2008). The average magnetic power contained in the near-perpendicular direction (80°-90°) was found to be approximately 5 times larger than the average power in the parallel direction (0°-10°). In addition, the parallel power spectrum was found to follow a steeper (-2) power law than the perpendicular power spectral density (PSD), which followed a near-Kolmogorov slope (-5/3). Similar anisotropic behaviour has also been observed (Chen et al., MNRAS, 2011) for the slow solar wind (SSW), but using a different method exploiting multi-spacecraft data from Cluster. Purpose: In the current study, using Ulysses data, we investigate (i) the anisotropic behaviour of near-ecliptic slow solar wind using the same methodology (described below) as that of Horbury et al. (2008) and (ii) the dependence of the anisotropic behaviour of SWT on heliospheric latitude. Method: We apply the wavelet method to calculate the turbulent power spectra of the magnetic field fluctuations parallel and perpendicular to the local mean magnetic field (LMF). Following Horbury et al., the LMF for a given scale (or size) is obtained using an envelope of the envelope of that size. Results: (i) SSW intervals always show near -5/3 perpendicular spectra. Unlike the fast solar wind (FSW) intervals, for SSW we often find intervals where power parallel to the mean field is not observed. For the few intervals with sufficient power in the parallel direction, slow wind turbulence also exhibits -2 parallel spectra, similar to FSW. (ii) The behaviours of the parallel and perpendicular power spectra are found to be independent of heliospheric latitude. Conclusion: In the current study we do not find a significant influence of heliospheric latitude on the spectral slopes of the parallel and perpendicular magnetic spectra. This indicates that the spectral anisotropy in the parallel and perpendicular directions is governed by intrinsic properties of SWT.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.
Catarinucci, Luca; Tarricone, Luciano
2009-01-01
The finite difference time domain (FDTD) method is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high spatial resolution is adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better handle the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
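For orientation, a minimal 1D FDTD kernel showing the leapfrog update that such schemes accelerate; the paper's graded mesh, subgridding, and parallel decomposition are not reproduced, and the grid, source, and normalized field units below are arbitrary choices:

```python
# Minimal 1D FDTD (normalized field units, PEC ends): H and E updated on a
# staggered grid with a Courant-stable time step; soft Gaussian source.
import numpy as np

c0 = 299_792_458.0
nz, nsteps = 400, 600
dz = 1e-3
dt = 0.5 * dz / c0            # Courant number 0.5 (stable for <= 1 in 1D)
coef = c0 * dt / dz

ez = np.zeros(nz)
hy = np.zeros(nz - 1)
for n in range(nsteps):
    hy += coef * np.diff(ez)                         # H update from curl of E
    ez[1:-1] += coef * np.diff(hy)                   # E update from curl of H
    ez[nz // 4] += np.exp(-((n - 60) / 20.0) ** 2)   # injected pulse

print("peak |Ez| =", float(np.max(np.abs(ez))))
```

A graded mesh replaces the uniform `dz` with a spatially varying step, which is what makes the load per region uneven and the parallel partitioning nontrivial.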
Speculation and replication in temperature accelerated dynamics
Zamora, Richard J.; Perez, Danny; Voter, Arthur F.
2018-02-12
Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similar to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Finally and overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.
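A toy of the speculation idea (not the SpecTAD implementation; `simulate` and `predict_next` are hypothetical stand-ins): spare workers process the most likely successor states concurrently, and a speculative result is kept only when the prediction turns out to be right:

```python
# Speculative parallelism sketch: while the current state is processed, spare
# workers speculatively process predicted future states; wasted work occurs
# only on mispredictions.
from concurrent.futures import ProcessPoolExecutor

def simulate(state):
    # Stand-in for an expensive state-to-state computation.
    return sum(i * i for i in range(10_000)) + state

def predict_next(state):
    # Hypothetical predictor for likely successor states.
    return [state + 1, state + 2, state + 3]

if __name__ == "__main__":
    state = 0
    with ProcessPoolExecutor(max_workers=4) as pool:
        for _ in range(5):
            current = pool.submit(simulate, state)
            spec = {s: pool.submit(simulate, s) for s in predict_next(state)}
            current.result()
            actual_next = state + 1          # pretend the dynamics chose this
            if actual_next in spec:
                spec[actual_next].result()   # speculation hit: work reused
            state = actual_next
    print("final state:", state)
```

The replica alternative would instead devote those spare workers to the current state, which is why some mix of the two is typically the most efficient use of the machine.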
Framework for Parallel Preprocessing of Microarray Data Using Hadoop
2018-01-01
Nowadays, microarray technology has become one of the popular ways to study gene expression and the diagnosis of disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that require preprocessing, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods utilized to preprocess the data and remove the noise. Most preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address the above-mentioned issues. Hadoop is a well-known and ideal distributed file system framework that provides a parallel environment in which to run the experiment. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available preprocessing algorithm called RMA to efficiently process microarray data. The experiment was run on a cluster containing 5 nodes, with each node having 16 cores and 16 GB of memory. It compares the efficiency and performance of RMA parallelized using Hadoop with RMA parallelized using the affyPara package, as well as with sequential RMA. The results show that the speed-up of the proposed approach outperforms both the sequential approach and the affyPara approach. PMID:29796018
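For illustration, the shape of a Hadoop Streaming mapper under this model, where each array is an independent map task; the actual pipeline invokes R's RMA routines, and `preprocess` below is a hypothetical stand-in (submission flags abbreviated):

```python
#!/usr/bin/env python
# Hadoop Streaming mapper sketch: one input line names one CEL file, and each
# file is preprocessed independently, so map tasks parallelize trivially.
# Submitted roughly as: hadoop jar hadoop-streaming.jar -input ... -output ...
#                       -mapper mapper.py
import sys

def preprocess(cel_path):
    # Hypothetical stand-in for background correction and normalization
    # of one array (done by R's RMA code in the paper's pipeline).
    return f"{cel_path}\tprocessed"

for line in sys.stdin:
    path = line.strip()
    if path:
        print(preprocess(path))
```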
System design of ELITE power processing unit
NASA Astrophysics Data System (ADS)
Caldwell, David J.
The Electric Propulsion Insertion Transfer Experiment (ELITE) is a space mission planned for the mid 1990s in which technological readiness will be demonstrated for electric orbit transfer vehicles (EOTVs). A system-level design of the power processing unit (PPU), which conditions solar array power for the arcjet thruster, was performed to optimize performance with respect to reliability, power output, efficiency, specific mass, and radiation hardness. The PPU system consists of multiphased parallel switchmode converters, configured as current sources, connected directly from the array to the thruster. The PPU control system includes a solar array peak power tracker (PPT) to maximize the power delivered to the thruster regardless of variations in array characteristics. A stability analysis has been performed to verify that the system is stable despite the nonlinear negative impedance of the PPU input and the arcjet thruster. Performance specifications are given to provide the required spacecraft capability with existing technology.
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents a CIVA-optimized ultrasonic inspection simulation tool which takes advantage of the power of massively parallel architectures: graphics processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, and planar rectangular defects or side-drilled holes of small diameter. Validations of the model accuracy and performance measurements are presented.
Hardware Algorithm Implementation for Mission Specific Processing
2008-03-01
...knowledge about the VLSI technology and an understanding of VHDL, scripting, and integrating the script into the Cadence or ModelSim software. ... It is possible to have a trade-off between parallel and serial logic design for the circuit. Power can be saved by using parallelization, pipelining, or a ...
Schmideder, Andreas; Cremer, Johannes H; Weuster-Botz, Dirk
2016-11-01
In general, fed-batch processes are applied for recombinant protein production with Escherichia coli (E. coli). However, state-of-the-art methods for identifying suitable reaction conditions suffer from severe drawbacks: direct transfer of process information from parallel batch studies is often defective, and sequential fed-batch studies are time-consuming and cost-intensive. In this study, continuously operated stirred-tank reactors on a milliliter scale were applied to identify suitable reaction conditions for fed-batch processes. Isopropyl β-d-1-thiogalactopyranoside (IPTG) induction strategies were varied in parallel-operated stirred-tank bioreactors to study the effects on the continuous production of the recombinant protein photoactivatable mCherry (PAmCherry) with E. coli. Best-performing induction strategies were transferred from the continuous processes on a milliliter scale to liter-scale fed-batch processes. Inducing recombinant protein expression by dynamically increasing the IPTG concentration to 100 µM led to an increase in the product concentration of 21% (8.4 g/L) compared to an implemented high-performance production process with the most frequently applied induction strategy of a single addition of 1000 µM IPTG. Thus, identifying feasible reaction conditions for fed-batch processes in parallel continuous studies on a milliliter scale was shown to be a powerful, novel method to accelerate bioprocess design in a cost-reducing manner. © 2016 American Institute of Chemical Engineers. Biotechnol. Prog., 32:1426-1435, 2016.
Li, Kangkang; Yu, Hai; Feron, Paul; Tade, Moses; Wardhaugh, Leigh
2015-08-18
Using a rate-based model, we assessed the technical feasibility and energy performance of an advanced aqueous-ammonia-based postcombustion capture process integrated with a coal-fired power station. The capture process consists of three identical process trains in parallel, each containing a CO2 capture unit, an NH3 recycling unit, a water separation unit, and a CO2 compressor. A sensitivity study of important parameters, such as NH3 concentration, lean CO2 loading, and stripper pressure, was performed to minimize the energy consumption involved in the CO2 capture process. Process modifications of the rich-split process and the interheating process were investigated to further reduce the solvent regeneration energy. The integrated capture system was then evaluated in terms of the mass balance and the energy consumption of each unit. The results show that our advanced ammonia process is technically feasible and energy-competitive, with a low net power-plant efficiency penalty of 7.7%.
Design of an Input-Parallel Output-Parallel LLC Resonant DC-DC Converter System for DC Microgrids
NASA Astrophysics Data System (ADS)
Juan, Y. L.; Chen, T. R.; Chang, H. M.; Wei, S. E.
2017-11-01
Compared with a centralized power system, a distributed modular power system is composed of several power modules with lower power capacity that together provide sufficient capacity for the load demand. The current stress on the power components in each module can then be reduced, and the flexibility of system setup is also enhanced. However, the parallel-connected power modules in a conventional system are usually controlled to share the power flow equally, which results in lower efficiency under light-load conditions. In this study, a modular power conversion system for a DC microgrid is developed with a 48 V DC low-voltage input and a 380 V DC high-voltage output. In the developed control strategy, the number of power modules enabled to share the power flow is decided according to the output power under light-load conditions. Finally, three 350 W power modules are constructed and parallel-connected to set up a modular power conversion system. The experimental results show that, compared with the conventional system, the efficiency of the developed power system under light-load conditions is greatly improved. The modularized design of the power system can also decrease the ratio of power loss to system capacity.
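A sketch of that light-load strategy under stated assumptions (the paper's 350 W module rating, hypothetical enable rule): enable only as many of the parallel modules as the demand requires, so each enabled module runs near its efficient operating point:

```python
# Light-load module enabling sketch: pick the smallest number of 350 W
# modules that covers the demand, then share the load among them.
import math

MODULE_RATING_W = 350.0
N_MODULES = 3

def modules_to_enable(load_w):
    if load_w <= 0:
        return 0
    return min(N_MODULES, math.ceil(load_w / MODULE_RATING_W))

for load in (50, 200, 400, 700, 1000):
    n = modules_to_enable(load)
    share = load / n if n else 0.0
    print(f"{load:5.0f} W load -> {n} module(s), {share:6.1f} W each")
```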
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samedov, V. V., E-mail: v-samedov@yandex.ru
2016-12-15
In this study, we theoretically analyze the processes in a plane-parallel high-purity germanium (HPGe) detector. The generating function of factorial moments describing the process of registration of low-energy X-rays by the HPGe detector with consideration of capture of charge carriers by traps is obtained. It is demonstrated that the coefficients of expansion of the average signal amplitude and variance in power series over the quantity inversely proportional to the bias voltage of the detector allow one to determine the Fano factor, the product of the charge carrier lifetime and mobility, and other characteristics of the semiconductor material of the detector.
Low profile, highly configurable, current sharing paralleled wide band gap power device power module
McPherson, Brice; Killeen, Peter D.; Lostetter, Alex; Shaw, Robert; Passmore, Brandon; Hornberger, Jared; Berry, Tony M
2016-08-23
A power module with multiple equalized parallel power paths supporting multiple parallel bare-die power devices, constructed with low-inductance equalized current paths for even current sharing and clean switching events. Wide, low-profile power contacts provide low inductance and short current paths, and a large conductor cross-sectional area provides high current-carrying capacity. An internal gate and source Kelvin interconnection substrate is provided with individual ballast resistors and simple bolted construction. Gate drive connectors are provided on either the left or right side of the module. The module is configurable in half bridge, full bridge, common source, and common drain topologies.
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray-type data and categorical SNP data. Our new shared-memory parallel algorithms are shown to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
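The parallel core of such clustering is the assignment step, which is embarrassingly parallel across rows; a Python sketch of that idea follows (the authors' implementation is Java with a transactional-memory-style design, so this is the concept only, with invented toy data):

```python
# Multi-core k-means sketch: the distance/assignment step is split across
# worker processes; the (cheap) center update stays serial.
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def assign_chunk(args):
    chunk, centers = args
    d = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def parallel_kmeans(x, k, iters=20, workers=4):
    rng = np.random.default_rng(0)
    centers = x[rng.choice(len(x), k, replace=False)]
    chunks = np.array_split(x, workers)            # row blocks, order kept
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for _ in range(iters):
            labels = np.concatenate(list(
                pool.map(assign_chunk, [(c, centers) for c in chunks])))
            for j in range(k):                     # serial update step
                if np.any(labels == j):
                    centers[j] = x[labels == j].mean(axis=0)
    return centers, labels

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(10_000, 8))
    centers, labels = parallel_kmeans(data, k=5)
    print(np.bincount(labels))
```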
Reconfigurable Computing for Computational Science: A New Focus in High Performance Computing
2006-11-01
...in the past decade. Researchers are regularly employing the power of large computing systems and parallel processing to tackle larger and more complex problems in all of the physical sciences. For the past decade or so, most of this growth in computing power has been "free" with increased ... the scientific computing community as a means to continued growth in computing capability. This paper offers a glimpse of the hardware and ...
Lightweight High Efficiency Electric Motors for Space Applications
NASA Technical Reports Server (NTRS)
Robertson, Glen A.; Tyler, Tony R.; Piper, P. J.
2011-01-01
Lightweight, high-efficiency electric motors are needed across a wide range of space applications, from thrust vector actuator control for launch and flight applications, to general vehicle, base camp habitat and experiment control for various mechanisms, to robotics for various stationary and mobile space exploration missions. QM Power's Parallel Path Magnetic Technology Motors have slowly proven themselves to be a leading motor technology in this area, winning a NASA Phase II for "Lightweight High Efficiency Electric Motors and Actuators for Low Temperature Mobility and Robotics Applications", a US Army Phase II SBIR for "Improved Robot Actuator Motors for Medical Applications", an NSF Phase II SBIR for "Novel Low-Cost Electric Motors for Variable Speed Applications" and a DOE SBIR Phase I for "High Efficiency Commercial Refrigeration Motors". Parallel Path Magnetic Technology obtains the benefits of using permanent magnets while minimizing the historical trade-offs/limitations found in conventional permanent magnet designs. The resulting devices are smaller, lower weight, lower cost and have higher efficiency than competitive permanent magnet and non-permanent magnet designs. QM Power's motors have been extensively tested and successfully validated by multiple commercial and aerospace customers and partners such as Boeing Research and Technology. Prototypes have been made between 0.1 and 10 HP. They are also in the process of scaling motors to over 100 kW with their development partners. In this paper, Parallel Path Magnetic Technology Motors will be discussed, specifically addressing their higher efficiency, higher power density, lighter weight, smaller physical size, higher low-end torque, wider power zone, cooler temperatures, and greater reliability, with lower cost and significant environmental benefit for the same peak output power compared to typical motors. A further discussion on the inherent redundancy of these motors for space applications will be provided.
Creating a Parallel Version of VisIt for Microsoft Windows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whitlock, B J; Biagas, K S; Rawson, P L
2011-12-07
VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing power is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPUs has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
Generating unstructured nuclear reactor core meshes in parallel
Jain, Rajeev; Tautges, Timothy J.
2014-10-24
Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.
Integration of Modelling and Graphics to Create an Infrared Signal Processing Test Bed
NASA Astrophysics Data System (ADS)
Sethi, H. R.; Ralph, John E.
1989-03-01
The work reported in this paper was carried out as part of a contract with MoD (PE) UK. It considers the problems associated with realistic modelling of a passive infrared system in an operational environment. Ideally all aspects of the system and environment should be integrated into a complete end-to-end simulation, but in the past limited computing power has prevented this. Recent developments in workstation technology and the increasing availability of parallel processing techniques make the end-to-end simulation possible. However, the complexity and speed of such simulations mean difficulties for the operator in controlling the software and understanding the results. These difficulties can be greatly reduced by providing an extremely user-friendly interface and a very flexible, high-power, high-resolution colour graphics capability. Most system modelling is based on separate software simulation of the individual components of the system itself and its environment. These component models may have their own characteristic inbuilt assumptions and approximations, may be written in the language favoured by the originator, and may have a wide variety of input and output conventions and requirements. The models and their limitations need to be matched to the range of conditions appropriate to the operational scenario. A comprehensive set of data bases needs to be generated by the component models and these data bases must be made readily available to the investigator. Performance measures need to be defined and displayed in some convenient graphics form. Some options are presented for combining available hardware and software to create an environment within which the models can be integrated, and which provides the required man-machine interface, graphics and computing power. The impact of massively parallel processing and artificial intelligence is discussed. Parallel processing will make real-time end-to-end simulation possible and will greatly improve the graphical visualisation of the model output data. Artificial intelligence should help to enhance the man-machine interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Volkov, M V; Garanin, S G; Dolgopolov, Yu V
2014-11-30
A seven-channel fibre laser system operated by the master oscillator – multichannel power amplifier scheme is phase locked using a stochastic parallel gradient algorithm. The phase modulators on lithium niobate crystals are controlled by a multichannel electronic unit with the microcontroller processing signals in real time. Dynamic phase locking of the laser system with a bandwidth of 14 kHz is demonstrated; the time of phasing is 3-4 ms. (fibre and integrated-optical structures)
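A numerical sketch of stochastic parallel gradient descent (SPGD) phase locking as commonly formulated: dither all channel phases in parallel, measure the change in the combined-beam metric, and step along the estimated gradient. This is a simulation with invented gain and dither values, not the authors' multichannel electronics:

```python
# SPGD phase-locking simulation for a 7-channel array. The metric J is the
# normalized combined intensity (J = 1 when all channels are in phase).
import numpy as np

N, GAIN, DITHER, ITERS = 7, 15.0, 0.1, 500
rng = np.random.default_rng(0)

def combined_power(phases):
    return abs(np.exp(1j * phases).sum()) ** 2 / N**2

phases = rng.uniform(-np.pi, np.pi, N)   # unknown channel phase offsets
control = np.zeros(N)                    # phase-modulator commands

for _ in range(ITERS):
    delta = DITHER * rng.choice([-1.0, 1.0], size=N)   # parallel random dither
    j_plus = combined_power(phases + control + delta)
    j_minus = combined_power(phases + control - delta)
    control += GAIN * (j_plus - j_minus) * delta       # SPGD ascent step

print("metric after locking:", round(combined_power(phases + control), 3))
```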
Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Caubet, Jordi; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user interface for inspection and analyses of the generated traces. We describe how to use the system to conduct a detailed comparative study of a benchmark code implemented in five different programming paradigms applicable for shared memory
Multiple resonant railgun power supply
Honig, E.M.; Nunnally, W.C.
1985-06-19
A multiple repetitive resonant railgun power supply provides energy for repetitively propelling projectiles from a pair of parallel rails. A plurality of serially connected paired parallel rails are powered by similar power supplies. Each supply comprises an energy storage capacitor, a storage inductor to form a resonant circuit with the energy storage capacitor and a magnetic switch to transfer energy between the resonant circuit and the pair of parallel rails for the propelling of projectiles. The multiple serial operation permits relatively small energy components to deliver overall relatively large amounts of energy to the projectiles being propelled.
AC losses in horizontally parallel HTS tapes for possible wireless power transfer applications
NASA Astrophysics Data System (ADS)
Shen, Boyang; Geng, Jianzhao; Zhang, Xiuchang; Fu, Lin; Li, Chao; Zhang, Heng; Dong, Qihuan; Ma, Jun; Gawith, James; Coombs, T. A.
2017-12-01
This paper presents the concept of using horizontally parallel HTS tapes, together with a study of their AC losses and an investigation of possible wireless power transfer (WPT) applications. An example with three parallel HTS tapes was proposed, and its AC losses were studied both experimentally, using the electrical method, and by simulation, using the 2D H-formulation on the FEM platform of COMSOL Multiphysics. The electromagnetic induction around the three parallel tapes was monitored in the COMSOL simulation. The electromagnetic induction and AC losses generated by a conventional three-turn coil were simulated as well and then compared to the case of three parallel tapes carrying the same AC transport current. The analysis demonstrates that HTS parallel tapes could potentially be used in wireless power transfer systems and could have lower total AC losses than conventional HTS coils.
A transient FETI methodology for large-scale parallel implicit computations in structural mechanics
NASA Technical Reports Server (NTRS)
Farhat, Charbel; Crivelli, Luis; Roux, Francois-Xavier
1992-01-01
Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because explicit schemes are also easier to parallelize than implicit ones. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet -- and perhaps will never -- be offset by the speed of parallel hardware. Therefore, it is essential to develop efficient and robust alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating low-frequency dynamics. Here we present a domain decomposition method for implicit schemes that requires significantly less storage than factorization algorithms, that is several times faster than other popular direct and iterative methods, that can be easily implemented on both shared and local memory parallel processors, and that is both computationally and communication-wise efficient. The proposed transient domain decomposition method is an extension of the method of Finite Element Tearing and Interconnecting (FETI) developed by Farhat and Roux for the solution of static problems. Serial and parallel performance results on the CRAY Y-MP/8 and the iPSC-860/128 systems are reported and analyzed for realistic structural dynamics problems. These results establish the superiority of the FETI method over both the serial/parallel conjugate gradient algorithm with diagonal scaling and the serial/parallel direct method, and contrast the computational power of the iPSC-860/128 parallel processor with that of the CRAY Y-MP/8 system.
Design of a Modular 5-kW Power Processing Unit for the Next-Generation 40-cm Ion Engine
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Bond, Thomas; Okada, Don; Pyter, Janusz; Wiseman, Steve
2002-01-01
NASA Glenn Research Center is developing a 5/10-kW ion engine for a broad range of mission applications. Simultaneously, a 5-kW breadboard power processing unit is being designed and fabricated. The design includes a beam supply consisting of four 1.1-kW power modules connected in parallel, equally sharing the output current. A novel phase-shifted/pulse-width-modulated dual full-bridge topology was chosen for its soft-switching characteristics. The proposed modular approach allows scalability to higher powers as well as the possibility of implementing an N+1 redundant beam supply. Efficiencies in excess of 96% were measured during testing of a breadboard beam power module. A specific mass of 3.0 kg/kW is expected for a flight PPU. This represents a 50% reduction from the state-of-the-art NSTAR power processor.
Chikkagoudar, Satish; Wang, Kai; Li, Mingyao
2011-05-26
Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
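GENIE's decomposition (within-fragment interactions, then fragment-by-fragment pairs, each analyzed independently) is easy to mimic on a CPU. A hedged Python sketch follows; the fragment sizes and the toy interaction_stat are placeholders for illustration only, not GENIE's actual statistical test or its implementation.

```python
import itertools
import numpy as np
from multiprocessing import Pool

def interaction_stat(pair):
    """Toy screen for one fragment pair: max absolute SNP-SNP correlation.
    A placeholder for a real interaction test on binary traits."""
    a, b = pair
    c = np.corrcoef(a.T, b.T)[: a.shape[1], a.shape[1]:]   # cross-correlation block
    return float(np.abs(c).max())

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    snps = rng.integers(0, 3, size=(1000, 64))        # 1000 samples x 64 SNPs (toy)
    frags = np.array_split(snps, 8, axis=1)           # non-overlapping SNP fragments
    # Within-fragment pairs (f, f) and all between-fragment pairs:
    pairs = list(itertools.combinations_with_replacement(frags, 2))
    with Pool() as pool:                              # pairs are independent -> parallel
        stats = pool.map(interaction_stat, pairs)
    print(len(stats), "fragment-pair results")
```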
2011-01-01
Background Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923
NASA Astrophysics Data System (ADS)
Abramov, E. Y.; Sopov, V. I.
2017-10-01
In this research, using the example of a traction network area with high asymmetry of power supply parameters, a sequence for comparative assessment of power losses in a DC traction network under parallel and traditional separated operating modes of traction substation feeders is presented. Experimental measurements were carried out under both modes of operation. Calculation results based on statistical processing showed that power losses decrease in the contact network and increase in the feeders. The changes proved to be significant, which demonstrates the importance of the potential effects when converting traction network areas to parallel feeder operation. An analytical method for calculating the average power losses for different feed schemes of the traction network was developed. On its basis, the dependences of the relative losses were obtained by varying the difference in feeder voltages. The calculation results showed that transition to a two-sided feed scheme is not justified for the considered traction network area. A larger reduction in the total power loss can be obtained with a smaller difference in the feeders' resistance and/or a more symmetrical sectioning scheme of the contact network.
Processing MPI Datatypes Outside MPI
NASA Astrophysics Data System (ADS)
Ross, Robert; Latham, Robert; Gropp, William; Lusk, Ewing; Thakur, Rajeev
The MPI datatype functionality provides a powerful tool for describing structured memory and file regions in parallel applications, enabling noncontiguous data to be operated on by MPI communication and I/O routines. However, no facilities are provided by the MPI standard to allow users to efficiently manipulate MPI datatypes in their own codes.
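As a concrete illustration of what such datatypes describe, the sketch below builds a strided "column" datatype and sends one column of a row-major array without packing. It uses the mpi4py bindings rather than C purely for brevity; the corresponding C calls (MPI_Type_vector, MPI_Type_commit) are analogous.

```python
# Run with: mpiexec -n 2 python datatype_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

rows, cols = 4, 6
# A derived datatype describing one column of a C-ordered rows x cols array:
# 'rows' blocks of 1 double, separated by a stride of 'cols' doubles.
column = MPI.DOUBLE.Create_vector(rows, 1, cols).Commit()

if rank == 0:
    a = np.arange(rows * cols, dtype="d").reshape(rows, cols)
    comm.Send([a, 1, column], dest=1, tag=7)    # send column 0, no packing copy
elif rank == 1:
    col = np.empty(rows, dtype="d")             # received contiguously
    comm.Recv(col, source=0, tag=7)
    print("received column:", col)

column.Free()
```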
(abstract) A High Throughput 3-D Inner Product Processor
NASA Technical Reports Server (NTRS)
Daud, Tuan
1996-01-01
A particularly challenging image processing application is real-time scene acquisition and object discrimination. It requires spatio-temporal recognition of point and resolved objects at high speeds with parallel processing algorithms. Neural network paradigms provide fine-grain parallelism and, when implemented in hardware, offer orders-of-magnitude speedup. However, neural networks implemented on a VLSI chip are planar architectures capable of efficient processing of linear vector signals rather than 2-D images. Therefore, for processing of images, a 3-D stack of neural-net ICs receiving planar inputs and consuming minimal power is required. Details of the circuits and chip architectures are described, along with the need to develop ultralow-power electronics. Further, use of the architecture in a system for high-speed processing is illustrated.
Use of parallel computing for analyzing big data in EEG studies of ambiguous perception
NASA Astrophysics Data System (ADS)
Maksimenko, Vladimir A.; Grubov, Vadim V.; Kirsanov, Daniil V.
2018-02-01
The problem of interaction between humans and machine systems through neuro-interfaces (or brain-computer interfaces) is an urgent task which requires analysis of large amounts of neurophysiological EEG data. In the present paper we consider the methods of parallel computing as one of the most powerful tools for processing experimental data in real time, given the multichannel structure of EEG. In this context we demonstrate the application of parallel computing to the estimation of the spectral properties of multichannel EEG signals associated with visual perception. Using the CUDA C library, we run a wavelet-based algorithm on GPUs and show the possibility of detecting specific patterns in a multichannel set of EEG data in real time.
Parallel processing of embossing dies with ultrafast lasers
NASA Astrophysics Data System (ADS)
Jarczynski, Manfred; Mitra, Thomas; Brüning, Stephan; Du, Keming; Jenke, Gerald
2018-02-01
Functionalization of surfaces equips products and components with new features like hydrophilic behavior, adjustable gloss level, light management properties, etc. Small feature sizes demand diffraction-limited spots and adapted fluence for different materials. With the availability of high-power, fast-repeating ultrashort pulsed lasers and efficient optical processing heads delivering a diffraction-limited small spot size of around 10 μm, it is feasible to achieve fluences higher than adequate patterning requires. Hence, parallel processing is becoming of interest to increase throughput and allow mass production of micro-machined surfaces. The first step on the roadmap of parallel processing for cylinder embossing dies was realized with an eight-spot processing head based on a ns-fiber laser with passive optical beam splitting, individual spot switching by acousto-optic modulation, and advanced imaging. Patterning of cylindrical embossing dies shows a high efficiency of nearly 80%, with diffraction-limited and equally spaced spots with pitches down to 25 μm achieved by compression using cascaded prism arrays. Due to the nanosecond laser pulses, the ablation shows the surrounding material deposition typical of a hot process. In the next step the processing head was adapted to a picosecond laser source, and the 500 W fiber laser was replaced by an ultrashort pulsed laser with 300 W, 12 ps pulses and a repetition frequency of up to 6 MHz. This paper presents details of the processing head design and the analysis of ablation rates and patterns on steel, copper and brass dies. Furthermore, it gives an outlook on scaling the parallel processing head from eight to 16 individually switched beamlets to increase processing throughput and optimize utilization of the available ultrashort pulsed laser energy.
Fuel cells provide a revenue-generating solution to power quality problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
King, J.M. Jr.
Electric power quality and reliability are becoming increasingly important as computers and microprocessors assume a larger role in commercial, health care and industrial buildings and processes. At the same time, constraints on transmission and distribution of power from central stations are making local areas vulnerable to low voltage, load addition limitations, power quality and power reliability problems. Many customers currently utilize some form of premium power in the form of standby generators and/or UPS systems. These include customers where continuous power is required because of health and safety or security reasons (hospitals, nursing homes, places of public assembly, air traffic control, military installations, telecommunications, etc.). They also include customers with industrial or commercial processes which can't tolerate an interruption of power because of product loss or equipment damage. The paper discusses the use of the PC25 fuel cell power plant for backup and parallel power supplies for critical industrial applications. Several PC25 installations are described: the use of propane in a PC25; the use by rural cooperatives; and a demonstration of PC25 technology using landfill gas.
Laser Powered Launch Vehicle Performance Analyses
NASA Technical Reports Server (NTRS)
Chen, Yen-Sen; Liu, Jiwen; Wang, Ten-See (Technical Monitor)
2001-01-01
The purpose of this study is to establish the technical ground for modeling the physics of laser-powered pulse detonation phenomena. Laser-powered propulsion systems involve complex fluid dynamics, thermodynamics and radiative transfer processes. Successful prediction of the performance of laser-powered launch vehicle concepts depends on sophisticated models that reflect the underlying flow physics, including laser ray tracing and focusing, inverse Bremsstrahlung (IB) effects, finite-rate air chemistry, thermal non-equilibrium, plasma radiation and detonation wave propagation, etc. The proposed work will extend the baseline numerical model into an efficient design analysis tool. The proposed model is suitable for 3-D analysis using parallel computing methods.
Multicoil resonance-based parallel array for smart wireless power delivery.
Mirbozorgi, S A; Sawan, M; Gosselin, B
2013-01-01
This paper presents a novel resonance-based multicoil structure as a smart power surface to wirelessly power up apparatus such as mobile devices, animal headstages, implanted devices, etc. The proposed powering system is based on a 4-coil resonance-based inductive link, the resonance coil of which is formed by an array of several paralleled coils acting as a smart power transmitter. The power transmitter employs simple circuit connections and includes only one power driver circuit per multicoil resonance-based array, which enables higher power transfer efficiency and power delivery to the load. The power transmitted by the driver circuit is proportional to the load seen by the individual coil in the array. Thus, the transmitted power scales with the load of the electric/electronic system to be powered, and does not divide equally over all the parallel coils that form the array. Instead, only the loaded coils of the parallel array transmit a significant part of the total transmitted power to the receiver. Such adaptive behavior enables superior power, size and cost efficiency compared with other solutions, since it does not need complex detection circuitry to find the location of the load. The performance of the proposed structure is verified by measurement results. Natural load detection and coverage of a four-times-larger area than conventional topologies, with a power transfer efficiency of 55%, are the novelties of the presented work.
Execution of a parallel edge-based Navier-Stokes solver on commodity graphics processor units
NASA Astrophysics Data System (ADS)
Corral, Roque; Gisbert, Fernando; Pueblas, Jesus
2017-02-01
The implementation of an edge-based three-dimensional Reynolds-Averaged Navier-Stokes solver for unstructured grids able to run on multiple graphics processing units (GPUs) is presented. Loops over edges, which are the most time-consuming part of the solver, have been written to exploit the massively parallel capabilities of GPUs. Non-blocking communications between parallel processes and between the GPU and the central processing unit (CPU) have been used to enhance code scalability. The code is written using a mixture of C++ and OpenCL, to allow the execution of the source code on GPUs. The Message Passing Interface (MPI) library is used to allow the parallel execution of the solver on multiple GPUs. A comparative study of the solver's parallel performance is carried out using a cluster of CPUs and another of GPUs. It is shown that a single GPU is up to 64 times faster than a single CPU core. The parallel scalability of the solver is mainly degraded by the loss of computing efficiency of the GPU when the size of the case decreases. However, for large enough grid sizes, the scalability is strongly improved. A cluster featuring commodity GPUs and a high-bandwidth network is ten times less costly and consumes 33% less energy than a CPU-based cluster with equivalent computational power.
NASA Technical Reports Server (NTRS)
2003-01-01
Topics covered include: Stable, Thermally Conductive Fillers for Bolted Joints; Connecting to Thermocouples with Fewer Lead Wires; Zipper Connectors for Flexible Electronic Circuits; Safety Interlock for Angularly Misdirected Power Tool; Modular, Parallel Pulse-Shaping Filter Architectures; High-Fidelity Piezoelectric Audio Device; Photovoltaic Power Station with Ultracapacitors for Storage; Time Analyzer for Time Synchronization and Monitor of the Deep Space Network; Program for Computing Albedo; Integrated Software for Analyzing Designs of Launch Vehicles; Abstract-Reasoning Software for Coordinating Multiple Agents; Software Searches for Better Spacecraft-Navigation Models; Software for Partly Automated Recognition of Targets; Antistatic Polycarbonate/Copper Oxide Composite; Better VPS Fabrication of Crucibles and Furnace Cartridges; Burn-Resistant, Strong Metal-Matrix Composites; Self-Deployable Spring-Strip Booms; Explosion Welding for Hermetic Containerization; Improved Process for Fabricating Carbon Nanotube Probes; Automated Serial Sectioning for 3D Reconstruction; and Parallel Subconvolution Filtering Architectures.
Eidels, Ami; Houpt, Joseph W.; Altieri, Nicholas; Pei, Lei; Townsend, James T.
2011-01-01
Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system. PMID:21516183
Eidels, Ami; Houpt, Joseph W; Altieri, Nicholas; Pei, Lei; Townsend, James T
2011-04-01
Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system.
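For reference, the two measures named in the abstract have compact standard definitions in Systems Factorial Technology (here S denotes a survivor function of response time, the subscripts index the low/high salience levels of the two channels, and H is the cumulative hazard):

```latex
\mathrm{SIC}(t) = \bigl[S_{LL}(t) - S_{LH}(t)\bigr] - \bigl[S_{HL}(t) - S_{HH}(t)\bigr],
\qquad
C_{\mathrm{OR}}(t) = \frac{H_{AB}(t)}{H_{A}(t) + H_{B}(t)},
\qquad
H(t) = -\ln S(t).
```

Each architecture (serial/parallel, exhaustive/self-terminating) predicts a characteristic signature of SIC(t), while C(t) above or below 1 indicates super- or limited-capacity processing.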
A Stochastic Spiking Neural Network for Virtual Screening.
Morro, A; Canals, V; Oliver, A; Alomar, M L; Galan-Prado, F; Ballester, P J; Rossello, J L
2018-04-01
Virtual screening (VS) has become a key computational tool in early drug design, and screening performance is of high relevance due to the large volume of data that must be processed to identify molecules with the sought activity-related pattern. At the same time, hardware implementations of spiking neural networks (SNNs) are emerging as a computing technique that can be applied to parallelize processes that normally carry a high cost in terms of computing time and power. Consequently, SNNs represent an attractive alternative for time-consuming processing tasks such as VS. In this brief, we present a smart stochastic spiking neural architecture that implements the ultrafast shape recognition (USR) algorithm, achieving a two-order-of-magnitude speed improvement with respect to USR software implementations. The neural system is implemented in hardware using field-programmable gate arrays, allowing a highly parallelized USR implementation. The results show that, due to the high parallelization of the system, millions of compounds can be checked in reasonable times. From these results, we can state that the proposed architecture is a feasible methodology for efficiently enhancing time-consuming data-mining processes such as 3-D molecular similarity search.
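The USR algorithm itself is small: twelve descriptors (three moments of the atom-distance distribution to each of four reference points) and a scaled inverse Manhattan distance for similarity. A plain NumPy sketch of that arithmetic follows; the paper's hardware SNN replaces this arithmetic with stochastic spiking circuits, and the random "molecules" here are for demonstration only.

```python
import numpy as np

def usr_descriptors(coords):
    """12 USR shape descriptors: first three moments of the atom-distance
    distributions to four reference points (Ballester & Richards, 2007)."""
    ctd = coords.mean(axis=0)                    # molecular centroid
    d_ctd = np.linalg.norm(coords - ctd, axis=1)
    cst = coords[d_ctd.argmin()]                 # closest atom to centroid
    fct = coords[d_ctd.argmax()]                 # farthest atom from centroid
    ftf = coords[np.linalg.norm(coords - fct, axis=1).argmax()]  # farthest from fct
    desc = []
    for ref in (ctd, cst, fct, ftf):
        d = np.linalg.norm(coords - ref, axis=1)
        mu = d.mean()
        desc += [mu, ((d - mu) ** 2).mean(), ((d - mu) ** 3).mean()]
    return np.array(desc)

def usr_similarity(a, b):
    """Scaled inverse Manhattan distance, in (0, 1]."""
    return 1.0 / (1.0 + np.abs(a - b).mean())

rng = np.random.default_rng(2)
mol_a, mol_b = rng.normal(size=(30, 3)), rng.normal(size=(30, 3))
print(usr_similarity(usr_descriptors(mol_a), usr_descriptors(mol_b)))
```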
MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control
NASA Astrophysics Data System (ADS)
Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming
2017-09-01
The increasing complexity of industrial products and manufacturing processes has challenged conventional statistics-based quality management approaches under dynamic production conditions. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on the Hadoop distributed architecture and the MapReduce parallel computing model, the large volume and variety of quality-related data generated during the manufacturing process can be handled. Artificial intelligence algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce process. Relying on the ability of the Bayesian network to deal with dynamic and uncertain problems and the parallel computing power of MapReduce, Bayesian networks of impact factors on quality are built based on prior probability distributions and modified with posterior probability distributions. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost in direct proportion to the number of computing nodes. It is also shown that the proposed model is feasible for locating and reasoning about root causes, forecasting manufacturing outcomes, and intelligent decision-making for precision problem solving. The integration of big data analytics and the BN method offers a whole new perspective on manufacturing quality control.
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.
In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphics Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different levels of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp-level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block-level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread-level parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and relies only on the high memory bandwidth of the GPU. The first and second solutions only support partial pivoting; the third one easily supports partial and full pivoting, making it attractive for problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as a function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).
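The workload itself, many small independent LU factorizations, can be stated in a few lines. Below is a hedged CPU reference in Python/SciPy; the loop body is the unit of work that the paper's three GPU variants assign to a Warp, a Thread-block, or a single thread, and the batch size and matrix conditioning here are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
batch, n = 4096, 32                                  # many small independent systems
A = rng.normal(size=(batch, n, n)) + n * np.eye(n)   # well-conditioned toy batch
b = rng.normal(size=(batch, n))

# CPU reference loop: on the GPU, all iterations run concurrently
# (one matrix per Warp, per Thread-block, or per thread).
x = np.empty_like(b)
for i in range(batch):
    lu, piv = lu_factor(A[i])            # LU with partial pivoting
    x[i] = lu_solve((lu, piv), b[i])

print("max residual:", np.abs(np.einsum("bij,bj->bi", A, x) - b).max())
```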
Accelerating large-scale protein structure alignments with graphics processing units
2012-01-01
Background Large-scale protein structure alignment, an indispensable tool of structural bioinformatics, poses a tremendous challenge to computational resources. To ensure structure alignment accuracy and efficiency, efforts have been made to parallelize traditional alignment algorithms in grid environments. However, these solutions are costly and of limited accessibility. Others trade alignment quality for speedup by using high-level characteristics of structure fragments for structure comparisons. Findings We present ppsAlign, a parallel protein structure Alignment framework designed and optimized to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, ppsAlign can incorporate many concurrent methods, such as TM-align and Fr-TM-align, into the parallelized algorithm design. We evaluated ppsAlign on an NVIDIA Tesla C2050 GPU card, and compared it with existing software solutions running on an AMD dual-core CPU. We observed a 36-fold speedup over TM-align, a 65-fold speedup over Fr-TM-align, and a 40-fold speedup over MAMMOTH. Conclusions ppsAlign is a high-performance protein structure alignment tool designed to tackle the computational complexity issues of protein structural data. The solution presented in this paper allows large-scale structure comparisons to be performed using the massively parallel computing power of GPUs. PMID:22357132
O'Sullivan, G.A.; O'Sullivan, J.A.
1999-07-27
In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.
O'Sullivan, George A.; O'Sullivan, Joseph A.
1999-01-01
In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
System-wide power management control via clock distribution network
Coteus, Paul W.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Reed, Don D.
2015-05-19
An apparatus, method and computer program product for automatically controlling power dissipation of a parallel computing system that includes a plurality of processors. A computing device issues a command to the parallel computing system. A clock pulse-width modulator encodes the command in a system clock signal to be distributed to the plurality of processors. The plurality of processors in the parallel computing system receive the system clock signal including the encoded command, and adjust power dissipation according to the encoded command.
Oahu wind power survey, first report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramage, C.S.; Daniels, P.A.; Schroeder, T.A.
1977-05-01
A wind power survey has been conducted on Oahu since summer 1975. At seventeen potentially windy sites, calibrated anemometers and wind vanes were installed and recordings made on computer-processable magnetic tape cassettes. From monthly mean wind speeds, normalized by comparison with Honolulu Airport mean winds, it was concluded that about 23 mi/hr represented the highest average annual wind speed likely to be attained on Oahu and that the Koko Head and Kahuku areas gave the most promise for wind energy generation. Diurnal variation of the wind in these areas roughly parallels the diurnal variation of electric power demand.
Module Six: Parallel Circuits; Basic Electricity and Electronics Individualized Learning System.
ERIC Educational Resources Information Center
Bureau of Naval Personnel, Washington, DC.
In this module the student will learn the rules that govern the characteristics of parallel circuits; the relationships between voltage, current, resistance and power; and the results of common troubles in parallel circuits. The module is divided into four lessons: rules of voltage and current, rules for resistance and power, variational analysis,…
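The rules this module teaches reduce to a few lines of arithmetic. A small worked sketch follows; the component values are invented for illustration.

```python
# Rules for a parallel circuit: voltage is common to all branches, branch
# currents add, and total resistance is the reciprocal of summed reciprocals.
V = 12.0                       # source voltage (volts), same across each branch
R = [4.0, 6.0, 12.0]           # branch resistances (ohms), illustrative values

I_branch = [V / r for r in R]               # Ohm's law per branch: 3, 2, 1 A
I_total = sum(I_branch)                     # branch currents add: 6 A
R_total = 1.0 / sum(1.0 / r for r in R)     # 1/Rt = 1/4 + 1/6 + 1/12 -> Rt = 2 ohms
P_total = V * I_total                       # power: 72 W, also the sum of branch powers

assert abs(V / R_total - I_total) < 1e-9    # the rules agree with each other
print(I_branch, I_total, R_total, P_total)
```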
Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu
2018-04-20
A parallel computation method for large-size Fresnel computer-generated holograms (CGHs) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGHs from 2D object data. In this paper we extend the method to compute Fresnel CGHs from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that suit small- to large-scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
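The underlying computation parallelizes naturally because every hologram pixel accumulates an independent quadratic-phase contribution from every 3D object point. A minimal NumPy sketch of that kernel follows, at toy sizes and with invented object points; the paper's contribution is the communication-avoiding decomposition layered on top of this, which is not reproduced here.

```python
import numpy as np

wavelength = 633e-9
k = 2 * np.pi / wavelength

# Tiny toy sizes; the paper computes 2-gigapixel holograms.
H, W, pitch = 256, 256, 8e-6
y, x = np.mgrid[:H, :W]
x = (x - W / 2) * pitch
y = (y - H / 2) * pitch

# A few 3D object points: (x, y, z, amplitude) -- illustrative only.
points = [(0.0, 0.0, 0.10, 1.0), (2e-4, -1e-4, 0.12, 0.8)]

field = np.zeros((H, W), dtype=complex)
for xo, yo, zo, a in points:                 # every pixel independent -> GPU friendly
    r2 = (x - xo) ** 2 + (y - yo) ** 2
    field += a * np.exp(1j * k * r2 / (2 * zo))   # Fresnel (paraxial) phase term

hologram = np.angle(field)                   # phase-only CGH, one common encoding
```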
NASA Astrophysics Data System (ADS)
Kodama, Yu; Hamagami, Tomoki
A distributed processing system for restoration of an electric power distribution network using two-layered CNP is proposed. The goal of this study is to develop a restoration system suited to the future power network with distributed generators. The distinguishing feature of this study is that the two-layered CNP is applied to a distributed computing environment in practical use. The two-layered CNP has two classes of agents in the network, named field agents and operating agents. In order to avoid conflicts between tasks, the operating agent controls the privilege of managers to send task announcement messages in CNP. This technique realizes coordination between agents which work asynchronously, in parallel with others. Moreover, this study implements the distributed processing system using a de-facto standard multi-agent framework, JADE (Java Agent DEvelopment framework). This study conducts simulation experiments on power distribution network restoration and compares the proposed system with the previous system. The results confirm the effectiveness of the proposed system.
Optical Interconnection Via Computer-Generated Holograms
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang; Zhou, Shaomin
1995-01-01
Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
A real-time, dual processor simulation of the rotor system research aircraft
NASA Technical Reports Server (NTRS)
Mackie, D. B.; Alderete, T. S.
1977-01-01
A real-time, man-in-the-loop simulation of the rotor system research aircraft (RSRA) was conducted. The unique feature of this simulation was that two digital computers were used in parallel to solve the equations of the RSRA mathematical model. The design, development, and implementation of the simulation are documented. Program validation is discussed, and examples of data recordings are given. This simulation provided an important research tool for the RSRA project in terms of safe and cost-effective design analysis. In addition, valuable knowledge concerning parallel processing and a powerful simulation hardware and software system was gained.
SERODS optical data storage with parallel signal transfer
Vo-Dinh, Tuan
2003-09-02
Surface-enhanced Raman optical data storage (SERODS) systems having increased reading and writing speeds, that is, increased data transfer rates, are disclosed. In the various SERODS read and write systems, the surface-enhanced Raman scattering (SERS) data is written and read using a two-dimensional process called parallel signal transfer (PST). The various embodiments utilize laser light beam excitation of the SERODS medium, optical filtering, beam imaging, and two-dimensional light detection. Two- and three-dimensional SERODS media are utilized. The SERODS write systems employ either a different laser or a different level of laser power.
SERODS optical data storage with parallel signal transfer
Vo-Dinh, Tuan
2003-06-24
Surface-enhanced Raman optical data storage (SERODS) systems having increased reading and writing speeds, that is, increased data transfer rates, are disclosed. In the various SERODS read and write systems, the surface-enhanced Raman scattering (SERS) data is written and read using a two-dimensional process called parallel signal transfer (PST). The various embodiments utilize laser light beam excitation of the SERODS medium, optical filtering, beam imaging, and two-dimensional light detection. Two- and three-dimensional SERODS media are utilized. The SERODS write systems employ either a different laser or a different level of laser power.
Improving Quantum Gate Simulation using a GPU
NASA Astrophysics Data System (ADS)
Gutierrez, Eladio; Romero, Sergio; Trenas, Maria A.; Zapata, Emilio L.
2008-11-01
Due to the increasing computing power of graphics processing units (GPUs), they are becoming more and more popular for solving general-purpose algorithms. As the simulation of quantum computers results in a problem with exponential complexity, it is advisable to perform the computation in parallel, such as on the SIMD multiprocessors present in recent GPUs. In this paper, we focus on an important quantum algorithm, the quantum Fourier transform (QFT), in order to evaluate different parallelization strategies on a novel GPU architecture. Our implementation makes use of the CUDA software/hardware architecture developed recently by NVIDIA.
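The parallelization opportunity is easy to see in a statevector simulator: every gate of the QFT circuit updates many independent amplitude pairs, or applies independent phase factors, which is exactly the SIMD work a GPU thread grid absorbs. The NumPy sketch below shows that circuit structure only; it is not the authors' CUDA implementation, and the bit-ordering convention (qubit 0 most significant) is an assumption of this sketch.

```python
import numpy as np

def hadamard(psi, n, q):
    """Hadamard on qubit q: updates independent amplitude pairs differing in bit q."""
    t = psi.reshape(2**q, 2, 2**(n - 1 - q))
    a = t[:, 0, :].copy()
    t[:, 0, :] = (a + t[:, 1, :]) / np.sqrt(2)
    t[:, 1, :] = (a - t[:, 1, :]) / np.sqrt(2)

def controlled_phase(psi, n, q1, q2, theta):
    """Multiply amplitudes whose bits q1 and q2 are both 1 by exp(i*theta)."""
    idx = np.arange(psi.size)
    b1 = (idx >> (n - 1 - q1)) & 1
    b2 = (idx >> (n - 1 - q2)) & 1
    psi[(b1 & b2).astype(bool)] *= np.exp(1j * theta)

def qft(psi):
    """Textbook QFT circuit on a state vector of 2**n amplitudes."""
    psi = psi.astype(complex).copy()
    n = int(np.log2(psi.size))
    for j in range(n):
        hadamard(psi, n, j)
        for k in range(j + 1, n):
            controlled_phase(psi, n, j, k, 2 * np.pi / 2 ** (k - j + 1))
    # Undo the qubit-order reversal the circuit produces:
    return psi.reshape([2] * n).transpose(tuple(reversed(range(n)))).reshape(-1)

psi = np.zeros(2**4); psi[3] = 1.0
assert np.allclose(qft(psi), np.fft.ifft(psi) * np.sqrt(psi.size))  # QFT = scaled IDFT
```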
An Advanced Simulation Framework for Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Li, P. P.; Tyrrell, R. Yeung D.; Adhami, N.; Li, T.; Henry, H.
1994-01-01
Discrete-event simulation (DEVS) users have long been faced with a three-way trade-off of balancing execution time, model fidelity, and number of objects simulated. Because of the limits of computer processing power, the analyst is often forced to settle for less than the desired performance in one or more of these areas.
Zhu, Shijun; Nahm, Eun-Shim; Resnick, Barbara; Friedmann, Erika; Brown, Clayton; Park, Jumin; Cheon, Jooyoung; Park, DoHwan
2017-07-01
This secondary data analysis of a longitudinal study assessed whether self-efficacy for exercise (SEE) mediated online intervention effects on exercise among older adults and whether age (50-64 vs. ≥65 years) moderated the mediation. Data were from an online bone health intervention study. Eight hundred sixty-six older adults (≥50 years) were randomized to three arms: Bone Power (n = 301), Bone Power Plus (n = 302), or Control (n = 263). Parallel process latent growth curve modeling (LGCM) was used to jointly model growth in SEE and in exercise and to assess the mediating effect of SEE on the effect of the intervention on exercise. SEE was a significant mediator in 50- to 64-year-old adults (0.061, 95% BCI: 0.011, 0.163) but not in the ≥65 age group (-0.004, 95% BCI: -0.047, 0.025). Promotion of SEE is critical to improving exercise among 50- to 64-year-olds.
A Real-Time Capable Software-Defined Receiver Using GPU for Adaptive Anti-Jam GPS Sensors
Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S.; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun
2011-01-01
Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities. PMID:22164116
A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.
Seo, Jiwon; Chen, Yu-Hsuan; De Lorenzo, David S; Lo, Sherman; Enge, Per; Akos, Dennis; Lee, Jiyun
2011-01-01
Due to their weak received signal power, Global Positioning System (GPS) signals are vulnerable to radio frequency interference. Adaptive beam and null steering of the gain pattern of a GPS antenna array can significantly increase the resistance of GPS sensors to signal interference and jamming. Since adaptive array processing requires intensive computational power, beamsteering GPS receivers were usually implemented using hardware such as field-programmable gate arrays (FPGAs). However, a software implementation using general-purpose processors is much more desirable because of its flexibility and cost effectiveness. This paper presents a GPS software-defined radio (SDR) with adaptive beamsteering capability for anti-jam applications. The GPS SDR design is based on an optimized desktop parallel processing architecture using a quad-core Central Processing Unit (CPU) coupled with a new generation Graphics Processing Unit (GPU) having massively parallel processors. This GPS SDR demonstrates sufficient computational capability to support a four-element antenna array and future GPS L5 signal processing in real time. After providing the details of our design and optimization schemes for future GPU-based GPS SDR developments, the jamming resistance of our GPS SDR under synthetic wideband jamming is presented. Since the GPS SDR uses commercial-off-the-shelf hardware and processors, it can be easily adopted in civil GPS applications requiring anti-jam capabilities.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional oximetric electron paramagnetic resonance imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment, involving digital band-pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four dual-core AMD Opteron shared-memory processors to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration achieved a speed-up factor of 46.66 compared with the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through a local network (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power for obtaining biophysical parameters from 3D EPR oximetric imaging, almost in real time.
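The filtration stage is embarrassingly parallel: each acquired projection can be band-pass filtered independently. The sketch below conveys that structure with a Python process pool standing in for the paper's OpenMP/parallel-MATLAB stack; the filter design and array sizes are illustrative only.

```python
import numpy as np
from multiprocessing import Pool
from scipy.signal import butter, filtfilt

B, A = butter(4, [0.05, 0.4], btype="band")   # illustrative digital band-pass filter

def filter_projection(trace):
    """One projection/FID filtered independently -> trivially parallel work unit."""
    return filtfilt(B, A, trace)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    projections = rng.normal(size=(23 * 23 * 23, 256))  # one trace per gradient step
    with Pool() as pool:
        filtered = pool.map(filter_projection, projections)
    print(np.asarray(filtered).shape)
```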
Implementation of ADI: Schemes on MIMD parallel computers
NASA Technical Reports Server (NTRS)
Vanderwijngaart, Rob F.
1993-01-01
In order to simulate the effects of the impingement of hot exhaust jets of high-performance aircraft on landing surfaces, a multi-disciplinary computation coupling flow dynamics to heat conduction in the runway needs to be carried out. Such simulations, which are essentially unsteady, require very large computational power in order to be completed within a reasonable time frame of the order of an hour. Such power can be furnished by the latest generation of massively parallel computers. These remove the bottleneck of ever more congested data paths to one or a few highly specialized central processing units (CPUs) by having many off-the-shelf CPUs work independently on their own data and exchange information only when needed. During the past year the first phase of this project was completed, in which the optimal strategy for mapping an ADI algorithm for the three-dimensional unsteady heat equation to a MIMD parallel computer was identified. This was done by implementing and comparing three different domain decomposition techniques that define the tasks for the CPUs in the parallel machine. These implementations were done for a Cartesian grid and Dirichlet boundary conditions. The most promising technique was then used to implement the heat equation solver on a general curvilinear grid with a suite of nontrivial boundary conditions. Finally, this technique was also used to implement the Scalar Penta-diagonal (SP) benchmark, which was taken from the NAS Parallel Benchmarks report. All implementations were done in the programming language C on the Intel iPSC/860 computer.
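The concurrency an ADI scheme exposes is a large set of independent tridiagonal solves, one per grid line in the current sweep direction, and that is what the domain decompositions above distribute. A minimal sketch of one implicit x-sweep (Thomas algorithm) follows; the time step, grid, and boundary closure are illustrative assumptions, and the explicit right-hand-side terms of a full ADI update are omitted for brevity.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal) in O(n)."""
    n = b.size
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# One implicit x-sweep for the 3D heat equation on an N^3 grid: every (j, k)
# grid line is an independent system -- the natural unit of parallel work.
N, dt, h = 32, 1e-3, 1.0 / 31
r = dt / (2 * h * h)
u = np.random.default_rng(5).random((N, N, N))
a = np.full(N, -r); b = np.full(N, 1 + 2 * r); c = np.full(N, -r)
a[0] = c[-1] = 0.0                     # simple boundary closure (illustrative)
for j in range(N):                     # these two loops are what gets distributed
    for k in range(N):
        u[:, j, k] = thomas(a, b, c, u[:, j, k])
```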
Automated target recognition and tracking using an optical pattern recognition neural network
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin
1991-01-01
The ongoing development of an automatic target recognition and tracking system at the Jet Propulsion Laboratory is presented. This system is an optical pattern recognition neural network (OPRNN) that is an integration of an innovative optical parallel processor and a feature-extraction-based neural net training algorithm. The parallel optical processor provides high speed and vast parallelism as well as full shift invariance. The neural network algorithm enables simultaneous discrimination of multiple noisy targets in spite of their scales, rotations, perspectives, and various deformations. This fully developed OPRNN system can be effectively utilized for the automated spacecraft recognition and tracking that will lead to success in the Automated Rendezvous and Capture (AR&C) of the unmanned Cargo Transfer Vehicle (CTV). One of the most powerful optical parallel processors for automatic target recognition is the multichannel correlator. With the inherent advantages of parallel processing capability and shift invariance, multiple objects can be simultaneously recognized and tracked using this multichannel correlator. This target tracking capability can be greatly enhanced by utilizing a powerful feature-extraction-based neural network training algorithm such as the neocognitron. The OPRNN, currently under investigation at JPL, is constructed with an optical multichannel correlator where holographic filters have been prepared using the neocognitron training algorithm. The computation speed of the neocognitron-type OPRNN is up to 10^14 analog connections/sec, enabling the OPRNN to outperform its state-of-the-art electronic counterpart by at least two orders of magnitude.
Evaluation of the power consumption of a high-speed parallel robot
NASA Astrophysics Data System (ADS)
Han, Gang; Xie, Fugui; Liu, Xin-Jun
2018-06-01
An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogeneous and possess clear physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Further, a low-power-consumption workspace is selected for the robot.
Energy-efficient STDP-based learning circuits with memristor synapses
NASA Astrophysics Data System (ADS)
Wu, Xinyu; Saxena, Vishal; Campbell, Kristy A.
2014-05-01
It is now accepted that the traditional von Neumann architecture, with processor and memory separation, is ill suited to processing the parallel data streams which a mammalian brain can efficiently handle. Moreover, researchers now envision computing architectures which enable cognitive processing of massive amounts of data by identifying spatio-temporal relationships in real time and solving complex pattern recognition problems. Memristor cross-point arrays, integrated with standard CMOS technology, are expected to result in massively parallel and low-power neuromorphic computing architectures. Recently, significant progress has been made in spiking neural networks (SNNs) which emulate data processing in the cortical brain. These architectures comprise a dense network of neurons and the synapses formed between the axons and dendrites. Further, unsupervised or supervised competitive learning schemes are being investigated for global training of the network. In contrast to a software implementation, hardware realization of these networks requires massive circuit overhead for addressing and individually updating network weights. Instead, we employ bio-inspired learning rules such as spike-timing-dependent plasticity (STDP) to efficiently update the network weights locally. To realize SNNs on a chip, we propose to densely integrate mixed-signal integrate-and-fire neurons (IFNs) and cross-point arrays of memristors in the back end of line (BEOL) of CMOS chips. Novel IFN circuits have been designed to drive memristive synapses in parallel while maintaining overall power efficiency (<1 pJ/spike/synapse), even at spike rates greater than 10 MHz. We present circuit design details and simulation results of the IFN with memristor synapses, its response to incoming spike trains, and STDP learning characterization.
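The pair-based STDP rule the abstract refers to fits in a few lines. A hedged NumPy sketch follows; the amplitudes and time constants are illustrative choices, not values from the paper's circuits.

```python
import numpy as np

def stdp_dw(dt, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for dt = t_post - t_pre (in ms).
    Pre-before-post (dt > 0) potentiates; post-before-pre depresses."""
    return np.where(dt > 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))

dt = np.linspace(-100.0, 100.0, 9)          # sample spike-timing differences
w = np.full(dt.size, 0.5)                   # synaptic weights (conductances)
w = np.clip(w + stdp_dw(dt), 0.0, 1.0)      # memristor conductance stays bounded
print(np.round(w, 4))
```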
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation
Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan
2014-01-01
Through reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on NVIDIA's GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we propose serial optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation outperforms the serial program by a speedup ratio of 20 times and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR is from 0.14 dB to 0.77 dB when keeping the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computational power of the GPU. However, the performance of the control-intensive parts (CAVLC) is strongly related to the memory bandwidth, which gives an insight for new architecture design. PMID:24757432
GPU accelerated fuzzy connected image segmentation by using CUDA.
Zhuge, Ying; Cao, Yong; Miller, Robert W
2009-01-01
Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem with these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides highly parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 7.2x, 7.3x, and 14.4x, respectively, for the three data sets over the sequential implementation of the fuzzy connected image segmentation algorithm on a CPU.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Exo-reversible staging of coolers in series and in parallel
NASA Astrophysics Data System (ADS)
Maytal, Ben-Zion
2017-10-01
Serial and parallel staging of exo-reversible coolers are formulated, analyzed and compared. Parallel staging introduces an extensive free parameter, the proportion between the combined stages, which affects the intensive factors of specific power and figure of merit. Serial staging reduces the 1st Law efficiency, while parallel staging improves the 2nd Law efficiency. A comparison of parallel with serial staging under a common cooling capacity and cooling range shows that it is always possible to find a parallel arrangement with lower specific power and greater compactness. Some results are demonstrated on staging of Joule-Thomson cryocoolers (below and above the Joule-Thomson inversion temperature).
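For orientation, the fully reversible limit makes the composition rule for serial staging explicit (a standard Carnot-limit identity offered as background, not a result of the paper). A reversible stage spanning $T_c \to T_h$ has $\mathrm{COP} = T_c/(T_h - T_c)$, so $1 + 1/\mathrm{COP} = T_h/T_c$, and two stages cascaded through an intermediate temperature $T_m$ compose multiplicatively:

    \left(1 + \frac{1}{\mathrm{COP}_1}\right)\left(1 + \frac{1}{\mathrm{COP}_2}\right)
      = \frac{T_m}{T_c}\cdot\frac{T_h}{T_m}
      = \frac{T_h}{T_c}
      = 1 + \frac{1}{\mathrm{COP}_{serial}} .

Exo-reversible stages fall short of this bound, which is where the 1st and 2nd Law comparisons above acquire their content.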
Parallel and pipeline computation of fast unitary transforms
NASA Technical Reports Server (NTRS)
Fino, B. J.; Algazi, V. R.
1975-01-01
The letter discusses the parallel and pipeline organization of fast-unitary-transform algorithms such as the fast Fourier transform, and points out the efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform, in which 2^n - 1 hardware 'butterflies' generate a transform of order 2^n every computation cycle.
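A hedged software stand-in for the hardware units just described: each Haar 'butterfly' maps a pair to a normalized sum and difference, and an order-2^n transform is a tree of 2^(n-1) + 2^(n-2) + ... + 1 = 2^n - 1 such butterflies, which is the count quoted above. The haar() driver below is illustrative host-side code only (limited to n <= 64 by its scratch buffer).

    #include <cmath>

    inline void butterfly(float a, float b, float& s, float& d) {
        const float r = 1.0f / sqrtf(2.0f);
        s = (a + b) * r;   // smooth (low-pass) output
        d = (a - b) * r;   // detail (high-pass) output
    }

    void haar(float* x, int n) {            // n must be a power of two, <= 64
        for (int len = n; len > 1; len /= 2) {
            float tmp[64];
            for (int i = 0; i < len / 2; ++i)
                butterfly(x[2*i], x[2*i+1], tmp[i], tmp[len/2 + i]);
            for (int i = 0; i < len; ++i)
                x[i] = tmp[i];              // smooths first, details after
        }
    }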
High Input Voltage Discharge Supply for High Power Hall Thrusters Using Silicon Carbide Devices
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Scheidegger, Robert J.; Aulisio, Michael V.; Birchenough, Arthur G.
2014-01-01
A power processing unit for a 15 kW Hall thruster is under development at NASA Glenn Research Center. The unit produces up to 400 VDC with two parallel 7.5 kW discharge modules that operate from a 300 VDC nominal input voltage. Silicon carbide MOSFETs and diodes were used in this design because they were the best choice to handle the high voltage stress while delivering high efficiency and low specific mass. Efficiencies in excess of 97 percent were demonstrated during integration testing with the NASA-300M 20 kW Hall thruster. Electromagnet, cathode keeper, and heater supplies were also developed and will be integrated with the discharge supply into a vacuum-rated brassboard power processing unit with full flight functionality. This design could be evolved into a flight unit for future missions that require high power electric propulsion.
Associative architecture for image processing
NASA Astrophysics Data System (ADS)
Adar, Rutie; Akerib, Avidan
1997-09-01
This article presents a new generation of parallel processing architecture for real-time image processing. The approach is implemented in a real-time image processor chip, called the Xium-2, based on combining a fully associative array, which provides the parallel engine, with a serial RISC core on the same die. The architecture is fully programmable and can implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on the patent-pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz, in a 0.6-micron manufacturing process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the Xium-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.
Numerical propulsion system simulation: An interdisciplinary approach
NASA Technical Reports Server (NTRS)
Nichols, Lester D.; Chamis, Christos C.
1991-01-01
The tremendous progress being made in computational engineering and the rapid growth in computing power that is resulting from parallel processing now make it feasible to consider the use of computer simulations to gain insights into the complex interactions in aerospace propulsion systems and to evaluate new concepts early in the design process before a commitment to hardware is made. Described here is a NASA initiative to develop a Numerical Propulsion System Simulation (NPSS) capability.
20 kHz main inverter unit [for space station power supplies]
NASA Technical Reports Server (NTRS)
Hussey, S.
1989-01-01
A proof-of-concept main inverter unit has demonstrated the operation of a pulse-width-modulated parallel resonant power stage topology as a 20-kHz ac power source driver, showing simple output regulation, parallel operation, power sharing and short-circuit operation. The use of a two-stage dc input filter controls the electromagnetic compatibility (EMC) characteristics of the dc power bus, and the use of an ac harmonic trap controls the EMC characteristics of the 20-kHz ac power bus.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-01-10
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
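A hedged sketch of the mechanism the two patent abstracts above describe: each node begins the blocking collective asynchronously, drops power while it waits, and restores power once all nodes have arrived. MPI_Ibarrier/MPI_Wait are real MPI-3 calls; set_node_power() is a hypothetical stand-in for a platform-specific DVFS or power-gating hook.

    #include <mpi.h>

    void set_node_power(int level);   // hypothetical platform hook (assumption)

    void power_aware_barrier(MPI_Comm comm) {
        MPI_Request req;
        MPI_Ibarrier(comm, &req);            // begin the blocking operation
        set_node_power(0);                   // reduce power to idle hardware
        MPI_Wait(&req, MPI_STATUS_IGNORE);   // returns when all nodes arrive
        set_node_power(1);                   // restore power, resume compute
    }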
Secondary Heat Exchanger Design and Comparison for Advanced High Temperature Reactor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piyush Sabharwall; Ali Siahpush; Michael McKellar
2012-06-01
The goals of next generation nuclear reactors, such as the high temperature gas-cooled reactor and the advanced high temperature reactor (AHTR), are to increase energy efficiency in the production of electricity and provide high temperature heat for industrial processes. The efficient transfer of energy for industrial applications depends on the ability to incorporate effective heat exchangers between the nuclear heat transport system and the industrial process heat transport system. The need for efficiency, compactness, and safety challenges the boundaries of existing heat exchanger technology, giving rise to the following study. Various studies have attempted to update the secondary heat exchanger downstream of the primary heat exchanger, mostly because its performance is strongly tied to the ability to employ more efficient conversion cycles, such as the supercritical and subcritical Rankine cycles. This study considers two types of heat exchangers, the helical coil heat exchanger and the printed circuit heat exchanger, as possible options for the AHTR secondary heat exchangers, with the following three options: (1) a single heat exchanger transfers all the heat (3,400 MW(t)) from the intermediate heat transfer loop to the power conversion system or process plants; (2) two heat exchangers in a parallel configuration share the load, each transferring 1,700 MW(t); and (3) three heat exchangers in a parallel configuration share the load, each transferring 1,130 MW(t). A preliminary cost comparison is provided for all cases, along with challenges and recommendations.
A parallel-processing approach to computing for the geographic sciences
Crane, Michael; Steinwand, Dan; Beckmann, Tim; Krpan, Greg; Haga, Jim; Maddox, Brian; Feller, Mark
2001-01-01
The overarching goal of this project is to build a spatially distributed infrastructure for information science research by forming a team of information science researchers and providing them with similar hardware and software tools to perform collaborative research. Four geographically distributed Centers of the U.S. Geological Survey (USGS) are developing their own clusters of low-cost personal computers into parallel computing environments that provide a cost-effective way for the USGS to increase participation in the high-performance computing community. Referred to as Beowulf clusters, these hybrid systems provide the robust computing power required for conducting research into various areas, such as advanced computer architecture, algorithms to meet the processing needs for real-time image and data processing, the creation of custom datasets from seamless source data, rapid turn-around of products for emergency response, and support for computationally intense spatial and temporal modeling.
State of the art in electromagnetic modeling for the Compact Linear Collider
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, Arno; Kabel, Andreas; Lee, Lie-Quan
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D electromagnetic time-domain code T3P for simulations of wakefields and transients in complex accelerator structures. T3P is based on state-of-the-art Finite Element methods on unstructured grids and features unconditional stability, quadratic surface approximation and up to 6th-order vector basis functions for unprecedented simulation accuracy. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with fast turn-around times, aiding the design of the next generation of accelerator facilities. Applications include simulations of the proposed two-beam accelerator structures for the Compact Linear Collider (CLIC) - wakefield damping in the Power Extraction and Transfer Structure (PETS) and power transfer to the main beam accelerating structures are investigated.
A real time microcomputer implementation of sensor failure detection for turbofan engines
NASA Technical Reports Server (NTRS)
Delaat, John C.; Merrill, Walter C.
1989-01-01
An algorithm was developed which detects, isolates, and accommodates sensor failures using analytical redundancy. The performance of this algorithm was demonstrated on a full-scale F100 turbofan engine. The algorithm was implemented in real time on a microprocessor-based controls computer which includes parallel processing and high order language programming. Parallel processing was used to achieve the required computational power for the real-time implementation, and high order language programming was used to reduce the programming and maintenance costs of the algorithm implementation software. The sensor failure algorithm was combined with an existing multivariable control algorithm to give a complete control implementation with sensor analytical redundancy. The real-time microprocessor implementation of the algorithm, which resulted in the successful completion of the engine demonstration, is described.
A note on parallel and pipeline computation of fast unitary transforms
NASA Technical Reports Server (NTRS)
Fino, B. J.; Algazi, V. R.
1974-01-01
The parallel and pipeline organization of fast unitary transform algorithms such as the fast Fourier transform is discussed. The efficiency of a combined parallel-pipeline processor of a transform such as the Haar transform, in which 2^n - 1 hardware butterflies generate a transform of order 2^n every computation cycle, is pointed out.
Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Bailey, David (Technical Monitor)
1996-01-01
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.
Parallel matrix multiplication on the Connection Machine
NASA Technical Reports Server (NTRS)
Tichy, Walter F.
1988-01-01
Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n^2 log n), O(n log n), O(n), and O(log n), and require n, n^2, n^2, and n^3 processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
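As an illustration of the O(n)-time, n^2-processor point in the trade-off space above, here is a hedged CUDA sketch (not the Connection Machine code): one thread owns one output element and serializes the length-n inner product, so n^2 threads finish in O(n) parallel steps. Driving the time down to O(log n) instead takes n^3 processors and a reduction tree over the products.

    __global__ void matmul(const float* A, const float* B, float* C, int n) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n || col >= n) return;
        float acc = 0.0f;
        for (int k = 0; k < n; ++k)          // n multiply-adds per thread
            acc += A[row * n + k] * B[k * n + col];
        C[row * n + col] = acc;              // row-major n x n matrices
    }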
Concurrent Probabilistic Simulation of High Temperature Composite Structural Response
NASA Technical Reports Server (NTRS)
Abdi, Frank
1996-01-01
A computational structural/material analysis and design tool which would meet industry's future demand for expedience and reduced cost is presented. This unique software, 'GENOA', is dedicated to parallel and high-speed analysis to perform probabilistic evaluation of high temperature composite response of aerospace systems. The development is based on detailed integration and modification of diverse fields of specialized analysis techniques and mathematical models to combine their latest innovative capabilities into a commercially viable software package. The technique is specifically designed to exploit the availability of multiple processors to perform computationally intense probabilistic analysis assessing uncertainties in structural reliability analysis and composite micromechanics. The primary objectives achieved in the development were: (1) utilization of the power of parallel processing and static/dynamic load balancing optimization to make the complex simulation of structure, material and processing of high temperature composites affordable; (2) computational integration and synchronization of probabilistic mathematics, structural/material mechanics and parallel computing; (3) implementation of an innovative multi-level domain decomposition technique to identify the inherent parallelism and increase convergence rates through high- and low-level processor assignment; (4) creation of the framework for a portable parallel architecture for machine-independent Multiple Instruction Multiple Data (MIMD), Single Instruction Multiple Data (SIMD), hybrid and distributed workstation types of computers; and (5) market evaluation. The results of the Phase 2 effort provide a good basis for continuation and warrant a Phase 3 government and industry partnership.
Advanced Electric Distribution, Switching, and Conversion Technology for Power Control
NASA Technical Reports Server (NTRS)
Soltis, James V.
1998-01-01
The Electrical Power Control Unit currently under development by Sundstrand Aerospace for use on the Fluids Combustion Facility of the International Space Station is the precursor of modular power distribution and conversion concepts for future spacecraft and aircraft applications. This unit combines modular current-limiting flexible remote power controllers and paralleled power converters into one package. Each unit includes three 1-kW, current-limiting power converter modules designed for a variable-ratio load sharing capability. The flexible remote power controllers can be used in parallel to match load requirements and can be programmed for an initial ON or OFF state on powerup. The unit contains an integral cold plate. The modularity and hybridization of the Electrical Power Control Unit set the course for future spacecraft electrical power systems, both large and small. In such systems, the basic hybridized converter and flexible remote power controller building blocks could be configured to match power distribution and conversion capabilities to load requirements. In addition, the flexible remote power controllers could be configured in assemblies to feed multiple individual loads and could be used in parallel to meet the specific current requirements of each of those loads. Ultimately, the Electrical Power Control Unit design concept could evolve to a common switch module hybrid, or family of hybrids, for both converter and switchgear applications. By assembling hybrids of a common current rating and voltage class in parallel, researchers could readily adapt these units for multiple applications. The Electrical Power Control Unit concept has the potential to be scaled to larger and smaller ratings for small and large spacecraft and for aircraft where high-power-density remote power controllers or power converters are required and a common replacement part is desired for multiples of a base current rating.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads (POSIX threads, where POSIX signifies Portable Operating System Interface). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
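Independent rays make the ray-trace loop the textbook case for the OpenMP directive mentioned above. A hedged host-side sketch (not the MACOS source; Ray and trace_ray() are illustrative placeholders):

    #include <vector>
    #include <omp.h>

    struct Ray { float ox, oy, oz, dx, dy, dz; };
    float trace_ray(const Ray& r);   // placeholder per-ray computation

    void trace_all(const std::vector<Ray>& rays, std::vector<float>& out) {
        out.resize(rays.size());
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < (long)rays.size(); ++i)
            out[i] = trace_ray(rays[i]);   // no cross-ray dependencies
    }

With no shared mutable state inside the loop, the linear speedup reported for ray tracing is exactly what this pattern predicts.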
NASA Astrophysics Data System (ADS)
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
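The chipset-dependent "efficient data access patterns" above largely come down to memory layout. A hedged sketch of the two canonical choices (illustrative types, not the authors' code): CUDA kernels favor structure-of-arrays, so that consecutive threads touch consecutive addresses (coalesced loads), while OpenMP loops on CPUs and MIC often favor array-of-structures, keeping one particle's fields in one cache line.

    struct ParticlesSoA {      // layout favored by CUDA kernels
        float *x, *y, *z;      // one coordinate array per component
        float *rho;            // density array
    };

    struct ParticleAoS {       // layout often favored by OpenMP CPU loops
        float x, y, z, rho;    // one particle per struct
    };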
RISC Processors and High Performance Computing
NASA Technical Reports Server (NTRS)
Saini, Subhash; Bailey, David H.; Lasinski, T. A. (Technical Monitor)
1995-01-01
In this tutorial, we will discuss the top five current RISC microprocessors: the IBM POWER2, which is used in the IBM RS6000/590 workstation and in the IBM SP2 parallel supercomputer; the DEC Alpha, which is in the DEC Alpha workstation and in the Cray T3D; the MIPS R8000, which is used in the SGI Power Challenge; the HP PA-RISC 7100, which is used in the HP 700 series workstations and in the Convex Exemplar; and the Cray proprietary processor, which is used in the new Cray J916. The architecture of these microprocessors will first be presented. The effective performance of these processors will then be compared, both by citing standard benchmarks and also in the context of implementing real applications. In the process, different programming models such as data parallel (CM Fortran and HPF) and message passing (PVM and MPI) will be introduced and compared. The latest NAS Parallel Benchmark (NPB) absolute performance and performance per dollar figures will be presented. The next generation of the NPB will also be described. The tutorial will conclude with a discussion of general trends in the field of high performance computing, including likely future developments in hardware and software technology, and the relative roles of vector supercomputers, tightly coupled parallel computers, and clusters of workstations. This tutorial will provide a unique cross-machine comparison not available elsewhere.
Autonomous target tracking of UAVs based on low-power neural network hardware
NASA Astrophysics Data System (ADS)
Yang, Wei; Jin, Zhanpeng; Thiem, Clare; Wysocki, Bryant; Shen, Dan; Chen, Genshe
2014-05-01
Detecting and identifying targets in unmanned aerial vehicle (UAV) images and videos have been challenging problems due to various types of image distortion. Moreover, the significantly high processing overhead of existing image/video processing techniques and the limited computing resources available on UAVs force most of the processing tasks to be performed by the ground control station (GCS) in an off-line manner. In order to achieve fast and autonomous target identification on UAVs, it is thus imperative to investigate novel processing paradigms that can fulfill the real-time processing requirements while fitting the size, weight, and power (SWaP) constrained environment. In this paper, we present a new autonomous target identification approach on UAVs, leveraging emerging neuromorphic hardware which is capable of massively parallel pattern recognition processing and demands only a limited level of power consumption. A proof-of-concept prototype was developed based on a micro-UAV platform (Parrot AR Drone) and the CogniMem neural network chip, for processing video data acquired from a UAV camera on the fly. The aim of this study was to demonstrate the feasibility and potential of incorporating emerging neuromorphic hardware into next-generation UAVs, and to demonstrate its performance and power advantages for real-time, autonomous target tracking.
Mirbozorgi, S Abdollah; Bahrami, Hadi; Sawan, Mohamad; Gosselin, Benoit
2016-04-01
This paper presents a novel experimental chamber with uniform wireless power distribution in 3D for enabling long-term biomedical experiments with small freely moving animal subjects. The implemented power transmission chamber prototype is based on arrays of parallel resonators and multicoil inductive links, forming a novel and highly efficient wireless power transmission system. The power transmitter unit includes several identical resonators enclosed in a scalable array of overlapping square coils which are connected in parallel to provide uniform power distribution along x and y. Moreover, the proposed chamber uses two arrays of primary resonators, facing each other and connected in parallel, to achieve uniform power distribution along the z axis. Each surface includes 9 overlapped coils connected in parallel and implemented in two layers of FR4 printed circuit board. The chamber features a natural power localization mechanism, which simplifies its implementation and eases its operation by avoiding the need for active detection and control mechanisms. A single power surface based on the proposed approach can provide a power transfer efficiency (PTE) of 69% and a power delivered to the load (PDL) of 120 mW for a separation distance of 4 cm, whereas the complete chamber prototype provides a uniform PTE of 59% and a PDL of 100 mW in 3D, everywhere inside the chamber, with a size of 27 x 27 x 16 cm^3.
NASA Astrophysics Data System (ADS)
Munira, Kamaram; Pandey, Sumeet C.; Kula, Witold; Sandhu, Gurtej S.
2016-11-01
The voltage-controlled magnetic anisotropy (VCMA) effect has attracted a significant amount of attention in recent years because of its low cell power consumption during the anisotropy modulation of a thin ferromagnetic film. However, the applied voltage or electric field alone is not enough to completely and reliably reverse the magnetization of the free layer of a magnetic random access memory (MRAM) cell from the anti-parallel to the parallel configuration or vice versa. An additional symmetry-breaking mechanism needs to be employed to ensure a deterministic writing process. Combinations of voltage-controlled magnetic anisotropy with spin-transfer torque (STT) and with an applied magnetic field (Happ) were evaluated for switching reliability, time taken to switch with low error rate, and energy consumption during the switching process. In order to get a low write error rate in an MRAM cell with the VCMA switching mechanism, a spin-transfer torque current or an applied magnetic field comparable to the critical current and field of the free layer is necessary. In the hybrid processes, the VCMA effect shortens the duration during which the more power-hungry secondary mechanism is active. Therefore, the total energy consumed during the hybrid writing processes, VCMA + STT or VCMA + Happ, is less than the energy consumed during pure spin-transfer torque or applied magnetic field switching.
Minimum envelope roughness pulse design for reduced amplifier distortion in parallel excitation.
Grissom, William A; Kerr, Adam B; Stang, Pascal; Scott, Greig C; Pauly, John M
2010-11-01
Parallel excitation uses multiple transmit channels and coils, each driven by independent waveforms, to afford the pulse designer an additional spatial encoding mechanism that complements gradient encoding. In contrast to parallel reception, parallel excitation requires individual power amplifiers for each transmit channel, which can be cost prohibitive. Several groups have explored the use of low-cost power amplifiers for parallel excitation; however, such amplifiers commonly exhibit nonlinear memory effects that distort radio frequency pulses. This is especially true for pulses with rapidly varying envelopes, which are common in parallel excitation. To overcome this problem, we introduce a technique for parallel excitation pulse design that yields pulses with smoother envelopes. We demonstrate experimentally that pulses designed with the new technique suffer less amplifier distortion than unregularized pulses and pulses designed with conventional regularization.
Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo
2014-01-01
The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems in different conditions, which requires computing power that usually exceeds the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performance up to a 181x speedup compared to the corresponding sequential simulations. PMID:25025072
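The parameter-sweep pattern behind those speedups assigns one parameter set per GPU thread. A hedged CUDA sketch (coagSODA uses the adaptive LSODA integrator; this toy uses fixed-step explicit Euler on a one-equation decay model purely to show the sweep structure):

    __global__ void sweep(const float* k, float* y_end, int n_sets,
                          float y0, float dt, int steps) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_sets) return;
        float y = y0;
        for (int s = 0; s < steps; ++s)
            y += dt * (-k[i] * y);   // dy/dt = -k*y with this thread's k
        y_end[i] = y;
    }

Each thread integrates independently, so a one-dimensional sweep of n_sets parameter values costs roughly one simulation's wall time once n_sets threads fit on the device.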
Scalable software architecture for on-line multi-camera video processing
NASA Astrophysics Data System (ADS)
Camplani, Massimo; Salgado, Luis
2011-03-01
In this paper we present a scalable software architecture for on-line multi-camera video processing that guarantees a good trade-off between computational power, scalability and flexibility. The software system is modular and its main blocks are the Processing Units (PUs) and the Central Unit. The Central Unit works as a supervisor of the running PUs, and each PU manages the acquisition phase and the processing phase. Furthermore, an approach to easily parallelize the desired processing application is presented. As a case study, we apply the proposed software architecture to a multi-camera system in order to efficiently manage multiple 2D object detection modules in a real-time scenario. System performance has been evaluated under different load conditions such as number of cameras and image sizes. The results show that the software architecture scales well with the number of cameras and can easily work with different image formats while respecting the real-time constraints. Moreover, the parallelization approach can be used to speed up the processing tasks with a low level of overhead.
Novel Highly Parallel and Systolic Architectures Using Quantum Dot-Based Hardware
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Benny N.; Spotnitz, Matthew
1997-01-01
VLSI technology has made possible the integration of a massive number of components (processors, memory, etc.) into a single chip. In VLSI design, memory and processing power are relatively cheap, and the main emphasis of the design is on reducing the overall interconnection complexity, since data routing costs dominate the power, time, and area required to implement a computation. Communication is costly because wires occupy the most space on a circuit and can also degrade clock time. In fact, much of the complexity (and hence the cost) of VLSI design results from minimization of data routing. The main difficulty in VLSI routing is that crossing of the lines carrying data, instructions, control, etc. is not possible in a plane. Thus, in order to meet this constraint, VLSI design aims at keeping the architecture highly regular, with local and short interconnections. As a result, while the high level of integration has opened the way for massively parallel computation, practical and full exploitation of such a capability in many applications of interest has been hindered by the constraints on the interconnection pattern. More precisely, the use of only localized communication significantly simplifies the design of the interconnection architecture, but at the expense of a somewhat restricted class of applications. For example, there are currently commercially available products integrating hundreds of simple processor elements within a single chip. However, the lack of an adequate interconnection pattern among these processing elements makes them inefficient for exploiting a large degree of parallelism in many applications.
MPgrafic: A parallel MPI version of Grafic-1
NASA Astrophysics Data System (ADS)
Prunet, Simon; Pichon, Christophe
2013-04-01
MPgrafic is a parallel MPI version of Grafic-1 which can produce large cosmological initial conditions on a cluster without requiring shared memory. The real Fourier transforms are carried out in place using FFTW while minimizing the amount of memory used (at the expense of performance), in the spirit of Grafic-1. The writing of the output file is also carried out in parallel. In addition to the technical parallelization, it provides three extensions over Grafic-1: it can produce power spectra with baryon wiggles (D.J. Eisenstein and W. Hu, ApJ 496); it has the optional ability to load a lower resolution noise map corresponding to the low frequency component, which will fix the larger scale modes of the simulation (extra flag 0/1 at the end of the input process), in the spirit of Grafic-2; and it can be used in conjunction with constrfield, which generates initial condition phases from a list of local constraints on density, tidal field, density gradient and velocity.
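The shared-memory-free operation comes from a slab decomposition: each MPI rank owns a contiguous block of x-planes of the N^3 grid and the distributed FFT works on those slabs. A hedged sketch of the bookkeeping (plain MPI; the actual transforms in MPgrafic go through FFTW's MPI interface):

    #include <mpi.h>

    // Compute this rank's slab: n_local x-planes starting at x_start.
    void local_slab(int N, MPI_Comm comm, int* n_local, int* x_start) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int base = N / size, extra = N % size;   // spread remainder planes
        *n_local = base + (rank < extra ? 1 : 0);
        *x_start = rank * base + (rank < extra ? rank : extra);
    }

Each rank then allocates only n_local * N * N reals, which is what lets box sizes exceed any single node's memory.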
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
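A hedged sketch of the allocator fix described above (not the actual simulator code): each thread constructs network objects out of its own chunked pool, so creation threads never contend on the global heap.

    #include <cstddef>
    #include <memory>
    #include <vector>
    #include <omp.h>

    struct Pool {                            // bump allocator owned by one thread
        static const size_t CHUNK = 1 << 20; // 1 MiB chunks
        std::vector<std::unique_ptr<char[]>> chunks;
        size_t used = CHUNK;                 // forces a chunk on first alloc
        void* alloc(size_t n) {              // n assumed <= CHUNK
            if (used + n > CHUNK) { chunks.emplace_back(new char[CHUNK]); used = 0; }
            void* p = chunks.back().get() + used;
            used += n;
            return p;
        }
    };

    void create_neurons(int n_neurons) {
        #pragma omp parallel
        {
            Pool pool;                       // thread-local: no heap contention
            #pragma omp for
            for (int i = 0; i < n_neurons; ++i) {
                void* mem = pool.alloc(64);  // space for one neuron object
                (void)mem;                   // placement-new the model here
            }
        }
    }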
Co-gasification of solid waste and lignite - a case study for Western Macedonia.
Koukouzas, N; Katsiadakis, A; Karlopoulos, E; Kakaras, E
2008-01-01
Co-gasification of solid waste and coal is a very attractive and efficient way of generating power, but also an alternative way, apart from conventional technologies such as incineration and landfill, of treating waste materials. The technology of co-gasification can result in very clean power plants using a wide range of solid fuels, but there are considerable economic and environmental challenges. The aim of this study is to present the existing co-gasification techniques and projects for coal and solid wastes, and to investigate the techno-economic feasibility of installing and operating a 30 MW(e) co-gasification power plant based on integrated gasification combined cycle (IGCC) technology, using lignite and refuse derived fuel (RDF), in the region of the Western Macedonia prefecture (WMP), Greece. The gasification block was based on the British Gas-Lurgi (BGL) gasifier, while the gas clean-up block was based on cold gas purification. The competitive advantages of co-gasification systems lie both in fuel feedstock and production flexibility and in environmentally sound operation. Co-gasification also offers the benefit of commercial application of the process by-products, gasification slag and elemental sulphur. Co-gasification of coal and waste can be performed through parallel or direct gasification. Direct gasification constitutes a viable choice for installations with capacities of more than 350 MW(e). Parallel gasification, without extensive treatment of the produced gas, is recommended for gasifiers of small to medium size installed in regions where coal-fired power plants operate. The preliminary cost estimation indicated that the establishment of an IGCC RDF/lignite plant in the region of WMP is not profitable, due to the high specific capital investment, in spite of the lower fuel supply cost. The technology of co-gasification is not mature enough, and therefore high capital investment is required to set up a direct co-gasification plant. The estimated cost of electricity was not competitive with the prices dominating the Greek electricity market, and thus further economic evaluation is required. The project would be acceptable if modular construction of the unit were first adopted near operating power plants, based on parallel co-gasification, gradually incorporating the remaining process steps (gas purification, power generation) with the aim of eventually establishing a true direct co-gasification plant.
Igarashi, Jun; Shouno, Osamu; Fukai, Tomoki; Tsujino, Hiroshi
2011-11-01
Real-time simulation of a biologically realistic spiking neural network is necessary for evaluating its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs, which arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic neurons. In order to address these problems, mainly the latter, we chose general purpose computing on graphics processing units (GPGPU) for simulation of such a neural network, taking advantage of the powerful computational capability of a graphics processing unit (GPU). As a target for real-time simulation, we used a model of the basal ganglia that has been developed according to electrophysiological and anatomical knowledge. The model consists of heterogeneous populations of 370 spiking model neurons, including computationally heavy conductance-based models, connected by 11,002 synapses. Simulation of the model had not previously been performed in real time using a general computing server. By parallelizing the model on the NVIDIA GeForce GTX 280 GPU in data-parallel and task-parallel fashion, faster-than-real-time simulation was robustly realized with only one-third of the GPU's total computational resources. Furthermore, we used the GPU's full computational resources to perform faster-than-real-time simulation of three instances of the basal ganglia model; these instances consisted of 1100 neurons and 33,006 synapses and were synchronized at each calculation step. Finally, we developed software for simultaneous visualization of faster-than-real-time simulation output. These results suggest the potential power of GPGPU techniques in real-time simulation of realistic neural networks.
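The data-parallel part of such a simulation maps one thread to one neuron. A hedged CUDA sketch of a single integration step (the parameters, the leak/synaptic terms, and the explicit Euler update are illustrative placeholders, not the authors' conductance-based models):

    __global__ void step_neurons(float* v, const float* g_syn,
                                 const float* I_ext, int n, float dt) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        const float C = 1.0f, g_leak = 0.1f, E_leak = -65.0f, E_syn = 0.0f;
        float dv = (-g_leak * (v[i] - E_leak)    // leak current
                    - g_syn[i] * (v[i] - E_syn)  // synaptic conductance input
                    + I_ext[i]) / C;
        v[i] += dt * dv;                         // one Euler step of size dt
    }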
1991-03-01
factor which made TTL-design so powerful was the implicit knowledge that for any object in the TTL Databook, that object's implementation and ... functions as values. Thus, its reasoning power matches the descriptive power of the higher-order languages in the previous section. First, the definitions ... developing parallel algorithms to better utilize the power of the explicitly parallel programming language constructs. Currently, the methodologies
Formalization, equivalence and generalization of basic resonance electrical circuits
NASA Astrophysics Data System (ADS)
Penev, Dimitar; Arnaudov, Dimitar; Hinov, Nikolay
2017-12-01
This work presents basic resonance circuits used in resonant energy converters. The following resonant circuits are considered: series, series with a parallel loaded capacitor, parallel, and parallel with a series loaded inductance. For the circuits under consideration, expressions are derived for the natural oscillation frequencies and for the equivalence of the active power delivered to the load. The mathematical expressions are plotted and verified using computer simulations. The results obtained are used in the model-based design of resonant energy converters with DC or AC output, which guarantees the output indicators of power electronic devices.
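For reference, the undamped natural frequency common to these topologies is the standard result (background, not a derivation from the paper):

    \omega_0 = \frac{1}{\sqrt{LC}}, \qquad f_0 = \frac{1}{2\pi\sqrt{LC}},

while placing the load changes the damping term; e.g., a resistance R in series with the tank shifts the damped natural frequency to

    \omega_d = \sqrt{\frac{1}{LC} - \left(\frac{R}{2L}\right)^{2}} .

It is exactly this load-placement dependence that makes the four circuits non-equivalent without the correction expressions the paper derives.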
Using Abstraction in Explicitly Parallel Programs.
1991-07-01
However, we only rely on sequential consistency of memory operations, including reads, writes and any synchronization primitives provided by the ... explicit synchronization primitives. This demonstrates the practical power of sequentially consistent memory, as opposed to weaker models of memory that ... a small set of synchronization primitives, all procedures have non-waiting specifications. This is in contrast to richer process-oriented
Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan
2007-01-01
Modern video cards and game consoles typically have much better performance-to-price ratios than general-purpose CPUs. The parallel processing capabilities of game hardware are well suited for high-throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images.
Du, Xiaogang; Dang, Jianwu; Wang, Yangping; Wang, Song; Lei, Tao
2016-01-01
The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to its flexibility and robustness. However, it requires a tremendous amount of computing time to obtain accurate registration results, especially for large amounts of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-splines is proposed in this paper. First, the Logarithm Squared Difference (LSD) is used as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of the three time-consuming steps (B-spline interpolation, LSD computation, and the analytic gradient computation of LSD) is efficiently reduced, since the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results on registration quality and execution efficiency for a large number of medical images show that our algorithm achieves better registration accuracy, in terms of the differences between the best deformation fields and ground truth, and a speedup of 17 times over the single-threaded CPU implementation, due to the powerful parallel computing ability of the Graphics Processing Unit (GPU).
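The per-voxel similarity evaluation is the embarrassingly parallel core of such a registration. A hedged CUDA sketch of an LSD-style metric (the paper defines LSD precisely; this sketch assumes the squared difference of log intensities, with +1 guarding the logarithm, and a plain atomic reduction):

    __global__ void lsd_metric(const float* fixed, const float* moving,
                               float* sum, int n_voxels) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n_voxels) return;
        float d = logf(fixed[i] + 1.0f) - logf(moving[i] + 1.0f);
        atomicAdd(sum, d * d);   // a block-level reduction would cut contention
    }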
Waveguide device and method for making same
Forman, Michael A [San Francisco, CA
2007-08-14
A monolithic micromachined waveguide device or devices with low-loss, high-power handling, and near-optical frequency ranges is set forth. The waveguide and integrated devices are capable of transmitting near-optical frequencies due to optical-quality sidewall roughness. The device or devices are fabricated in parallel, may be mass produced using a LIGA manufacturing process, and may include a passive component such as a diplexer and/or an active capping layer capable of particularized signal processing of the waveforms propagated by the waveguide.
Lasers for industrial production processing: tailored tools with increasing flexibility
NASA Astrophysics Data System (ADS)
Rath, Wolfram
2012-03-01
High-power fiber lasers are the newest generation of diode-pumped solid-state lasers. Due to their all-fiber design they are compact, efficient and robust. Rofin's fiber lasers are available with the highest beam qualities, but the use of different process fiber core sizes additionally enables the user to adapt the beam quality, focus size and Rayleigh length to the requirements of the task for best processing results. Multi-mode fibers from 50 μm to 600 μm, with corresponding beam qualities of 2.5 mm·mrad to 25 mm·mrad, are typically used. The integrated beam switching modules can make the laser power available to 4 different manufacturing systems or can share the power between two processing heads for parallel processing. CO2 slab lasers also combine high power with either "single-mode" beam quality or higher order modes. The well-established technique is in use for a large number of industrial applications, processing either metals or non-metallic materials. For many of these applications CO2 lasers remain the best choice of laser source, driven either by the specific requirements of the application or by its cost structure. The actual technical properties of these lasers will be presented, including an overview of the wavelength-driven differences in application results and examples of current industrial practice such as cutting, welding, and surface processing, including the flexible use of scanners and classical optics processing heads.
Optimization of the oxidant supply system for combined cycle MHD power plants
NASA Technical Reports Server (NTRS)
Juhasz, A. J.
1982-01-01
An in-depth study was conducted to determine what improvements, if any, could be made to the oxidant supply system for combined cycle MHD power plants that would be reflected in higher thermal efficiency and a reduction in the cost of electricity (COE). A systematic analysis of air separation process variations showed that the specific energy consumption is minimized when the product stream oxygen concentration is about 70 mole percent. The use of advanced air compressors, having variable speed and guide vane position control, results in additional power savings. The study also led to the conceptual design of a new air separation process, referred to as internal compression, sized for a 500 MW(e) MHD plant. In addition to its lower overall energy consumption, potential capital cost savings were identified for air separation plants using this process when constructed as a single large air separation train rather than the multiple parallel trains typical of conventional practice.
NEET-AMM Final Technical Report on Laser Direct Manufacturing (LDM) for Nuclear Power Components
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, Scott; Baca, Georgina; O'Connor, Michael
2015-12-31
This final technical report summarizes the program progress and technical accomplishments of the Laser Direct Manufacturing (LDM) for Nuclear Power Components project. A series of experiments varying build process parameters (scan speed and laser power) were conducted at the outset to establish the optimal build conditions for each of the alloys. Fabrication was completed in collaboration with the Quad City Manufacturing Laboratory (QCML). The density of all sample specimens was measured and compared to literature values, and the build process conditions giving fabricated part densities closest to literature values were chosen for making mechanical test coupons. Test coupons whose principal axis lies in the x-y plane (perpendicular to the build direction) and along the z axis (parallel to the build direction) were built and tested as part of the experimental build matrix, to understand the impact of the anisotropic nature of the process. Investigations are described for 316L SS, Inconel 600, 718 and 800, and oxide-dispersion-strengthened 316L SS (yttria) alloys.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General-purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computation-intensive alignment component of BWA to the GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core, while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPUs to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
Zhao, Nannan; Angelidaki, Irini; Zhang, Yifeng
2017-02-01
Stack connection (i.e., in series or parallel) of microbial fuel cells (MFCs) is an efficient way to boost the power output for practical applications. However, there is little information available on short-term changes in stack connection and their effect on electricity generation and the microbial community. In this study, a self-stacked submersible microbial fuel cell (SSMFC) powered by glycerol was tested to elucidate this important issue. In series connection, the maximum voltage output reached 1.15 V, while the maximum current density was 5.73 mA in parallel. In both connections, the maximum power density increased with the initial glycerol concentration. However, glycerol degradation was even faster in parallel connection. When the SSMFC was shifted from series to parallel connection, the reactor reached a stable power output without any lag phase, and the anodic microbial community composition remained nearly stable. Comparatively, after changing from parallel to series connection, there was a lag period before the system stabilized again, and the microbial community composition became greatly different. This study is the first attempt to elucidate the influence of short-term changes in connection on the performance of an MFC stack, and could provide insight into the practical utilization of MFCs.
Miao Meng; Kiani, Mehdi
2016-08-01
In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure
NASA Astrophysics Data System (ADS)
Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen
2017-02-01
Understanding material irradiation effects is one of the most important keys to the use of nuclear power. However, the lack of high-throughput irradiation facilities and of knowledge about the evolution process leads to a limited understanding of the issues involved. With the help of high-performance computing, we can gain a deeper understanding of materials at the micro level. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials in an irradiation environment. Based on the proposed data structure, we developed a new molecular dynamics software package named Crystal MD. Simulations with Crystal MD achieved over 90% parallel efficiency in test cases, and on multi-core clusters it uses more than 25% less memory than LAMMPS and IMD, two popular molecular dynamics packages. Using Crystal MD, a two-trillion-particle simulation has been performed on the Tianhe-2 cluster.
Electrical performance characteristics of high power converters for space power applications
NASA Technical Reports Server (NTRS)
Stuart, Thomas A.; King, Roger J.
1989-01-01
The first goal of this project was to investigate various converters suitable for processing electric power derived from a nuclear reactor. A 20 kHz system is indicated that includes a source converter, a ballast converter, and a fixed-frequency converter for generating the 20 kHz output. This system can be converted to dc simply by removing the fixed-frequency converter. The present study emphasized the design and testing of the source and ballast converters. A push-pull current-fed (PPCF) design was selected for the source converter, and a 2.7 kW version was implemented using three 900 W modules in parallel. The characteristic equation for two converters in parallel was derived, but this analysis did not yield any experimental methods for measuring relative stability. The three source modules were first tested individually and then in parallel as a 2.7 kW system. All tests proved satisfactory: the system was stable; efficiency and regulation were acceptable; and the system was fault tolerant. The design of a ballast-load converter, operated as a shunt regulator, was investigated. The proposed power circuit is suitable for use with BJTs because proportional base drive is easily implemented. A control circuit that minimizes switching-frequency ripple and automatically bypasses a faulty shunt section was developed. A nonlinear state-space-averaged model of the shunt regulator was developed and shown to produce an accurate incremental (small-signal) dynamic model, even though the usual state-space-averaging assumptions were not met. The nonlinear model was also shown to be useful for large-signal dynamic simulation using PSpice.
Autoplan: A self-processing network model for an extended blocks world planning environment
NASA Technical Reports Server (NTRS)
Dautrechy, C. Lynne; Reggia, James A.; Mcfadden, Frank
1990-01-01
Self-processing network models (neural/connectionist models, marker passing/message passing networks, etc.) are currently undergoing intense investigation for a variety of information processing applications. These models are potentially very powerful in that they support a large amount of explicit parallel processing and cleanly integrate high-level and low-level information processing. However, they are currently limited by a lack of understanding of how to apply them effectively in many application areas. The formulation of self-processing network methods for dynamic, reactive planning is studied. The long-term goal is to formulate robust, computationally effective information processing methods for the distributed control of semiautonomous exploration systems, e.g., the Mars Rover. The current research effort focuses on hierarchical plan generation, execution, and revision through local operations in an extended blocks world environment. This scenario involves many challenging features that would be encountered in a real planning and control environment: multiple simultaneous goals, parallel as well as sequential action execution, action sequencing determined not only by goals and their interactions but also by limited resources (e.g., three tasks, two acting agents), the need to interpret unanticipated events and react appropriately through replanning, etc.
Abrahamyan, Lusine; Li, Chuan Silvia; Beyene, Joseph; Willan, Andrew R; Feldman, Brian M
2011-03-01
The study evaluated the power of the randomized placebo-phase design (RPPD), a new design for randomized clinical trials (RCTs), compared with the traditional parallel groups design, assuming various response-time distributions. In the RPPD, all subjects eventually receive the experimental therapy, and exposure to placebo lasts only a short, fixed period of time. For the study, an object-oriented simulation program was written in R. The power of the simulated trials was evaluated using six scenarios in which the treatment response times followed the exponential, Weibull, or lognormal distribution. The median response time was assumed to be 355 days for placebo and 42 days for the experimental drug. Based on the simulation results, the sample size required to achieve the same level of power differed across response-time distributions; the scenario in which the response times followed the exponential distribution had the highest sample size requirement. In most scenarios, the parallel groups RCT had higher power than the RPPD. The sample size requirement varies depending on the underlying hazard distribution, and the RPPD requires more subjects to achieve power similar to that of the parallel groups design. Copyright © 2011 Elsevier Inc. All rights reserved.
Induction heating using induction coils in series-parallel circuits
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matsen, Marc Rollo; Geren, William Preston; Miller, Robert James
A part is inductively heated by multiple self-regulating induction coil circuits having susceptors, coupled together in parallel and in series with an AC power supply. Each circuit includes a tuning capacitor that tunes the circuit to resonate at the frequency of the AC power supply.
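The tuning relation behind the patent's claim is the standard LC resonance condition; a minimal sketch, with assumed component values, of choosing the capacitor that tunes a coil to the supply frequency:

```python
# Minimal sketch of the tuning relation: a capacitor C is chosen so
# that an induction coil of inductance L resonates at the AC supply
# frequency f, i.e. f = 1/(2*pi*sqrt(L*C)). Values are illustrative.
import math

def tuning_capacitance(L_henry, f_hertz):
    """C that makes an LC circuit resonate at frequency f."""
    return 1.0 / ((2.0 * math.pi * f_hertz) ** 2 * L_henry)

L = 10e-6   # 10 uH coil (assumed)
f = 100e3   # 100 kHz supply (assumed)
C = tuning_capacitance(L, f)
print(f"C = {C * 1e9:.1f} nF")  # ~253.3 nF
```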
Aprà, E; Kowalski, K
2016-03-08
In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
2012-01-01
Background Structured association mapping is proving to be a powerful strategy for finding genetic polymorphisms associated with disease. However, these algorithms are often distributed as command-line implementations that require expertise and effort to customize and put into practice. Because of the difficulty of using these cutting-edge techniques, geneticists often revert to simpler, less powerful methods. Results To make structured association mapping more accessible to geneticists, we have developed an automatic processing system called Auto-SAM. Auto-SAM enables geneticists to run structured association mapping algorithms automatically, using parallelization. Auto-SAM includes algorithms to discover gene networks and find population structure, and it can also run popular association mapping algorithms in addition to five structured association mapping algorithms. Conclusions Auto-SAM is available through GenAMap, a front-end desktop visualization tool. GenAMap and Auto-SAM are implemented in Java; binaries for GenAMap can be downloaded from http://sailing.cs.cmu.edu/genamap. PMID:22471660
NASA Astrophysics Data System (ADS)
Zhang, Yingzi; Hou, Yulong; Zhang, Yanjun; Hu, Yanjun; Zhang, Liang; Gao, Xiaolong; Zhang, Huixin; Liu, Wenyi
2018-02-01
A simple and low-cost continuous liquid-level sensor based on two parallel plastic optical fibers (POFs) in a helical structure is presented. The change in the liquid level is determined by measuring the side-coupling power in the passive fiber. The side-coupling ratio is increased by just filling the gap between the two POFs with ultraviolet-curable optical cement, making the proposed sensor competitive. The experimental results show that the side-coupling power declines as the liquid level rises. The sensitivity and the measurement range are flexible and affected by the geometric parameters of the helical structure. A higher sensitivity of 0.0208 μW/mm is acquired for a smaller curvature radius of 5 mm, and the measurement range can be expanded to 120 mm by enlarging the screw pitch to 40 mm. In addition, the reversibility and temperature dependence are studied. The proposed sensor is a cost-effective solution offering the advantages of a simple fabrication process, good reversibility, and compensable temperature dependence.
Acceleration of Particles Near Earth's Bow Shock
NASA Astrophysics Data System (ADS)
Sandroos, A.
2012-12-01
Collisionless shock waves, for example, near planetary bodies or driven by coronal mass ejections, are a key source of energetic particles in the heliosphere. When the solar wind hits Earth's bow shock, some of the incident particles get reflected back towards the Sun and are accelerated in the process. Reflected ions are responsible for the creation of a turbulent foreshock in quasi-parallel regions of Earth's bow shock. We present first results of foreshock macroscopic structure and of particle distributions upstream of Earth's bow shock, obtained with a new 2.5-dimensional self-consistent diffusive shock acceleration model. In the model particles' pitch angle scattering rates are calculated from Alfvén wave power spectra using quasilinear theory. Wave power spectra in turn are modified by particles' energy changes due to the scatterings. The new model has been implemented on massively parallel simulation platform Corsair. We have used an earlier version of the model to study ion acceleration in a shock-shock interaction event (Hietala, Sandroos, and Vainio, 2012).
Menzies, Kevin
2014-08-13
The growth in simulation capability over the past 20 years has led to remarkable changes in the design process for gas turbines. The availability of relatively cheap computational power coupled to improvements in numerical methods and physical modelling in simulation codes have enabled the development of aircraft propulsion systems that are more powerful and yet more efficient than ever before. However, the design challenges are correspondingly greater, especially to reduce environmental impact. The simulation requirements to achieve a reduced environmental impact are described along with the implications of continued growth in available computational power. It is concluded that achieving the environmental goals will demand large-scale multi-disciplinary simulations requiring significantly increased computational power, to enable optimization of the airframe and propulsion system over the entire operational envelope. However even with massive parallelization, the limits imposed by communications latency will constrain the time required to achieve a solution, and therefore the position of such large-scale calculations in the industrial design process. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.
1999-04-01
A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can resolve this performance dilemma, although the main obstacles in designing such systems are the difficulties of guaranteeing efficient communication among the nodes. Data exchange becomes a real bottleneck when all nodes are connected by a shared resource. Only optics, with its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free-space image distributor to be used in optical interconnects for massively parallel processing. In this paper, physical and optical models of the image distributor are developed, from the diffraction theory of light waves to optical simulations. The general features and performance of the image distributor are also described, along with a new structure for the image distributor and the corresponding simulations. From the digital simulations and experiments, it is found that the multi-faceted free-space image distribution technique is well suited to free-space optical interconnection in massively parallel processing and that the new structure of the multi-faceted free-space image distributor would perform better.
Password Cracking Using Sony Playstations
NASA Astrophysics Data System (ADS)
Kleinhans, Hugo; Butts, Jonathan; Shenoi, Sujeet
Law enforcement agencies frequently encounter encrypted digital evidence for which the cryptographic keys are unknown or unavailable. Password cracking - whether it employs brute force or sophisticated cryptanalytic techniques - requires massive computational resources. This paper evaluates the benefits of using the Sony PlayStation 3 (PS3) to crack passwords. The PS3 offers massive computational power at relatively low cost. Moreover, multiple PS3 systems can be introduced easily to expand parallel processing when additional power is needed. This paper also describes a distributed framework designed to enable law enforcement agents to crack encrypted archives and applications in an efficient and cost-effective manner.
Noise Power Spectrum in PROPELLER MR Imaging.
Ichinoseki, Yuki; Nagasaka, Tatsuo; Miyamoto, Kota; Tamura, Hajime; Mori, Issei; Machida, Yoshio
2015-01-01
The noise power spectrum (NPS), an index for noise evaluation, represents the frequency characteristics of image noise. We measured the NPS in PROPELLER (Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction) magnetic resonance (MR) imaging, a nonuniform data sampling technique, as an initial study for practical MR image evaluation using the NPS. The 2-dimensional (2D) NPS reflected the k-space sampling density and showed agreement with the shape of the k-space trajectory as expected theoretically. Additionally, the 2D NPS allowed visualization of a part of the image reconstruction process, such as filtering and motion correction.
NASA Technical Reports Server (NTRS)
Juhasz, A. J.; Bloomfield, H. S.
1985-01-01
A combinatorial reliability approach is used to identify potential dynamic power conversion systems for space mission applications. A reliability and mass analysis is also performed, specifically for a 100 kWe nuclear Brayton power conversion system with parallel redundancy. Although this study is done for a reactor outlet temperature of 1100 K, preliminary system mass estimates are also included for reactor outlet temperatures ranging up to 1500 K.
WaveJava: Wavelet-based network computing
NASA Astrophysics Data System (ADS)
Ma, Kun; Jiao, Licheng; Shi, Zhuoer
1997-04-01
Wavelet theory is powerful, but its successful application still requires suitable programming tools. Java is a simple, object-oriented, distributed, interpreted, robust, secure, architecture-neutral, portable, high-performance, multi-threaded, dynamic language. This paper addresses the design and development of a cross-platform software environment for experimenting with and applying wavelet theory. WaveJava, a wavelet class library designed with object-oriented programming, is developed to take advantage of wavelet features such as multi-resolution analysis and parallel processing in network computing. A new application architecture is designed for the net-wide distributed client-server environment. The data are transmitted as multi-resolution packets, and at the distributed sites around the net these data packets undergo matching or recognition processing in parallel. The results are fed back to determine the next operation, so more robust results can be reached quickly. WaveJava is easy to use and to extend for special applications. This paper gives a solution for a distributed fingerprint information processing system; the approach also fits other net-based multimedia information processing, such as network libraries, remote teaching, and filmless picture archiving and communications.
Analysis of series resonant converter with series-parallel connection
NASA Astrophysics Data System (ADS)
Lin, Bor-Ren; Huang, Chien-Lan
2011-02-01
In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter series-connected on the primary side and parallel-connected on the secondary side is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero voltage switching and the rectifier diodes are turned off at zero current switching. Thus, the switching losses on the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series. Thus, the two converters have the same primary currents to ensure that they can supply the balance load current. On the output side, two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for server power supply were performed to verify the effectiveness of the proposed converter.
Superconducting coil system and methods of assembling the same
Rajput-Ghoshal, Renuka; Rochford, James H.; Ghoshal, Probir K.
2016-01-19
A superconducting magnet apparatus is provided. The superconducting magnet apparatus includes a power source configured to generate a current; a first switch coupled in parallel to the power source; a second switch coupled in series to the power source; a coil coupled in parallel to the first switch and the second switch; and a passive quench protection device coupled to the coil and configured to by-pass the current around the coil and to decouple the coil from the power source when the coil experiences a quench.
Graph Partitioning for Parallel Applications in Heterogeneous Grid Environments
NASA Technical Reports Server (NTRS)
Biswas, Rupak; Kumar, Shailendra; Das, Sajal K.; Biegel, Bryan (Technical Monitor)
2002-01-01
The problem of partitioning irregular graphs and meshes for parallel computations on homogeneous systems has been extensively studied. However, these partitioning schemes fail when the target system architecture exhibits heterogeneity in resource characteristics. With the emergence of technologies such as the Grid, it is imperative to study the partitioning problem taking into consideration the differing capabilities of such distributed heterogeneous systems. In our model, the heterogeneous system consists of processors with varying processing power and an underlying non-uniform communication network. We present in this paper a novel multilevel partitioning scheme for irregular graphs and meshes that takes into account issues pertinent to Grid computing environments. Our partitioning algorithm, called MiniMax, generates and maps partitions onto a heterogeneous system with the objective of minimizing the maximum execution time of the parallel distributed application. For the experimental performance study, we considered both a realistic mesh problem from NASA and synthetic workloads. Simulation results demonstrate that MiniMax generates high-quality partitions for various classes of applications targeted for parallel execution in a distributed heterogeneous environment.
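The abstract does not give MiniMax's internals; a minimal sketch of the general idea, under the assumption of a greedy list-scheduling heuristic (not the paper's actual multilevel algorithm), maps weighted tasks onto processors of unequal speed so that the maximum finish time stays small:

```python
# Hedged sketch of MiniMax-style mapping (not the paper's algorithm):
# greedily place the heaviest remaining task on the processor whose
# speed-weighted finish time stays smallest.
def greedy_map(task_weights, proc_speeds):
    """Assign tasks to heterogeneous processors to reduce max finish time."""
    finish = [0.0] * len(proc_speeds)
    assignment = {}
    for tid, w in sorted(enumerate(task_weights), key=lambda t: -t[1]):
        # Pick the processor with the smallest tentative finish time.
        p = min(range(len(proc_speeds)),
                key=lambda i: finish[i] + w / proc_speeds[i])
        finish[p] += w / proc_speeds[p]
        assignment[tid] = p
    return assignment, max(finish)

tasks = [8.0, 5.0, 4.0, 3.0, 3.0, 2.0]   # work per partition (assumed units)
speeds = [2.0, 1.0, 1.0]                 # heterogeneous processing powers
print(greedy_map(tasks, speeds))
```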
Mobile Ultrasound Plane Wave Beamforming on iPhone or iPad using Metal- based GPU Processing
NASA Astrophysics Data System (ADS)
Hewener, Holger J.; Tretbar, Steffen H.
Mobile and cost-effective ultrasound devices are being used in point-of-care scenarios and the trauma room. To reduce the costs of such devices, we have previously presented the possibility of using consumer devices like the Apple iPad for full signal processing of raw data for ultrasound image generation. Using technologies like plane wave imaging to generate a full image with only one excitation/reception event, the acquisition times and power consumption of ultrasound imaging can be reduced for low-power mobile devices based on consumer electronics, realizing the transition from FPGA- or ASIC-based beamforming to more flexible software beamforming. The massively parallel beamforming processing can be done with the Apple framework "Metal" for advanced graphics and general-purpose GPU processing on the iOS platform. We were able to integrate the beamforming reconstruction into our mobile ultrasound processing application with imaging rates of up to 70 Hz on iPad Air 2 hardware.
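As a rough sketch of the computation being offloaded to the GPU, the following NumPy delay-and-sum beamformer reconstructs an image from one zero-degree plane-wave shot. The array geometry, sampling rate, and placeholder channel data are assumptions; the real implementation would run the per-pixel loop as Metal compute kernels.

```python
# Minimal NumPy sketch of plane-wave delay-and-sum beamforming: one
# plane wave insonifies the medium, and each image pixel sums element
# samples at the round-trip delay. Geometry and constants are assumed.
import numpy as np

def das_beamform(rf, elem_x, fs, c, xs, zs):
    """rf: (n_elements, n_samples) channel data for one plane-wave shot."""
    n_el, n_samp = rf.shape
    image = np.zeros((zs.size, xs.size))
    for iz, z in enumerate(zs):
        for ix, x in enumerate(xs):
            # Transmit delay (0-degree plane wave) plus receive delays.
            tau = (z + np.sqrt(z**2 + (x - elem_x) ** 2)) / c
            idx = np.clip((tau * fs).astype(int), 0, n_samp - 1)
            image[iz, ix] = rf[np.arange(n_el), idx].sum()
    return image

fs, c = 40e6, 1540.0                      # sample rate (Hz), sound speed (m/s)
elem_x = np.linspace(-0.019, 0.019, 128)  # 128-element linear array (m)
rf = np.random.randn(128, 2048)           # placeholder channel data
img = das_beamform(rf, elem_x, fs, c,
                   xs=np.linspace(-0.01, 0.01, 64),
                   zs=np.linspace(0.005, 0.04, 64))
print(img.shape)
```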
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images
Wang, Yangping; Wang, Song
2016-01-01
The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role in medical image processing and is widely applied there because of its flexibility and robustness. However, it requires a tremendous amount of computing time to obtain accurate registration results, especially for large amounts of medical image data. To address this issue, a parallel nonrigid registration algorithm based on B-splines is proposed in this paper. First, the Logarithm Squared Difference (LSD) is used as the similarity metric in the B-spline registration algorithm to improve registration precision. We then create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of the three most time-consuming steps, B-spline interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced; the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results on registration quality and execution efficiency for a large number of medical images show that our algorithm achieves better registration accuracy, in terms of the differences between the best deformation fields and ground truth, and a speedup of 17 times over the single-threaded CPU implementation thanks to the powerful parallel computing ability of the Graphics Processing Unit (GPU). PMID:28053653
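The abstract does not define LSD precisely; one plausible reading, assumed here, is a voxel-wise sum of the logarithm of one plus the squared intensity difference, which damps the influence of large outliers relative to plain squared difference:

```python
# Assumed form of the LSD similarity metric (the paper's exact
# definition is not given in the abstract): sum of log(1 + (If - Im)^2).
import numpy as np

def lsd(fixed, moving):
    """Lower values indicate better alignment under this assumed metric."""
    d = fixed.astype(float) - moving.astype(float)
    return np.log1p(d * d).sum()

a = np.random.rand(64, 64)
b = a + 0.05 * np.random.randn(64, 64)
print(lsd(a, b), lsd(a, np.random.rand(64, 64)))  # aligned pair scores lower
```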
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erdmann, Thorsten; Albert, Philipp J.; Schwarz, Ulrich S.
2013-11-07
Non-processive molecular motors have to work together in ensembles in order to generate appreciable levels of force or movement. In skeletal muscle, for example, hundreds of myosin II molecules cooperate in thick filaments. In non-muscle cells, by contrast, small groups with a few tens of non-muscle myosin II motors contribute to essential cellular processes such as transport, shape changes, or mechanosensing. Here we introduce a detailed and analytically tractable model for this important situation. Using a three-state crossbridge model for the myosin II motor cycle and exploiting the assumptions of fast power stroke kinetics and equal load sharing between motors in equivalent states, we reduce the stochastic reaction network to a one-step master equation for the binding and unbinding dynamics (parallel cluster model) and derive the rules for ensemble movement. We find that for constant external load, ensemble dynamics is strongly shaped by the catch bond character of myosin II, which leads to an increase of the fraction of bound motors under load and thus to firm attachment even for small ensembles. This adaptation to load results in a concave force-velocity relation described by a Hill relation. For external load provided by a linear spring, myosin II ensembles dynamically adjust themselves towards an isometric state with constant average position and load. The dynamics of the ensembles is now determined mainly by the distribution of motors over the different kinds of bound states. For increasing stiffness of the external spring, there is a sharp transition beyond which myosin II can no longer perform the power stroke. Slow unbinding from the pre-power-stroke state protects the ensembles against detachment.
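A hedged sketch of the kind of one-step master equation described here: the state is the number of bound motors, binding occurs at a rate proportional to the unbound motors, and unbinding is modeled with a Bell-type catch bond whose rate falls as the shared load per motor rises. The rate forms and constants are illustrative assumptions, not the paper's parameters.

```python
# Illustrative one-step master equation for N motors sharing a load F:
# state n = number of bound motors; binding rate (N-n)*k_on; unbinding
# n*k_off*exp(-(F/n)/F0), a Bell-type catch bond (rate drops with load
# per motor). All rate forms and constants are assumptions.
import numpy as np

N, k_on, k_off, F, F0 = 10, 40.0, 80.0, 30.0, 12.0

def rates(n):
    bind = (N - n) * k_on
    unbind = n * k_off * np.exp(-(F / n) / F0) if n > 0 else 0.0
    return bind, unbind

# Build the tridiagonal generator and solve for the stationary p(n).
M = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    g, r = rates(n)
    M[n, n] -= g + r
    if n < N:
        M[n + 1, n] += g
    if n > 0:
        M[n - 1, n] += r
A = np.vstack([M, np.ones(N + 1)])           # append normalization row
b = np.append(np.zeros(N + 1), 1.0)
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print("mean bound motors:", (np.arange(N + 1) * p).sum())
```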
Efficient parallel implicit methods for rotary-wing aerodynamics calculations
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.
Euler/Navier-Stokes computational fluid dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by the lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel, and new techniques must be adopted to realize this potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations that are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower-Upper Symmetric Gauss-Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier-Stokes (TURNS) code. The new hybrid LU-SGS scheme couples the point-relaxation approach of the Data-Parallel Lower-Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the symmetric Gauss-Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using the Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multiprocessors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency; it is also more robust than DP-LUR for third-order upwind solutions. The second part of this work examines the use of Krylov subspace iterative solvers for the nonlinear CFD solutions, with the hybrid LU-SGS scheme used as a parallelizable preconditioner. Two iterative methods are tested: Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good parallel performance on the IBM SP2, with OSGCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as with the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time accuracy, which allows the use of larger timesteps and results in CPU savings of 20-35%.
3-D Electromagnetic field analysis of wireless power transfer system using K computer
NASA Astrophysics Data System (ADS)
Kawase, Yoshihiro; Yamaguchi, Tadashi; Murashita, Masaya; Tsukada, Shota; Ota, Tomohiro; Yamamoto, Takeshi
2018-05-01
We analyze the electromagnetic field of a wireless power transfer system using the 3-D parallel finite element method on the K computer, a supercomputer in Japan. It is shown that the electromagnetic field of the wireless power transfer system can be analyzed in a practical time using parallel computation on the K computer; moreover, the accuracy of the loss calculation improves as the mesh division of the shield is refined.
GPU real-time processing in NA62 trigger system
NASA Astrophysics Data System (ADS)
Ammendola, R.; Biagioni, A.; Chiozzi, S.; Cretaro, P.; Di Lorenzo, S.; Fantechi, R.; Fiorini, M.; Frezza, O.; Lamanna, G.; Lo Cicero, F.; Lonardo, A.; Martinelli, M.; Neri, I.; Paolucci, P. S.; Pastorelli, E.; Piandani, R.; Piccini, M.; Pontisso, L.; Rossetti, D.; Simula, F.; Sozzi, M.; Vicini, P.
2017-01-01
A commercial Graphics Processing Unit (GPU) is used to build a fast Level 0 (L0) trigger system tested parasitically with the TDAQ (Trigger and Data Acquisition) system of the NA62 experiment at CERN. In particular, the parallel computing power of the GPU is exploited to perform real-time fitting in the Ring Imaging CHerenkov (RICH) detector. Direct GPU communication using an FPGA-based board has been used to reduce the data transmission latency. The performance of the system for multi-ring reconstruction obtained during the NA62 physics run will be presented.
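As an illustration of the per-event arithmetic that maps well to GPU threads, the sketch below performs an algebraic least-squares circle fit (the Kasa method) to hit coordinates. This is a generic ring fit, not NA62's actual multi-ring algorithm.

```python
# Generic algebraic circle fit (Kasa method): solve the linear least
# squares problem x^2 + y^2 + a*x + b*y + c = 0 for (a, b, c), then
# recover the ring center and radius. Hit data below are synthetic.
import numpy as np

def kasa_circle_fit(x, y):
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = -(x**2 + y**2)
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = -a / 2.0, -b / 2.0
    r = np.sqrt(cx**2 + cy**2 - c)
    return cx, cy, r

theta = np.linspace(0, 2 * np.pi, 40)
x = 3.0 + 5.0 * np.cos(theta) + 0.05 * np.random.randn(40)
y = -1.0 + 5.0 * np.sin(theta) + 0.05 * np.random.randn(40)
print(kasa_circle_fit(x, y))   # approximately (3, -1, 5)
```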
An Investigation into Spike-Based Neuromorphic Approaches for Artificial Olfactory Systems
Osseiran, Adam
2017-01-01
The implementation of neuromorphic methods has delivered promising results for vision and auditory sensors. These methods focus on mimicking the neuro-biological architecture to generate and process spike-based information with minimal power consumption. With increasing interest in developing low-power and robust chemical sensors, the application of neuromorphic engineering concepts for electronic noses has provided an impetus for research focusing on improving these instruments. While conventional e-noses apply computationally expensive and power-consuming data-processing strategies, neuromorphic olfactory sensors implement the biological olfaction principles found in humans and insects to simplify the handling of multivariate sensory data by generating and processing spike-based information. Over the last decade, research on neuromorphic olfaction has established the capability of these sensors to tackle problems that plague the current e-nose implementations such as drift, response time, portability, power consumption and size. This article brings together the key contributions in neuromorphic olfaction and identifies future research directions to develop near-real-time olfactory sensors that can be implemented for a range of applications such as biosecurity and environmental monitoring. Furthermore, we aim to expose the computational parallels between neuromorphic olfaction and gustation for future research focusing on the correlation of these senses. PMID:29125586
Pérez, Daniel; Gasulla, Ivana; Capmany, José; Fandiño, Javier S; Muñoz, Pascual; Alavi, Hossein
2016-09-05
We develop, analyze and apply a linearization technique based on a dual parallel Mach-Zehnder modulator to self-beating microwave photonics systems. The approach enables broadband low-distortion transmission and reception at the expense of a moderate electrical power penalty, yielding a small optical power penalty (<1 dB).
Repetitive resonant railgun power supply
Honig, E.M.; Nunnally, W.C.
1985-06-19
A repetitive resonant railgun power supply provides energy for repetitively propelling projectiles from a pair of parallel rails. The supply comprises an energy storage capacitor, a storage inductor to form a resonant circuit with the energy storage capacitor and a magnetic switch to transfer energy between the resonant circuit and the pair of parallel rails for the propelling of projectiles.
Repetitive resonant railgun power supply
Honig, Emanuel M.; Nunnally, William C.
1988-01-01
A repetitive resonant railgun power supply provides energy for repetitively propelling projectiles from a pair of parallel rails. The supply comprises an energy storage capacitor, a storage inductor to form a resonant circuit with the energy storage capacitor and a magnetic switch to transfer energy between the resonant circuit and the pair of parallel rails for the propelling of projectiles.
Laser Safety Method For Duplex Open Loop Parallel Optical Link
Baumgartner, Steven John; Hedin, Daniel Scott; Paschal, Matthew James
2003-12-02
A method and apparatus are provided to ensure that laser optical power does not exceed a "safe" level in an open loop parallel optical link in the event that a fiber optic ribbon cable is broken or otherwise severed. A duplex parallel optical link includes a transmitter and receiver pair and a fiber optic ribbon with a designated number of channels that cannot be split. The duplex transceiver includes a corresponding transmitter and receiver that are physically attached to each other and cannot be detached, ensuring safe laser optical power in the event that the fiber optic ribbon cable is broken or severed. Safe optical power is ensured by redundant current and voltage safety checks.
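A toy sketch of the redundant-check idea in the last sentence, with hypothetical thresholds: the laser stays enabled only while two independent monitors, current and voltage, both agree the link is within safe limits.

```python
# Toy model of redundant current and voltage safety checks; all
# thresholds and names are hypothetical, not the patent's values.
def laser_enable(i_monitor_mA, v_monitor_V,
                 i_max_mA=12.0, v_min_V=1.1, v_max_V=1.9):
    """Return True only if both independent safety checks pass."""
    current_ok = i_monitor_mA <= i_max_mA
    voltage_ok = v_min_V <= v_monitor_V <= v_max_V
    return current_ok and voltage_ok

print(laser_enable(10.5, 1.5))   # True: both checks pass
print(laser_enable(14.0, 1.5))   # False: overcurrent -> shut down
```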
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Kohlmeyer, Axel; Plimpton, Steven J
The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more prevalent due to these advantages. In this paper, we present a continuation of previous work implementing algorithms for using accelerators into the LAMMPS molecular dynamics software for distributed memory parallel hybrid machines. In our previous work, we focused on acceleration for short-range models with an approach intended to harness the processing power of both the accelerator and (multi-core) CPUs. To augment the existing implementations, we present an efficient implementation of long-range electrostatic force calculation for molecular dynamics. Specifically, we present an implementation of the particle-particle particle-mesh method based on the work by Harvey and De Fabritiis. We present benchmark results on the Keeneland InfiniBand GPU cluster. We provide a performance comparison of the same kernels compiled with both CUDA and OpenCL. We discuss limitations to parallel efficiency and future directions for improving performance on hybrid or heterogeneous computers.
Heat loads on poloidal and toroidal edges of castellated plasma-facing components in COMPASS
NASA Astrophysics Data System (ADS)
Dejarnac, R.; Corre, Y.; Vondracek, P.; Gaspar, J.; Gauthier, E.; Gunn, J. P.; Komm, M.; Gardarein, J.-L.; Horacek, J.; Hron, M.; Matejicek, J.; Pitts, R. A.; Panek, R.
2018-06-01
Dedicated experiments have been performed in the COMPASS tokamak to thoroughly study the power deposition processes occurring on poloidal and toroidal edges of castellated plasma-facing components in tokamaks during steady-state L-mode conditions. Surface temperatures measured by a high resolution infra-red camera are compared with reconstructed synthetic data from a 2D thermal model using heat flux profiles derived from both the optical approximation and 2D particle-in-cell (PIC) simulations. In the case of poloidal leading edges, when the contribution from local radiation is taken into account, the parallel heat flux deduced from unperturbed, upstream measurements is fully consistent with the observed temperature increase at the leading edges of various heights, respecting power balance assuming simple projection of the parallel flux density. Smoothing of the heat flux deposition profile due to finite ion Larmor radius predicted by the PIC simulations is found to be weak and the power deposition on misaligned poloidal edges is better described by the optical approximation. This is consistent with an electron-dominated regime associated with a non-ambipolar parallel current flow. In the case of toroidal gap edges, the different contributions of the total incoming flux along the gap have been observed experimentally for the first time. They confirm the results of recent numerical studies performed for ITER showing that in specific cases the heat deposition does not necessarily follow the optical approximation. Indeed, ions can spiral onto the magnetically shadowed toroidal edge. Particle-in-cell simulations emphasize again the role played by local non-ambipolarity in the deposition pattern.
Lossless data compression for improving the performance of a GPU-based beamformer.
Lok, U-Wai; Fan, Gang-Wei; Li, Pai-Chi
2015-04-01
The powerful parallel computation ability of a graphics processing unit (GPU) makes it feasible to perform dynamic receive beamforming. However, a real-time GPU-based beamformer requires a high data rate to transfer radio-frequency (RF) data from hardware to software memory, as well as from central processing unit (CPU) to GPU memory. Data compression methods (e.g., Joint Photographic Experts Group (JPEG)) are available for the hardware front end to reduce data size, alleviating the data transfer requirement of the hardware interface. Nevertheless, the required decoding time may even exceed the transmission time of the original data, in turn degrading the overall performance of the GPU-based beamformer. This article proposes and implements a lossless compression-decompression algorithm that enables compression and decompression of data in parallel. By this means, the data transfer requirement of the hardware interface and the transmission time of CPU-to-GPU data transfers are reduced without sacrificing image quality. In simulations, the compression ratio reached around 1.7. The encoder design of our lossless compression approach requires few hardware resources and reasonable latency in a field-programmable gate array. In addition, the transmission time of transferring data from CPU to GPU with the parallel decoding process improved threefold compared with transferring the original uncompressed data. These results show that our proposed lossless compression plus parallel decoder approach not only mitigates the transmission bandwidth requirement for transferring data from the hardware front end to the software system but also reduces the transmission time for CPU-to-GPU data transfers. © The Author(s) 2014.
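The paper's codec is custom hardware; as a software analogue of chunk-parallel lossless coding, the sketch below splits a frame into independently compressed chunks so that compression and decompression can proceed in parallel. zlib is a stand-in for the paper's actual algorithm, and all sizes are illustrative.

```python
# Chunk-parallel lossless round trip: independent chunks allow the
# compress and decompress stages to run across worker processes,
# analogous to decoding on parallel GPU threads. zlib is a stand-in.
import zlib
from multiprocessing import Pool

def compress_chunks(data: bytes, n_chunks: int = 8):
    size = (len(data) + n_chunks - 1) // n_chunks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool() as pool:
        return pool.map(zlib.compress, chunks)

def decompress_chunks(blobs):
    with Pool() as pool:
        return b"".join(pool.map(zlib.decompress, blobs))

if __name__ == "__main__":
    rf = bytes(range(256)) * 4096            # placeholder RF frame
    blobs = compress_chunks(rf)
    assert decompress_chunks(blobs) == rf    # lossless round trip
    print(len(rf) / sum(map(len, blobs)))    # compression ratio
```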
Switchable Polymer Based Thin Film Coils as a Power Module for Wireless Neural Interfaces.
Kim, S; Zoschke, K; Klein, M; Black, D; Buschick, K; Toepper, M; Tathireddy, P; Harrison, R; Solzbacher, F
2007-05-01
Reliable chronic operation of implantable medical devices such as the Utah Electrode Array (UEA) for neural interfaces requires the elimination of transcutaneous wire connections for signal processing, powering, and communication. A wireless power source that allows integration with the UEA is therefore necessary. While (rechargeable) micro batteries and biological micro fuel cells are as yet far from meeting the power density and lifetime requirements of an implantable neural interface device, inductive coupling between two coils is a promising approach to powering such a device with highly restricted dimensions. The power-receiving coils presented in this paper were designed to maximize the inductance and quality factor of the coils and were microfabricated using polymer-based thin film technologies. A flexible configuration of stacked thin film coils allows parallel and serial switching, making it possible to tune the coils' resonance frequency. The electrical properties of the fabricated coils were characterized, and their power transmission performance was investigated under laboratory conditions.
Partitioning in parallel processing of production systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oflazer, K.
1987-01-01
This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpreter with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images; from this standpoint, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that are considered in this paper: parallel processing (1) by task, (2) by image region, and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows, as diverse as optical character recognition (OCR), document classification, and barcode reading, to parallel pipelines; this can substantially decrease time to completion for document tasks, with each parallel pipeline generally performing a different task. Parallel processing by image region allows a larger imaging task to be subdivided into a set of parallel pipelines, each performing the same task but on a different data set; this type of image analysis is readily addressed by a map-reduce approach, with examples including document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously, which may result in improved accuracy.
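A minimal sketch of category (2), parallelism by image region: the image is tiled, a placeholder per-tile task runs in a worker pool (the map step), and the partial results are combined (the reduce step). The "detector" here is a stand-in, not one of the paper's algorithms.

```python
# Map-reduce over image tiles: each tile is processed by a worker and
# the per-tile results are summed. count_bright is a placeholder task.
import numpy as np
from multiprocessing import Pool

def count_bright(tile):
    """Stand-in per-region task, e.g. a detection pass over one tile."""
    return int((tile > 0.9).sum())

def process_by_region(image, tiles_per_side=4):
    rows = np.array_split(image, tiles_per_side, axis=0)
    tiles = [t for r in rows for t in np.array_split(r, tiles_per_side, axis=1)]
    with Pool() as pool:
        partial = pool.map(count_bright, tiles)   # map step
    return sum(partial)                           # reduce step

if __name__ == "__main__":
    img = np.random.rand(1024, 1024)
    print(process_by_region(img))
```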
Parallel computation of multigroup reactivity coefficient using iterative method
NASA Astrophysics Data System (ADS)
Susmikanti, Mike; Dewayatna, Winter
2013-09-01
One research activity supporting the commercial radioisotope production program is safety research on the irradiation of FPM (Fission Product Molybdenum) targets. FPM targets are stainless steel tubes containing layers of high-enriched uranium and are irradiated to obtain fission products, which are widely used in kit form in nuclear medicine. Irradiating FPM tubes in the reactor core can interfere with reactor performance; one such disturbance comes from changes in flux or reactivity. A method is therefore needed to assess safety as the configuration changes over the life of the reactor, which makes faster code an absolute necessity. The neutron safety margin for the research reactor can be re-evaluated without modifying the reactivity calculation, which is an advantage of the perturbation method. The criticality and flux in a multigroup diffusion model were calculated at various irradiation positions and uranium contents. This model is computationally complex, and several parallel iterative algorithms have been developed for solving the resulting large sparse matrix systems. The black-red Gauss-Seidel iteration and a parallel power iteration method can be used to solve the multigroup diffusion equation system and to calculate the criticality and reactivity coefficient. In this research, a code for reactivity calculation was developed as part of the safety analysis, using parallel processing; the calculation can be done more quickly and efficiently by exploiting the parallelism of a multicore computer. The code was applied to the safety-limit calculation of irradiated FPM targets with increasing uranium content.
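A hedged sketch of the power iteration at the heart of such a criticality calculation, with the matrix-vector product split across row blocks as a stand-in for the paper's multicore parallelization; the matrix is a random placeholder rather than a multigroup diffusion operator.

```python
# Power iteration for the dominant eigenvalue, with the matvec split
# over row blocks across a thread pool (NumPy releases the GIL inside
# BLAS). The matrix is a placeholder, not a diffusion operator.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def par_matvec(blocks, x, pool):
    return np.concatenate(list(pool.map(lambda B: B @ x, blocks)))

def power_iteration(A, n_workers=4, tol=1e-10, max_iter=500):
    blocks = np.array_split(A, n_workers, axis=0)
    x = np.ones(A.shape[0])
    lam = 0.0
    with ThreadPoolExecutor(n_workers) as pool:
        for _ in range(max_iter):
            y = par_matvec(blocks, x, pool)
            lam_new = np.linalg.norm(y)
            x = y / lam_new
            if abs(lam_new - lam) < tol:
                break
            lam = lam_new
    return lam, x

A = np.random.rand(400, 400)
A = A + A.T                      # symmetric placeholder system
print(power_iteration(A)[0])
```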
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Bailey, David (Technical Monitor)
1997-01-01
While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gordon Bell Prize, given out at Supercomputing every year, suggests that while good performance can be attained for real applications on general-purpose multiprocessors, the large investment in manpower and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time delays of obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, and interactive parallelization tools. The success of these methodologies, in turn, depends on the proper application of data dependency analysis, program structure recognition and transformation, and performance prediction, as well as the exploitation of user-supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, transitions of the underlying hardware and system software have forced the scientists to spend a large effort migrating and recoding their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: MPI/PVM on networks of workstations to the IBM SP2 and CRAY T3D.
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX
NASA Astrophysics Data System (ADS)
Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; Wilcox, R. S.; Anderson, D. T.
2018-05-01
The radial electric field and the ion mean parallel flow are obtained in the Helically Symmetric eXperiment (HSX) stellarator from toroidal flow measurements of the C+6 ion at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration, in which the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows, while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
Load Balancing Strategies for Multi-Block Overset Grid Applications
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biswas, Rupak; Lopez-Benitez, Noe; Biegel, Bryan (Technical Monitor)
2002-01-01
The multi-block overset grid method is a powerful technique for high-fidelity computational fluid dynamics (CFD) simulations about complex aerospace configurations. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping structured grids that periodically update and exchange boundary information through interpolation. For efficient high-performance computation of large-scale realistic applications using this methodology, the individual grids must be properly partitioned among the parallel processors. Overall performance, therefore, largely depends on the quality of load balancing. In this paper, we present three different load balancing strategies for overset grids and analyze their effects on the parallel efficiency of a Navier-Stokes CFD application running on an SGI Origin2000 machine.
Co-Simulation for Advanced Process Design and Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stephen E. Zitney
2009-01-01
Meeting the increasing demand for clean, affordable, and secure energy is arguably the most important challenge facing the world today. Fossil fuels can play a central role in a portfolio of carbon-neutral energy options provided CO2 emissions can be dramatically reduced by capturing CO2 and storing it safely and effectively. The fossil energy industry faces the challenge of meeting aggressive design goals for next-generation power plants with carbon capture and storage (CCS). Process designs will involve large, highly integrated, and multipurpose systems with advanced equipment items having complex geometries and multiphysics. APECS is enabling software that facilitates effective integration, solution, and analysis of high-fidelity process/equipment (CFD) co-simulations. APECS helps to optimize fluid flow and related phenomena that impact overall power plant performance. APECS offers many advanced capabilities including reduced-order models (ROMs), design optimization, parallel execution, stochastic analysis, and virtual plant co-simulations. NETL and its collaborative R&D partners are using APECS to reduce the time, cost, and technical risk of developing high-efficiency, zero-emission power plants with CCS.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation
NASA Astrophysics Data System (ADS)
Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.
2011-03-01
A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult to treat with our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian-based domain decomposition method leads to poor parallel efficiency because of an increase in the number of power iterations [1]. In order to obtain high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching-grid case. The good behavior of the new parallelization scheme is demonstrated for the matching-grid case on several hundred nodes for computations based on a pin-by-pin discretization.
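A minimal sketch of the inverse power algorithm for a reactivity-style generalized eigenproblem A*phi = (1/k)*F*phi, in which each outer iteration solves the system against the previous fission source. The small operators below are placeholders, not a discretized simplified-transport operator.

```python
# Inverse power (outer) iteration for the generalized eigenproblem
# A*phi = (1/k)*F*phi; converges to the highest k. A and F are toy
# placeholders: a 1-D diffusion-like tridiagonal operator and a
# uniform fission source.
import numpy as np

def inverse_power(A, F, tol=1e-10, max_iter=200):
    phi = np.ones(A.shape[0])
    k = 1.0
    for _ in range(max_iter):
        phi_new = np.linalg.solve(A, F @ phi / k)   # one outer iteration
        k_new = k * (F @ phi_new).sum() / (F @ phi).sum()
        phi, k_old, k = phi_new / np.linalg.norm(phi_new), k, k_new
        if abs(k - k_old) < tol:
            break
    return k, phi

n = 50
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
F = 0.05 * np.eye(n)
print(inverse_power(A, F)[0])    # highest k of the placeholder problem
```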
Power-Aware Compiler Controllable Chip Multiprocessor
NASA Astrophysics Data System (ADS)
Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori
A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.
Advances in simulation of wave interactions with extended MHD phenomena
NASA Astrophysics Data System (ADS)
Batchelor, D.; Abla, G.; D'Azevedo, E.; Bateman, G.; Bernholdt, D. E.; Berry, L.; Bonoli, P.; Bramley, R.; Breslau, J.; Chance, M.; Chen, J.; Choi, M.; Elwasif, W.; Foley, S.; Fu, G.; Harvey, R.; Jaeger, E.; Jardin, S.; Jenkins, T.; Keyes, D.; Klasky, S.; Kruger, S.; Ku, L.; Lynch, V.; McCune, D.; Ramos, J.; Schissel, D.; Schnack, D.; Wright, J.
2009-07-01
The Integrated Plasma Simulator (IPS) provides a framework within which some of the most advanced, massively parallel fusion modeling codes can be interoperated to provide a detailed picture of the multi-physics processes involved in fusion experiments. The presentation will cover four topics: 1) recent improvements to the IPS, 2) application of the IPS for very high resolution simulations of ITER scenarios, 3) studies of resistive and ideal MHD stability in tokamak discharges using IPS facilities, and 4) the application of RF power in the electron cyclotron range of frequencies to control slowly growing MHD modes in tokamaks and initial evaluations of optimized locations for RF power deposition.
NASA Astrophysics Data System (ADS)
Song, Y.; Lysak, R. L.
2015-12-01
Parallel electric fields play a crucial role in the acceleration of charged particles, creating discrete aurorae. However, once parallel electric fields are produced, they disappear right away unless they can be continuously generated and sustained for a fairly long time. Thus, the crucial question in auroral physics is how such powerful and self-sustained parallel electric fields, which can effectively accelerate charged particles to high energy over a fairly long time, are generated. We propose that the nonlinear interaction of incident and reflected Alfven wave packets in the inhomogeneous auroral acceleration region can produce quasi-stationary, non-propagating electromagnetic plasma structures, such as Alfvenic double layers (DLs) and charge holes. Such Alfvenic quasi-static structures often constitute powerful high-energy particle accelerators. The Alfvenic DL consists of localized, self-sustained, powerful electrostatic electric fields nested in a low-density cavity and surrounded by enhanced magnetic and mechanical stresses. The enhanced magnetic and velocity fields carrying the free energy serve as a local dynamo, which continuously creates the electrostatic parallel electric field for a fairly long time. The generated parallel electric fields deepen the seed low-density cavity, which in turn further boosts the parallel electric fields, creating both Alfvenic and quasi-static discrete aurorae. The parallel electrostatic electric field can also cause ion outflow and perpendicular ion acceleration and heating, and may excite Auroral Kilometric Radiation.
A METHOD FOR IN-SITU CHARACTERIZATION OF RF HEATING IN PARALLEL TRANSMIT MRI
Alon, Leeor; Deniz, Cem Murat; Brown, Ryan; Sodickson, Daniel K.; Zhu, Yudong
2012-01-01
In ultra-high-field magnetic resonance imaging, parallel radio-frequency (RF) transmission presents both opportunities and challenges for specific absorption rate (SAR) management. On one hand, parallel transmission provides flexibility in tailoring electric fields in the body while facilitating magnetization profile control. On the other hand, it increases the complexity of energy deposition and can exacerbate local SAR through improper design or delivery of RF pulses. This study shows that the information needed to characterize RF heating in parallel transmission is contained within a local power correlation matrix. Building upon a calibration scheme involving a finite number of magnetic resonance thermometry measurements, the present work establishes a way of estimating the local power correlation matrix. Determination of this matrix allows prediction of the temperature change for an arbitrary parallel-transmit RF pulse. In a three-channel transmit MR experiment in a phantom, determination and validation of the power correlation matrix were conducted in less than 200 minutes with induced temperature changes of <4 degrees C. Further optimization and adaptation are possible, and simulations evaluating potential feasibility for in vivo use are presented. The method allows general characteristics indicative of RF coil/pulse safety to be determined in situ. PMID:22714806
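A hedged sketch of the central quantity: once a local power correlation matrix has been calibrated from thermometry, the heating surrogate for any pulse with complex channel weights b is the quadratic form b^H Λ b. The matrix and weight vectors below are illustrative placeholders, not calibrated values.

```python
# Quadratic-form heating prediction: for a calibrated local power
# correlation matrix Lam (Hermitian PSD), the temperature-change
# surrogate for channel weights b is Re(b^H @ Lam @ b). Lam and the
# weight vectors here are placeholders.
import numpy as np

def predict_heating(Lam, b):
    return float(np.real(np.conj(b) @ Lam @ b))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Lam = X @ X.conj().T                 # Hermitian PSD placeholder (3 channels)
b1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
b2 = np.array([1.0, np.exp(1j * 2 * np.pi / 3),
               np.exp(-1j * 2 * np.pi / 3)]) / np.sqrt(3)
print(predict_heating(Lam, b1), predict_heating(Lam, b2))
```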
OpenMP parallelization of a gridded SWAT (SWATG)
NASA Astrophysics Data System (ADS)
Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin
2017-12-01
Large-scale, long-term, high-spatial-resolution simulation is a common issue in environmental modeling. A gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates a grid modeling scheme with different spatial representations also faces this problem: long run times limit applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared-memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with one CPU and a 15-thread configuration. The results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale, high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
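The parallelization pattern here is a shared-memory loop over independent HRUs. The sketch below illustrates that pattern with Numba's OpenMP-like prange in place of SWATG's own code; the curve-number runoff formula is standard, but the function names, inputs, and grid size are illustrative assumptions, not SWATGP internals:

```python
# Illustrative shared-memory parallel loop over gridded HRUs (assumed model).
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def simulate_day(precip, soil_store, cn):
    runoff = np.empty_like(precip)
    for i in prange(precip.size):        # each gridded HRU is independent
        s = 25400.0 / cn[i] - 254.0      # SCS curve-number retention (mm)
        p = precip[i]
        runoff[i] = (p - 0.2 * s) ** 2 / (p + 0.8 * s) if p > 0.2 * s else 0.0
        soil_store[i] += p - runoff[i]   # crude water-balance update
    return runoff

n = 1_000_000                            # roughly one HRU per 500-m grid cell
runoff = simulate_day(np.random.rand(n) * 20, np.zeros(n), np.full(n, 75.0))
```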
Instrumentation, performance visualization, and debugging tools for multiprocessors
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.
1991-01-01
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor and visualize program execution, debugging and tuning parallel programs become intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. These comparisons show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Estimation of the energy loss at the blades in rowing: common assumptions revisited.
Hofmijster, Mathijs; De Koning, Jos; Van Soest, A J
2010-08-01
In rowing, power is inevitably lost as kinetic energy is imparted to the water during push-off with the blades. Power loss is estimated from reconstructed blade kinetics and kinematics. Traditionally, it is assumed that the oar is completely rigid and that force acts strictly perpendicular to the blade. The aim of the present study was to evaluate how reconstructed blade kinematics, kinetics, and average power loss are affected by these assumptions. A calibration experiment with instrumented oars and oarlocks was performed to establish relations between measured signals and oar deformation and blade force. Next, an on-water experiment was performed with a single female world-class rower rowing at constant racing pace in an instrumented scull. Blade kinematics, kinetics, and power loss under different assumptions (rigid versus deformable oars; absence or presence of a blade force component parallel to the oar) were reconstructed. Estimated power losses at the blades are 18% higher when parallel blade force is incorporated. Incorporating oar deformation affects reconstructed blade kinematics and instantaneous power loss, but has no effect on estimation of power losses at the blades. Assumptions on oar deformation and blade force direction have implications for the reconstructed blade kinetics and kinematics. Neglecting parallel blade forces leads to a substantial underestimation of power losses at the blades.
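As a toy numerical illustration of the direction-of-force effect quantified above (the study's 18% figure comes from measured sculling data; every number below is fabricated), the following sketch computes average blade power loss with and without a force component parallel to the oar:

```python
# Toy sketch: power loss P = F_blade . v_blade, averaged over one drive phase.
import numpy as np

t = np.linspace(0.0, 0.8, 200)                        # one drive phase (s), assumed
shape = np.sin(np.pi * t / 0.8)                       # common stroke profile
v_blade = np.stack([0.5 * shape, 0.2 * shape], axis=1)   # blade slip velocity (m/s)
F_perp = np.stack([400 * shape, np.zeros_like(t)], axis=1)  # force normal to blade (N)
F_par = np.stack([np.zeros_like(t), 80 * shape], axis=1)    # force parallel to oar (N)

P_perp_only = np.trapz(np.sum(F_perp * v_blade, axis=1), t) / t[-1]
P_full = np.trapz(np.sum((F_perp + F_par) * v_blade, axis=1), t) / t[-1]
print(f"underestimation from neglecting parallel force: "
      f"{100 * (1 - P_perp_only / P_full):.1f}%")
```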
NASA Astrophysics Data System (ADS)
Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.
2006-12-01
In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data increases every year. Three problems must be solved to achieve large-scale statistical analyses of such data. (i) More CPU power and larger memory and disk sizes are required; personal computers are not powerful enough to analyze such amounts of data, while super-computers provide high-performance CPUs and rich memory but are usually separated from the Internet or connected only for programming or data file transfer. (ii) Most observation data files are managed at distributed data sites over the Internet, so users have to know where the data files are located. (iii) Since no common data format is available in the STP field, users have to write a reading program for each dataset themselves. To overcome problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. Gfarm shares computational resources and performs parallel distributed processing. In addition, Gfarm provides the Gfarm filesystem, which acts as a virtual directory tree across nodes. The Gfarm environment is composed of three parts: a metadata server that manages information on the distributed files, filesystem nodes that provide computational resources, and a client that submits jobs to the metadata server and manages data processing schedules. In the present study, both data files and data processes are parallelized on a Gfarm with 6 filesystem nodes; each node has a 1 GHz Pentium CPU, 256 MB of memory, and a 40 GB disk. To evaluate the performance of the present Gfarm system, we scanned many data files, each about 300 MB in size, using three processing methods: sequential processing on one node, sequential processing by each node, and parallel processing by each node. Comparing the number of files against the elapsed time, parallel and distributed processing shortened the elapsed time to one-fifth that of sequential processing. On the other hand, sequential processing was faster in another experiment with files smaller than 100 KB, where the elapsed time to scan one file is within one second; this implies that disk swapping took place during parallel processing on each node. We note that operation became unstable when the number of files exceeded 1000. To overcome problem (iii), we developed an original data class. This class supports reading data files in various formats by converting them into a common format: it defines schemata for every type of data and encapsulates the structure of the data files. In addition, since this class provides a time re-sampling function, users can easily convert multiple data arrays with different time resolutions onto the same time base. Finally, using Gfarm, we achieved a high-performance environment for large-scale statistical data analyses. It should be noted that the present method is effective only when individual data files are large enough. At present, we are restructuring a new Gfarm environment with 8 nodes, each with a 2 GHz Athlon 64 X2 dual-core CPU, 2 GB of memory, and 1.2 TB of disk (using RAID 0). Our original class is to be implemented on the new Gfarm environment.
In the present talk, we show the latest results from applying the present system to data analyses with a huge number of satellite observation data files.
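The sequential-versus-parallel file-scan comparison above can be mimicked with ordinary worker processes. The sketch below is a minimal stand-in (Python multiprocessing instead of Gfarm job scheduling; scan_file, the file pattern, and the worker count are hypothetical):

```python
# Minimal sketch of comparing a sequential scan with a parallel scan of files.
import glob
import time
from multiprocessing import Pool

def scan_file(path):
    """Read one data file and return a simple statistic (stand-in analysis)."""
    with open(path, "rb") as f:
        return len(f.read())

if __name__ == "__main__":
    files = glob.glob("data/*.dat")        # assumed large (~300 MB) files

    t0 = time.time()
    serial = [scan_file(p) for p in files]            # sequential baseline
    t1 = time.time()
    with Pool(6) as pool:                             # 6 workers ~ 6 filesystem nodes
        parallel = pool.map(scan_file, files)         # scans run concurrently
    t2 = time.time()
    print(f"sequential {t1 - t0:.1f}s, parallel {t2 - t1:.1f}s")
```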
A Fast Radio Burst Search Method for VLBI Observation
NASA Astrophysics Data System (ADS)
Liu, Lei; Tong, Fengxian; Zheng, Weimin; Zhang, Juan; Tong, Li
2018-02-01
We introduce a cross-spectrum-based fast radio burst (FRB) search method for Very Long Baseline Interferometry (VLBI) observations. This method optimizes the fringe-fitting scheme in geodetic VLBI data post-processing; it fully utilizes the cross-spectrum fringe phase information and therefore maximizes the power of single-pulse signals. Working with the cross-spectrum greatly reduces the effect of radio frequency interference compared with using the auto-power spectrum. Single-pulse detection confidence increases by cross-identifying detections from multiple baselines, and by combining the power of multiple baselines we may improve the detection sensitivity. Our method is similar to coherent beam forming, but without the computational expense of forming a great number of beams to cover the whole field of view of our telescopes. The data processing pipeline designed for this method is easy to implement and parallelize, and can be deployed in various kinds of VLBI observations. In particular, we point out that VGOS observations are very suitable for FRB searches.
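A conceptual sketch of the combine-and-cross-identify step is given below. It is not the authors' pipeline: the per-baseline power series are synthetic, the combination is a simple mean, and the 7-sigma/3-sigma thresholds are arbitrary choices for illustration.

```python
# Conceptual sketch: combine single-pulse power across baselines, then require
# each baseline to independently confirm a combined-power candidate.
import numpy as np

rng = np.random.default_rng(1)
n_base, n_time = 3, 10_000
power = rng.standard_normal((n_base, n_time)) ** 2    # per-baseline fringe power
power[:, 5000] += 12.0                                # injected single pulse

combined = power.mean(axis=0)                         # combine baselines
snr = (combined - combined.mean()) / combined.std()
per_base = (power - power.mean(axis=1, keepdims=True)) / power.std(axis=1, keepdims=True)

candidates = np.flatnonzero(snr > 7)                  # combined-power detections
confirmed = [t for t in candidates if np.all(per_base[:, t] > 3)]  # cross-identify
print(confirmed)
```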
Li, Bingyi; Chen, Liang; Yu, Wenyue; Xie, Yizhuang; Bian, Mingming; Zhang, Qingjun; Pang, Long
2018-01-01
With the development of satellite payload technology and very large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. A key goal of on-board SAR imaging system design is to achieve high real-time processing performance under severe size, weight, and power consumption constraints. This paper presents a multi-node prototype system for real-time SAR imaging processing. We decompose the commonly used chirp scaling (CS) SAR imaging algorithm into two parts according to their computing features. The linearization and logic-memory optimum allocation methods are adopted to realize the nonlinear part in a reconfigurable structure, and the two-part bandwidth balance method is used to realize the linear part. Thus, floating-point SAR imaging processing can be integrated into a single Field Programmable Gate Array (FPGA) chip instead of relying on distributed technologies. A single processing node requires 10.6 s and consumes 17 W to focus 25-km swath-width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. The design methodology of the multi-FPGA parallel accelerating system under the real-time principle is introduced. As a proof of concept, a prototype with four processing nodes and one master node is implemented using a Xilinx xc6vlx315t FPGA. The weight and volume of a single machine are 10 kg and 32 cm × 24 cm × 20 cm, respectively, and the power consumption is under 100 W. The real-time performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. PMID:29495637
Free-electron laser simulations on the MPP
NASA Technical Reports Server (NTRS)
Vonlaven, Scott A.; Liebrock, Lorie M.
1987-01-01
Free electron lasers (FELs) are of interest because they provide high power, high efficiency, and broad tunability. FEL simulations can make efficient use of computers of the Massively Parallel Processor (MPP) class because most of the processing consists of applying a simple equation to a set of identical particles. A test version of the KMS Fusion FEL simulation, which resides mainly in the MPP's host computer and only partially in the MPP, has run successfully.
Rational calculation accuracy in acousto-optical matrix-vector processor
NASA Astrophysics Data System (ADS)
Oparin, V. V.; Tigin, Dmitry V.
1994-01-01
The high speed of parallel computation in a comparatively small-size processor and its acceptable power consumption make the use of an acousto-optic matrix-vector multiplier (AOMVM) attractive for processing large amounts of information in real time. The limited accuracy of computation is an essential disadvantage of such a processor. Reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and reduction of the demands on its components.
The Modeling, Simulation and Comparison of Interconnection Networks for Parallel Processing.
1987-12-01
performs better at a lower hardware cost than do the single-stage cube and mesh networks. As a result, the designer of a parallel processing system is ... attempted, and in most cases succeeded, in designing and implementing faster, more powerful systems. Due to design innovations and technological advances ... largely to the computational complexity of the algorithms executed. In the von Neumann machine, instructions must be executed in a sequential manner. Design
Performance of parallel computation using CUDA for solving the one-dimensional elasticity equations
NASA Astrophysics Data System (ADS)
Darmawan, J. B. B.; Mungkasi, S.
2017-01-01
In this paper, we investigate the performance of parallel computation in solving the one-dimensional elasticity equations. Elasticity equations arise widely in engineering science, and solving them quickly and efficiently is desirable. We therefore propose the use of parallel computation, implemented with NVIDIA's CUDA. Our results show that parallel computation using CUDA provides a substantial advantage when the computation is large-scale.
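To make the data-parallel structure concrete, here is a sketch in Numba's CUDA dialect (requires a CUDA-capable GPU). It is not the paper's code: the velocity-stress leapfrog form of the 1D elasticity equations and all parameter values are illustrative assumptions.

```python
# Sketch: one-thread-per-grid-point update of the 1D velocity-stress system
#   dv/dt = (1/rho) dsigma/dx,   dsigma/dt = E dv/dx
import numpy as np
from numba import cuda

@cuda.jit
def update_velocity(v, sigma, dt_over_rho_dx):
    i = cuda.grid(1)                       # one GPU thread per grid point
    if 1 <= i < v.size - 1:
        v[i] += dt_over_rho_dx * (sigma[i] - sigma[i - 1])

@cuda.jit
def update_stress(v, sigma, E_dt_over_dx):
    i = cuda.grid(1)
    if i < sigma.size - 1:
        sigma[i] += E_dt_over_dx * (v[i + 1] - v[i])

n, dx, dt, rho, E = 1 << 20, 1e-3, 1e-7, 2700.0, 70e9   # assumed values
v = cuda.to_device(np.zeros(n))
sigma = cuda.to_device(np.zeros(n))
threads = 256
blocks = (n + threads - 1) // threads
for _ in range(100):                       # leapfrog time stepping on the GPU
    update_velocity[blocks, threads](v, sigma, dt / (rho * dx))
    update_stress[blocks, threads](v, sigma, E * dt / dx)
```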
Under-sampling in a Multiple-Channel Laser Vibrometry System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corey, Jordan
2007-03-01
Laser vibrometry is a technique used to detect vibrations on objects using the interference of coherent light with itself. Most vibrometry systems process only one target location at a time, but processing multiple locations simultaneously provides improved detection capabilities. Traditional laser vibrometry systems employ oversampling to sample the incoming modulated-light signal; however, as the number of channels increases in these systems, issues arise such as higher computational cost, excessive heat, increased power requirements, and increased component cost. This thesis describes a novel approach to laser vibrometry that utilizes undersampling to control the undesirable issues associated with over-sampled systems. Undersampling allows significantly fewer samples to represent the modulated-light signals, which offers several advantages in the overall system design, including improved thermal efficiency, lower processing requirements, and higher immunity to the relative intensity noise inherent in laser vibrometry applications. A unique feature of this implementation is the use of a parallel architecture to increase overall system throughput. This parallelism is realized using a hierarchical multi-channel architecture based on off-the-shelf programmable logic devices (PLDs).
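The principle at work is band-pass undersampling: a band-limited carrier sampled below its Nyquist rate lands at a predictable alias frequency that still carries the vibration modulation. The sketch below demonstrates this numerically; the carrier, sample rate, and vibration values are illustrative assumptions, not the thesis's hardware parameters.

```python
# Sketch: an under-sampled modulated-light carrier aliases to |f_c - k*fs|,
# preserving the vibration-induced phase modulation.
import numpy as np

f_carrier = 40e6          # heterodyne carrier frequency (Hz), assumed
fs = 12e6                 # deliberate under-sampling rate (Hz), assumed

k = round(f_carrier / fs)                  # nearest Nyquist-zone index
f_alias = abs(f_carrier - k * fs)
print(f"carrier appears at {f_alias/1e6:.1f} MHz after sampling")   # 4.0 MHz

t = np.arange(0, 1e-3, 1 / fs)
vib = 0.5 * np.sin(2 * np.pi * 1e3 * t)            # 1 kHz surface vibration
x = np.cos(2 * np.pi * f_carrier * t + vib)        # sampled only at fs
spectrum = np.abs(np.fft.rfft(x))
print(np.fft.rfftfreq(t.size, 1 / fs)[spectrum.argmax()])  # peak near f_alias
```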
NASA Astrophysics Data System (ADS)
Erberich, Stephan G.; Hoppe, Martin; Jansen, Christian; Schmidt, Thomas; Thron, Armin; Oberschelp, Walter
2001-08-01
In the last few years, more and more university hospitals as well as private hospitals have changed to digital information systems for patient records, diagnostic files, and digital images. Not only does patient management become easier, it is also remarkable how clinical research can profit from Picture Archiving and Communication Systems (PACS) and diagnostic databases, especially image databases. While images are available at one's fingertips, difficulties arise when image data needs to be processed, e.g. segmented, classified, or co-registered, which usually demands a lot of computational power. Today's clinical environment supports PACS very well, but real image processing is still under-developed. The purpose of this paper is to introduce a parallel cluster of standard distributed systems and its software components, and to show how such a system can be integrated into a hospital environment. To demonstrate the cluster technique we present our clinical experience with the crucial but cost-intensive motion correction of clinical routine and research functional MRI (fMRI) data, as processed in our lab on a daily basis.
NASA Astrophysics Data System (ADS)
Zhang, Jilin; Sha, Chaoqun; Wu, Yusen; Wan, Jian; Zhou, Li; Ren, Yongjian; Si, Huayou; Yin, Yuyu; Jing, Ya
2017-02-01
GPUs are not only used in the field of graphics but have also been widely used in areas requiring large numbers of numerical calculations. In the energy industry, because of its low carbon emissions, high energy density, long duration, and other characteristics, nuclear energy cannot easily be replaced by other energy sources. Management of core fuel is one of the major areas of concern in a nuclear power plant and is directly related to the economic benefits and cost of nuclear power. The large-scale reactor core expansion equation is large and complicated, so the calculation of the diffusion equation is crucial in the core fuel management process. In this paper, we use CUDA programming on a GPU cluster to run the LU-SGS parallel iterative calculation for the reactor diffusion equation. We divide the one-dimensional and two-dimensional meshes into a plurality of domains, with each domain evenly distributed over the GPU blocks. A parallel collision scheme is put forward that defines a virtual boundary of the grid for exchanging information, with data transmitted by repeated collisions. Compared with the serial program, the experiments show that the GPU greatly improves the efficiency of program execution and verify that GPUs are playing a much more important role in the field of numerical calculation.
Burst-mode optical label processor with ultralow power consumption.
Ibrahim, Salah; Nakahara, Tatsushi; Ishikawa, Hiroshi; Takahashi, Ryo
2016-04-04
A novel label processor subsystem for 100-Gbps (25-Gbps × 4λs) burst-mode optical packets is developed, in which a highly energy-efficient method is pursued for extracting the ultrafast packet label and interfacing it to a CMOS-based processor where label recognition takes place. The method involves performing serial-to-parallel conversion of the label bits on a bit-by-bit basis using an optoelectronic converter operated with a set of optical triggers generated in a burst-mode manner upon packet arrival. Here we present three key achievements that enabled a significant reduction in the total power consumption and latency of the whole subsystem: 1) based on a novel operation mechanism providing amplification with bit-level selectivity, an optical trigger pulse generator that consumes power only for a very short duration upon packet arrival is proposed and experimentally demonstrated; 2) the energy of the optical triggers needed by the optoelectronic serial-to-parallel converter is reduced by utilizing a negative-polarity signal and an enhanced conversion scheme called the discharge-or-hold scheme; 3) the necessary optical trigger energy is further cut in half by coupling the triggers through the chip's backside, and a novel lens-free packaging method is developed to enable a low-cost alignment process that works with simple visual observation.
ERIC Educational Resources Information Center
Qvit, Nir; Barda, Yaniv; Gilon, Chaim; Shalev, Deborah E.
2007-01-01
This laboratory experiment provides a unique opportunity for students to synthesize three analogues of aspartame, a commonly used artificial sweetener. The students are introduced to the powerful and useful method of parallel synthesis while synthesizing three dipeptides in parallel using solid-phase peptide synthesis (SPPS) and simultaneous…
Numerical modelling of series-parallel cooling systems in power plant
NASA Astrophysics Data System (ADS)
Regucki, Paweł; Lewkowicz, Marek; Kucięba, Małgorzata
2017-11-01
The paper presents a mathematical model allowing one to study series-parallel hydraulic systems such as the cooling system of a power boiler's auxiliary devices or a closed cooling system including condensers and cooling towers. The analytical approach is based on a set of non-linear algebraic equations solved using numerical techniques. As a result of the iterative process, a set of volumetric flow rates of water through all the branches of the investigated hydraulic system is obtained. The calculations indicate the influence of changes in the pipeline's geometrical parameters on the total cooling water flow rate in the analysed installation. Such an approach makes it possible to analyse different variants of the modernization of the studied systems, as well as to indicate their critical elements. Based on these results, an investor can choose the variant of reconstruction of the installation that is optimal from the economic point of view. As examples of such calculations, two hydraulic installations are described. One is a boiler auxiliary cooling installation including two screw ash coolers. The other is a closed cooling system consisting of cooling towers and condensers.
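The set-of-nonlinear-equations approach can be sketched on a minimal case: a pump feeding two parallel cooling branches, with head balance on each branch. This is only an illustration of the solution technique; the pump curve and loss coefficients below are invented, not the plant data from the paper.

```python
# Minimal sketch: iterative solution of branch flow rates in a parallel network.
import numpy as np
from scipy.optimize import fsolve

H0, a = 40.0, 120.0        # pump curve H = H0 - a*Q_total^2 (m, m/(m^3/s)^2), assumed
k1, k2 = 900.0, 1500.0     # branch loss coefficients (m/(m^3/s)^2), assumed

def residuals(q):
    q1, q2 = q
    head = H0 - a * (q1 + q2) ** 2        # head available at the manifold
    return [head - k1 * q1 ** 2,          # head balance, branch 1
            head - k2 * q2 ** 2]          # head balance, branch 2

q1, q2 = fsolve(residuals, x0=[0.1, 0.1])
print(f"Q1 = {q1:.3f} m^3/s, Q2 = {q2:.3f} m^3/s, total = {q1 + q2:.3f}")
```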
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria
Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective to maximize the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal or annual) are usually coupled and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed to solve the optimization problem. The new formulation and performance of the NDS solver is benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS based solver can be easily parallelized, which results in significant computational improvement.
JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX
NASA Astrophysics Data System (ADS)
Omidian, Hossein; Lemieux, Guy G. F.
2018-04-01
Embedded systems typically do not have enough on-chip memory for an entire image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient: it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we build a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size and applies kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g. from handheld to datacenter, using minimal resources and power to reach the performance target.
Deniz, Cem M; Vaidya, Manushka V; Sodickson, Daniel K; Lattanzi, Riccardo
2016-01-01
We investigated global specific absorption rate (SAR) and radiofrequency (RF) power requirements in parallel transmission as the distance between the transmit coils and the sample was increased. We calculated ultimate intrinsic SAR (UISAR), which depends on object geometry and electrical properties but not on coil design, and we used it as the reference to compare the performance of various transmit arrays. We investigated the case of fixing coil size and increasing the number of coils while moving the array away from the sample, as well as the case of fixing coil number and scaling coil dimensions. We also investigated RF power requirements as a function of lift-off, and tracked local SAR distributions associated with global SAR optima. In all cases, the target excitation profile was achieved and global SAR (as well as associated maximum local SAR) decreased with lift-off, approaching UISAR, which was constant for all lift-offs. We observed a lift-off value that optimizes the balance between global SAR and power losses in coil conductors. We showed that, using parallel transmission, global SAR can decrease at ultra high fields for finite arrays with a sufficient number of transmit elements. For parallel transmission, the distance between coils and object can be optimized to reduce SAR and minimize RF power requirements associated with homogeneous excitation. © 2015 Wiley Periodicals, Inc.
Watkins, C Edward
2012-09-01
In a way not done before, Tracey, Bludworth, and Glidden-Tracey ("Are there parallel processes in psychotherapy supervision: An empirical examination," Psychotherapy, 2011, advance online publication, doi:10.1037/a0026246) have shown us that parallel process in psychotherapy supervision can indeed be rigorously and meaningfully researched, and their groundbreaking investigation provides a nice prototype for future supervision studies to emulate. In what follows, I offer a brief complementary comment to Tracey et al., addressing one matter that seems to be a potentially important conceptual and empirical parallel process consideration: When is a parallel just a parallel? PsycINFO Database Record (c) 2012 APA, all rights reserved.
Seeing the forest for the trees: Networked workstations as a parallel processing computer
NASA Technical Reports Server (NTRS)
Breen, J. O.; Meleedy, D. M.
1992-01-01
Unlike traditional 'serial' processing computers, in which one central processing unit performs one instruction at a time, parallel processing computers contain several processing units and thereby perform several instructions at once. Many of today's fastest supercomputers achieve their speed by employing thousands of processing elements working in parallel. Few institutions can afford these state-of-the-art parallel processors, but many already have the makings of a modest parallel processing system. Workstations on existing high-speed networks can be harnessed as nodes in a parallel processing environment, bringing the benefits of parallel processing to many. While such a system cannot rival the industry's latest machines, many common tasks can be accelerated greatly by spreading the processing burden and exploiting idle network resources. We study several aspects of this approach, from algorithms to select nodes to speed gains in specific tasks. With ever-increasing volumes of astronomical data, it becomes all the more necessary to utilize our computing resources fully.
Parallel Processing at the High School Level.
ERIC Educational Resources Information Center
Sheary, Kathryn Anne
This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…
180 MW/180 KW pulse modulator for S-band klystron of LUE-200 linac of IREN installation of JINR
NASA Astrophysics Data System (ADS)
Su, Kim Dong; Sumbaev, A. P.; Shvetsov, V. N.
2014-09-01
A proposal for the development of a pulse modulator with 180 MW pulse power and 180 kW average power for the pulsed S-band klystrons of the LUE-200 linac of the IREN installation at the Laboratory of Neutron Physics (FLNP) at JINR is formulated. The main requirements, key parameters, and element base of the modulator are presented. A variant of the basic scheme is considered, based on a 14- (or 11-) stage, two-parallel PFN with a thyratron switch (TGI2-10K/50) and six parallel high-voltage power supplies (CCPS).
NASA Astrophysics Data System (ADS)
Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui
2017-07-01
Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
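The lumping step itself is a standard spectral clustering over a pathway-similarity matrix. The sketch below illustrates it with scikit-learn; the intercrossing-flux values are fabricated for a six-pathway toy case and are not from the paper's systems.

```python
# Schematic sketch: group parallel pathways into metastable path channels by
# spectral clustering on a symmetric intercrossing-flux (similarity) matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

# Pairwise intercrossing flux between 6 hypothetical pathways: pathways 0-2
# exchange flux mostly with each other, as do pathways 3-5.
S = np.array([[0, 9, 8, 1, 0, 0],
              [9, 0, 7, 0, 1, 0],
              [8, 7, 0, 0, 0, 1],
              [1, 0, 0, 0, 9, 8],
              [0, 1, 0, 9, 0, 7],
              [0, 0, 1, 8, 7, 0]], dtype=float)

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)
print(labels)   # two metastable path channels, e.g. [0 0 0 1 1 1]
```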
Schmidt, James R; De Houwer, Jan; Rothermund, Klaus
2016-12-01
The current paper presents an extension of the Parallel Episodic Processing model. The model is developed for simulating behaviour in performance (i.e., speeded response time) tasks and learns to anticipate both how and when to respond based on retrieval of memories of previous trials. With one fixed parameter set, the model is shown to successfully simulate a wide range of different findings. These include: practice curves in the Stroop paradigm, contingency learning effects, learning acquisition curves, stimulus-response binding effects, mixing costs, and various findings from the attentional control domain. The results demonstrate several important points. First, the same retrieval mechanism parsimoniously explains stimulus-response binding, contingency learning, and practice effects. Second, as performance improves with practice, any effects will shrink with it. Third, a model of simple learning processes is sufficient to explain phenomena that are typically (but perhaps incorrectly) interpreted in terms of higher-order control processes. More generally, we argue that computational models with a fixed parameter set and wider breadth should be preferred over those that are restricted to a narrow set of phenomena. Copyright © 2016 Elsevier Inc. All rights reserved.
SiC Multi-Chip Power Modules as Power-System Building Blocks
NASA Technical Reports Server (NTRS)
Lostetter, Alexander; Franks, Steven
2007-01-01
The term "SiC MCPMs" (wherein "MCPM" signifies "multi-chip power module") denotes electronic power-supply modules containing multiple silicon carbide power devices and silicon-on-insulator (SOI) control integrated-circuit chips. SiC MCPMs are being developed as building blocks of advanced expandable, reconfigurable, fault-tolerant power-supply systems. Exploiting the ability of SiC semiconductor devices to operate at temperatures, breakdown voltages, and current densities significantly greater than those of conventional Si devices, the designs of SiC MCPMs and of systems comprising multiple SiC MCPMs are expected to afford a greater degree of miniaturization through stacking of modules with reduced requirements for heat sinking. Moreover, the higher-temperature capabilities of SiC MCPMs could enable operation in environments hotter than Si-based power systems can withstand. The stacked SiC MCPMs in a given system can be electrically connected in series, parallel, or a series/parallel combination to increase the overall power-handling capability of the system. In addition to power connections, the modules have communication connections. The SOI controllers in the modules communicate with each other as nodes of a decentralized control network, in which no single controller exerts overall command of the system. Control functions effected via the network include synchronization of switching of power devices and rapid reconfiguration of power connections to enable the power system to continue to supply power to a load in the event of failure of one of the modules. In addition to serving as building blocks of reliable power-supply systems, SiC MCPMs could be augmented with external control circuitry to make them perform additional power-handling functions as needed for specific applications: typical functions could include regulating voltages, storing energy, and driving motors. Because identical SiC MCPM building blocks could be utilized in a variety of ways, the cost and difficulty of designing new, highly reliable power systems would be reduced considerably. Several prototype DC-to-DC power-converter modules containing SiC power-switching devices were designed and built to demonstrate the feasibility of the SiC MCPM concept. In anticipation of a future need for operation at high temperature, the circuitry in the modules includes high-temperature inductors and capacitors. These modules were designed to be stacked to construct a system of four modules electrically connected in series and/or parallel. The packaging of the modules is designed to satisfy requirements for series and parallel interconnection among modules, high power density, high thermal efficiency, small size, and light weight. Each module includes four output power connectors two for serial and two for parallel output power connections among the modules. Each module also includes two signal connectors, electrically isolated from the power connectors, that afford four zones for signal interconnections among the SOI controllers. Finally, each module includes two input power connectors, through which it receives power from an in-line power bus. This design feature is included in anticipation of a custom-designed power bus incorporating sockets compatible with snap-on type connectors to enable rapid replacement of failed modules.
Efficient parallel simulation of CO2 geologic sequestration insaline aquifers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Doughty, Christine; Wu, Yu-Shu
2007-01-01
An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single- and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger-scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.
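The MPI domain-partitioning pattern underlying such simulators can be sketched generically. The code below is not TOUGH2's scheme: it is a bare-bones mpi4py illustration of a 1D partition with ghost-cell exchange and a toy diffusion update standing in for the real physics.

```python
# Generic sketch of MPI domain decomposition with ghost-cell exchange.
# Run with e.g.:  mpiexec -n 4 python this_script.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                       # cells owned by this rank (assumed)
u = np.zeros(n_local + 2)            # +2 ghost cells for neighbor values
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # exchange ghost cells with neighboring subdomains
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    u[1:-1] += 0.25 * (u[:-2] - 2 * u[1:-1] + u[2:])   # toy diffusion update
```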
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brian B; Purba, Victor; Jafarpour, Saber
Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, an LCL filter at the point of common coupling, and a control architecture that includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure-preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, a current controller, and a PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit, and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.
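A minimal sketch of the aggregation idea for the passive filter stage is given below. It assumes N identical inverters and applies standard circuit paralleling rules (inductances and resistances divide by N, capacitances multiply by N); the function name and parameter values are illustrative, not the paper's notation.

```python
# Sketch: aggregate LCL filter parameters for N identical parallel inverters,
# under the linear-scaling-with-power-rating assumption stated above.
def aggregate_lcl(n, L1, C, L2, R1=0.0, R2=0.0):
    """Aggregate LCL filter of n parallel inverters into one equivalent filter."""
    return {
        "L1": L1 / n, "L2": L2 / n,      # inductances in parallel divide
        "R1": R1 / n, "R2": R2 / n,      # series resistances in parallel divide
        "C":  C * n,                     # filter capacitances in parallel add
    }

# e.g. ten 10-kW inverters behave like one 100-kW aggregate inverter:
print(aggregate_lcl(10, L1=1e-3, C=10e-6, L2=0.5e-3))
```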
Low inductance power electronics assembly
Herron, Nicholas Hayden; Mann, Brooks S.; Korich, Mark D.; Chou, Cindy; Tang, David; Carlson, Douglas S.; Barry, Alan L.
2012-10-02
A power electronics assembly is provided. A first support member includes a first plurality of conductors. A first plurality of power switching devices are coupled to the first support member. A first capacitor is coupled to the first support member. A second support member includes a second plurality of conductors. A second plurality of power switching devices are coupled to the second support member. A second capacitor is coupled to the second support member. The first and second pluralities of conductors, the first and second pluralities of power switching devices, and the first and second capacitors are electrically connected such that the first plurality of power switching devices is connected in parallel with the first capacitor and the second capacitor and the second plurality of power switching devices is connected in parallel with the second capacitor and the first capacitor.
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
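The structural point, that each simulation stream is independent, can be shown with any parallel backend. The sketch below uses CPU worker processes in place of GPU threads (the paper's implementation is CUDA-based); the integrand, seeds, and worker count are illustrative.

```python
# Sketch of embarrassingly parallel Monte Carlo: independent streams combined.
import numpy as np
from multiprocessing import Pool

def mc_estimate(args):
    """One worker: average of a test integrand over n_draws samples."""
    seed, n_draws = args
    rng = np.random.default_rng(seed)
    x = rng.random(n_draws)
    return np.mean(np.exp(-x * x))       # estimate of the integral of exp(-x^2) on [0,1]

if __name__ == "__main__":
    jobs = [(seed, 1_000_000) for seed in range(8)]   # independent RNG streams
    with Pool() as pool:
        estimates = pool.map(mc_estimate, jobs)       # run concurrently
    print(np.mean(estimates))                         # approximately 0.7468
```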
Photonics for aerospace sensors
NASA Astrophysics Data System (ADS)
Pellegrino, John; Adler, Eric D.; Filipov, Andree N.; Harrison, Lorna J.; van der Gracht, Joseph; Smith, Dale J.; Tayag, Tristan J.; Viveiros, Edward A.
1992-11-01
The maturation in the state-of-the-art of optical components is enabling increased applications for the technology. Most notable is the ever-expanding market for fiber optic data and communications links, familiar in both commercial and military markets. The inherent properties of optics and photonics, however, have suggested that components and processors may be designed that offer advantages over more commonly considered digital approaches for a variety of airborne sensor and signal processing applications. Various academic, industrial, and governmental research groups have been actively investigating and exploiting these properties of high bandwidth, large degree of parallelism in computation (e.g., processing in parallel over a two-dimensional field), and interconnectivity, and have succeeded in advancing the technology to the stage of systems demonstration. Such advantages as computational throughput and low operating power consumption are highly attractive for many computationally intensive problems. This review covers the key devices necessary for optical signal and image processors, some of the system application demonstration programs currently in progress, and active research directions for the implementation of next-generation architectures.
NASA Astrophysics Data System (ADS)
Zulkifli, S. A.; Syaifuddin Mohd, M.; Maharun, M.; Bakar, N. S. A.; Idris, S.; Samsudin, S. H.; Firmansyah; Adz, J. J.; Misbahulmunir, M.; Abidin, E. Z. Z.; Syafiq Mohd, M.; Saad, N.; Aziz, A. R. A.
2015-12-01
One configuration of the hybrid electric vehicle (HEV) is the split-axle parallel hybrid, in which an internal combustion engine (ICE) and an electric motor provide propulsion power to different axles. A particular sub-type of the split-parallel hybrid does not have the electric motor installed on board the vehicle; instead, two electric motors are placed in the hubs of the non-driven wheels, each called a 'hub motor' or 'in-wheel motor' (IWM). Since propulsion power from the ICE and IWMs is coupled through the vehicle itself, its wheels, and the road on which it moves, this particular configuration is termed a 'through-the-road' (TTR) hybrid. The TTR configuration enables existing ICE-powered vehicles to be retrofitted into HEVs with minimal physical modification. This work describes the design of a retrofit-conversion TTR-IWM hybrid vehicle: its sub-systems and development work. Operating modes and power flow of the TTR hybrid, its torque coupling, and the resultant traction profiles are then discussed.
Accelerating Wright–Fisher Forward Simulations on the Graphics Processing Unit
Lawrie, David S.
2017-01-01
Forward Wright–Fisher simulations are powerful in their ability to model complex demography and selection scenarios, but suffer from slow execution on the Central Processor Unit (CPU), thus limiting their usefulness. However, the single-locus Wright–Fisher forward algorithm is exceedingly parallelizable, with many steps that are so-called “embarrassingly parallel,” consisting of a vast number of individual computations that are all independent of each other and thus capable of being performed concurrently. The rise of modern Graphics Processing Units (GPUs) and programming languages designed to leverage the inherent parallel nature of these processors have allowed researchers to dramatically speed up many programs that have such high arithmetic intensity and intrinsic concurrency. The presented GPU Optimized Wright–Fisher simulation, or “GO Fish” for short, can be used to simulate arbitrary selection and demographic scenarios while running over 250-fold faster than its serial counterpart on the CPU. Even modest GPU hardware can achieve an impressive speedup of over two orders of magnitude. With simulations so accelerated, one can not only do quick parametric bootstrapping of previously estimated parameters, but also use simulated results to calculate the likelihoods and summary statistics of demographic and selection models against real polymorphism data, all without restricting the demographic and selection scenarios that can be modeled or requiring approximations to the single-locus forward algorithm for efficiency. Further, as many of the parallel programming techniques used in this simulation can be applied to other computationally intensive algorithms important in population genetics, GO Fish serves as an exciting template for future research into accelerating computation in evolution. GO Fish is part of the Parallel PopGen Package available at: http://dl42.github.io/ParallelPopGen/. PMID:28768689
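The "embarrassingly parallel" structure is easy to see in code: a single-locus generation step applies selection and binomial drift independently to every replicate simulation, so all replicates can be updated at once. The sketch below is a generic vectorized version, not GO Fish itself; population size, selection coefficient, and replicate count are illustrative.

```python
# Minimal single-locus Wright-Fisher generation step, vectorized over many
# independent simulations (each replicate is fully independent of the others).
import numpy as np

def wf_generation(freq, N, s, rng):
    """One generation: deterministic selection, then binomial drift."""
    w = freq * (1 + s) / (freq * (1 + s) + (1 - freq))   # post-selection frequency
    return rng.binomial(2 * N, w) / (2 * N)              # drift among 2N genomes

rng = np.random.default_rng(42)
freq = np.full(100_000, 0.05)            # 100k independent replicate simulations
for _ in range(500):                     # 500 generations
    freq = wf_generation(freq, N=1000, s=0.01, rng=rng)
print((freq == 1.0).mean())              # estimated fixation probability
```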
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high-performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan was to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been fully achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Demonstrating Forces between Parallel Wires.
ERIC Educational Resources Information Center
Baker, Blane
2000-01-01
Describes a physics demonstration that dramatically illustrates the mutual repulsion (attraction) between parallel conductors using insulated copper wire, wooden dowels, a high direct current power supply, electrical tape, and an overhead projector. (WRM)
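The demonstration is governed by the standard force-per-length relation between long parallel conductors; the worked values below are chosen for illustration and are not from the article:

```latex
% Force per unit length between two long parallel wires carrying currents
% I_1 and I_2 separated by distance d (attractive for parallel currents):
\frac{F}{L} = \frac{\mu_0 I_1 I_2}{2\pi d}
% Example: I_1 = I_2 = 10\,\mathrm{A}, d = 1\,\mathrm{cm} gives
% F/L = 2 \times 10^{-3}\,\mathrm{N/m}, enough to visibly deflect light wire.
```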
Chang, Kuo-Tsai
2007-01-01
This paper investigates the electrical transient characteristics of a Rosen-type piezoelectric transformer (PT), including maximum voltages, time constants, energy losses and average powers, and their improvements immediately after turn-OFF. A parallel resistor connected across the input terminals of the PT is needed to improve the transient characteristics. An equivalent circuit for the PT is first given. Then, an open-circuit voltage, involving a direct current (DC) component and an alternating current (AC) component, and its related energy losses are derived from the equivalent circuit with initial conditions. Moreover, an AC power control system, including a DC-to-AC resonant inverter, a control switch and electronic instruments, is constructed to determine the electrical characteristics of the OFF transient state. Furthermore, the effects of the parallel resistor on the transient characteristics at different parallel resistances are measured. The advantages of adding the parallel resistor are also discussed. From the measured results, the DC time constant is greatly decreased from 9 to 0.04 ms by a 10 kΩ parallel resistance under open output.
Advanced fabrication of Si nanowire FET structures by means of a parallel approach.
Li, J; Pud, S; Mayer, D; Vitusevich, S
2014-07-11
In this paper we present fabricated Si nanowires (NWs) of different dimensions with enhanced electrical characteristics. The parallel fabrication process is based on nanoimprint lithography using high-quality molds, which facilitates the realization of 50 nm-wide NW field-effect transistors (FETs). The imprint molds were fabricated by using a wet chemical anisotropic etching process. The wet chemical etch results in well-defined vertical sidewalls with edge roughness (3σ) as small as 2 nm, which is about four times better compared with the roughness usually obtained for reactive-ion etching molds. The quality of the mold was studied using atomic force microscopy and scanning electron microscopy image data. The use of the high-quality mold leads to almost 100% yield during fabrication of Si NW FETs as well as to an exceptional quality of the surfaces of the devices produced. To characterize the Si NW FETs, we used noise spectroscopy as a powerful method for evaluating device performance and the reliability of structures with nanoscale dimensions. The Hooge parameter of fabricated FET structures exhibits an average value of 1.6 × 10(-3). This value reflects the high quality of Si NW FETs fabricated by means of a parallel approach that uses a nanoimprint mold and cost-efficient technology.
NASA Astrophysics Data System (ADS)
Sun, Dongye; Lin, Xinyou; Qin, Datong; Deng, Tao
2012-11-01
Energy management (EM) is a core technique for advancing the fuel economy of a hybrid electric bus (HEB) and is unique to the corresponding configuration. Existing control strategies seldom take battery power management into account together with internal combustion engine power management. In this paper, a power-balancing instantaneous optimization (PBIO) energy management control strategy is proposed for a novel series-parallel hybrid electric bus. According to the characteristics of the novel series-parallel architecture, the switching boundary condition between series and parallel modes as well as the control rules of the power-balancing strategy are developed. An equivalent fuel model of the battery is implemented and combined with the engine fuel consumption to constitute the objective function, which minimizes the fuel consumption at each sample time and coordinates the real-time power distribution between the engine and battery. To validate that the proposed strategy is effective and reasonable, a forward model is built in Matlab/Simulink for simulation, and a dSPACE AutoBox is applied as the controller for hardware-in-the-loop bench testing. Both the simulation and hardware-in-the-loop results demonstrate that the proposed strategy not only sustains the battery SOC within its operational range and keeps the engine operating point in the peak-efficiency region, but also dramatically advances the fuel economy of the series-parallel hybrid electric bus (SPHEB), by up to 30.73% compared with the prototype bus; relative to a rule-based strategy, the PBIO strategy reduces fuel consumption by up to 12.38%. The PBIO algorithm is applicable in real time, improves the efficiency of the SPHEB system, and suits complicated configurations well.
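The instantaneous-optimization idea can be sketched as a one-variable minimization at each sample time: choose the battery power so that engine fuel plus battery-equivalent fuel is minimal. The fuel models, limits, and numbers below are toy assumptions, not the paper's calibrated maps.

```python
# Toy sketch: per-sample split of power demand between engine and battery,
# minimizing engine fuel plus an equivalent-fuel penalty for battery use.
from scipy.optimize import minimize_scalar

def engine_fuel_rate(p_eng):             # g/s; crude convex fuel map (assumed)
    return 0.05 + 0.08 * p_eng + 0.002 * p_eng ** 2

def battery_equiv_fuel(p_bat, soc):      # g/s; penalize discharge at low SOC (assumed)
    return 0.06 * p_bat * (1.5 - soc) if p_bat > 0 else 0.02 * p_bat

def split_power(p_demand, soc, p_bat_max=30.0):
    cost = lambda p_bat: (engine_fuel_rate(p_demand - p_bat)
                          + battery_equiv_fuel(p_bat, soc))
    res = minimize_scalar(cost, bounds=(-p_bat_max, p_bat_max), method="bounded")
    return res.x                          # optimal battery power (kW)

print(split_power(p_demand=60.0, soc=0.7))
```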
NASA Astrophysics Data System (ADS)
Han, Qiguo; Zhu, Kai; Shi, Wenming; Wu, Kuayu; Chen, Kai
2018-02-01
In order to solve the problem of low-voltage ride-through (LVRT) for the variable-frequency drives (VFDs) of major auxiliary equipment in a thermal power plant, a scheme in which a supercapacitor is paralleled onto the DC link of the VFD is put forward; two solutions, direct parallel support and voltage-boost parallel support of the supercapacitor, are proposed. The capacitor values for the relevant motor loads are calculated according to the law of energy conservation and verified by Matlab simulation. Finally, a test prototype is set up, and the test results prove the feasibility of the proposed schemes.
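A back-of-the-envelope version of the energy-conservation sizing mentioned above: the supercapacitor must deliver the drive's power over the ride-through interval as its voltage sags from V1 to V2. The load power, interval, and voltages below are illustrative assumptions, not the paper's cases.

```python
# Sketch: size the DC-link supercapacitor from 0.5*C*(V1^2 - V2^2) = P*t.
def supercap_capacitance(p_load_w, t_ride_s, v1, v2):
    """Capacitance needed to ride through t_ride_s at power p_load_w."""
    return 2.0 * p_load_w * t_ride_s / (v1 ** 2 - v2 ** 2)

# e.g. a 75 kW auxiliary drive riding through 0.5 s while the DC link
# drops from 540 V to 450 V needs roughly 0.84 F:
print(f"{supercap_capacitance(75e3, 0.5, 540.0, 450.0):.2f} F")
```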
Parallel Harmony Search Based Distributed Energy Resource Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceylan, Oguzhan; Liu, Guodong; Tomsovic, Kevin
2015-01-01
This paper presents a harmony search based parallel optimization algorithm to minimize voltage deviations in three-phase unbalanced electrical distribution systems and to maximize the active power outputs of distributed energy resources (DR). The main contribution is to reduce the adverse impacts on the voltage profile as photovoltaic (PV) output or electric vehicle (EV) charging changes throughout a day. The IEEE 123-bus distribution test system is modified by adding DRs and EVs under different load profiles. The simulation results show that, by using parallel computing techniques, heuristic methods may be used as an alternative optimization tool in electrical power distribution system operation.
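The parallel pattern is that candidate evaluations (here, power-flow runs) dominate the cost and can be farmed out to workers. The sketch below is a generic harmony search with a stand-in objective; the parameter values, batch size, and objective are assumptions, not the paper's implementation.

```python
# Compact sketch: harmony search with candidate evaluations run in parallel.
import numpy as np
from multiprocessing import Pool

def objective(x):
    """Stand-in for a power-flow evaluation returning a voltage-deviation cost."""
    return float(np.sum((x - 0.5) ** 2))

def harmony_search(dim=6, hms=20, iters=200, hmcr=0.9, par=0.3, bw=0.05):
    rng = np.random.default_rng(0)
    memory = rng.random((hms, dim))                   # harmony memory
    with Pool(4) as pool:
        fitness = np.array(pool.map(objective, memory))
        for _ in range(iters):
            batch = []
            for _ in range(4):                        # improvise several harmonies
                use_mem = rng.random(dim) < hmcr      # memory consideration
                donors = memory[rng.integers(hms, size=dim), np.arange(dim)]
                new = np.where(use_mem, donors, rng.random(dim))
                adjust = rng.random(dim) < par        # pitch adjustment
                new = np.clip(new + np.where(adjust, rng.normal(0, bw, dim), 0.0), 0, 1)
                batch.append(new)
            for new, f in zip(batch, pool.map(objective, batch)):
                worst = int(fitness.argmax())
                if f < fitness[worst]:                # replace the worst harmony
                    memory[worst], fitness[worst] = new, f
    return memory[int(fitness.argmin())], float(fitness.min())

if __name__ == "__main__":
    best, f = harmony_search()
    print(f)
```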
Performance of OVERFLOW-D Applications based on Hybrid and MPI Paradigms on IBM Power4 System
NASA Technical Reports Server (NTRS)
Djomehri, M. Jahed; Biegel, Bryan (Technical Monitor)
2002-01-01
This report briefly discusses our preliminary performance experiments with parallel versions of OVERFLOW-D applications, based on MPI and hybrid paradigms, on the IBM Power4 system at the NAS Division. This work is part of an effort to determine the suitability of the system and its parallel libraries (MPI/OpenMP) for specific scientific computing objectives.
Base drive for paralleled inverter systems
NASA Technical Reports Server (NTRS)
Nagano, S. (Inventor)
1980-01-01
In a paralleled inverter system, a positive feedback current derived from the total current of all modules is applied to the base drive of each power transistor in every module, thereby providing all modules with protection against open- or short-circuit faults occurring in any module and forcing equal current sharing among the modules during turn-on of the power transistors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-10-20
The look-ahead dynamic simulation software system incorporates high-performance parallel computing technologies, significantly reduces the solution time for each transient simulation case, and brings dynamic simulation analysis into on-line applications to enable more transparency for better reliability and asset utilization. It takes a snapshot of the current power grid status, computes the system dynamic simulation in parallel, and outputs the transient response of the power system in real time.
A Multi-Level Parallelization Concept for High-Fidelity Multi-Block Solvers
NASA Technical Reports Server (NTRS)
Hatay, Ferhat F.; Jespersen, Dennis C.; Guruswamy, Guru P.; Rizk, Yehia M.; Byun, Chansup; Gee, Ken; VanDalsem, William R. (Technical Monitor)
1997-01-01
The integration of high-fidelity Computational Fluid Dynamics (CFD) analysis tools into the industrial design process benefits greatly from robust implementations that are portable across a wide range of computer architectures. In the present work, a hybrid domain-decomposition and parallelization concept was developed and implemented in the widely used NASA multi-block CFD packages ENSAERO and OVERFLOW. The new parallel solver concept, PENS (Parallel Euler Navier-Stokes Solver), employs both fine and coarse granularity in data partitioning as well as data coalescing to obtain the desired load-balance characteristics on the available computer platforms. The multi-level parallelism implementation itself introduces no changes to the numerical results; hence the original fidelity of the packages is identically preserved. The present implementation uses the Message Passing Interface (MPI) library for interprocessor message passing and memory access. By choosing an appropriate combination of the available partitioning and coalescing capabilities at execution time, the PENS solver becomes adaptable to different computer architectures, from shared-memory to distributed-memory platforms with varying degrees of parallelism. The PENS implementation on the IBM SP2 distributed-memory environment at the NASA Ames Research Center obtains 85 percent scalable parallel performance using fine-grain partitioning of single-block CFD domains on up to 128 wide computational nodes. Multi-block CFD simulations of complete aircraft achieve 75 percent of perfect load-balanced execution using data coalescing and the two levels of parallelism. The SGI PowerChallenge, SGI Origin 2000, and a cluster of workstations are the other platforms on which the robustness of the implementation is tested. The performance behavior on these platforms with a variety of realistic problems will be reported as this ongoing study progresses.
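The data-coalescing idea (grouping blocks onto ranks so their loads balance) can be illustrated with a greedy longest-processing-time heuristic; the block costs below are made-up grid-point counts, and the real PENS partitioning logic is certainly more elaborate.

```python
import heapq

def coalesce_blocks(block_costs, n_ranks):
    """Greedy longest-processing-time assignment of CFD blocks to MPI ranks."""
    ranks = [(0.0, r, []) for r in range(n_ranks)]       # (load, rank id, blocks)
    heapq.heapify(ranks)
    for bid, cost in sorted(enumerate(block_costs), key=lambda kv: -kv[1]):
        load, r, blocks = heapq.heappop(ranks)           # least-loaded rank
        blocks.append(bid)
        heapq.heappush(ranks, (load + cost, r, blocks))
    return sorted(ranks, key=lambda t: t[1])

# Grid-point counts (millions) per block as a proxy for compute cost (illustrative).
costs = [9.1, 4.2, 4.0, 3.5, 2.8, 2.2, 1.9, 1.1]
for load, rank, blocks in coalesce_blocks(costs, 3):
    print(f"rank {rank}: blocks {blocks}, load {load:.1f}")
```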
Electromagnetic Design of a Magnetically-Coupled Spatial Power Combiner
NASA Technical Reports Server (NTRS)
Bulcha, B.; Cataldo, G.; Stevenson, T. R.; U-Yen, K.; Moseley, S. H.; Wollack, E. J.
2017-01-01
The design of a two-dimensional beam-combining network employing a parallel-plate superconducting waveguide with a mono-crystalline silicon dielectric is presented. This novel beam-combining network structure employs an array of magnetically coupled antenna elements to achieve high coupling efficiency and full sampling of the intensity distribution while avoiding diffractive losses in the multi-mode region defined by the parallel-plate waveguide. These attributes enable the structure's use in realizing compact far-infrared spectrometers for astrophysical and instrumentation applications. When configured with a suitable corporate-feed power combiner, this fully sampled array can be used to realize a low-sidelobe apodized response without incurring a reduction in coupling efficiency. To control undesired reflections over a wide range of angles in the finite-sized parallel-plate waveguide region, a wideband meta-material electromagnetic absorber structure is implemented. This adiabatic structure absorbs greater than 99% of the power over the 1.7:1 operational band at angles ranging from normal (0 degrees) to near-parallel (180 degrees) incidence. The design, simulations, and application of the device are presented.
Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo
2014-01-01
Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of the dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently of the others, a massive parallelization of tau-leaping can yield substantial reductions in the overall running time. The emerging field of general-purpose computing on graphics processing units (GPGPU) provides power-efficient, high-performance computing at relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on the GPU by partitioning the execution of tau-leaping into multiple separate phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms a CPU-based tau-leaping implementation as the number of parallel simulations increases, with a break-even point directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a two-dimensional parameter sweep analysis of the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
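A minimal CPU-side sketch of the batching strategy: many independent tau-leaping trajectories advanced simultaneously with vectorized Poisson draws, numpy standing in for the CUDA kernels. The reversible isomerization model, rates, and the crude negative-population clipping are illustrative simplifications, not cuTauLeaping's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reversible isomerization A <-> B (illustrative model, not from the paper).
k1, k2 = 0.3, 0.1
stoich = np.array([[-1, +1],    # reaction 1: A -> B
                   [+1, -1]])   # reaction 2: B -> A

def tau_leap(x, tau, steps):
    """Advance many independent trajectories (rows of x) by tau-leaping."""
    for _ in range(steps):
        a = np.stack([k1 * x[:, 0], k2 * x[:, 1]], axis=1)   # propensities
        k = rng.poisson(a * tau)                             # reaction firings
        x = np.maximum(x + k @ stoich, 0)   # crude clip; real codes bound tau instead
    return x

x0 = np.tile([1000, 0], (2000, 1))          # 2000 parallel trajectories
xT = tau_leap(x0, tau=0.01, steps=2000)     # integrate to t = 20
print("mean A:", xT[:, 0].mean())           # analytic equilibrium: 1000*k2/(k1+k2) = 250
```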
Parallel processing for scientific computations
NASA Technical Reports Server (NTRS)
Alkhatib, Hasan S.
1995-01-01
The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jacquelin, Mathias; De Jong, Wibe A.; Bylaska, Eric J.
2017-07-03
The Ab Initio Molecular Dynamics (AIMD) method allows scientists to treat the dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. This extremely important method has tremendous computational requirements, because the electronic Schrödinger equation, approximated using Kohn-Sham Density Functional Theory (DFT), is solved at every time step. With the advent of manycore architectures, application developers have a significant amount of processing power within each compute node that can only be exploited through massive parallelism. A compute-intensive application such as AIMD forms a good candidate to leverage this processing power. In this paper, we focus on adding thread-level parallelism to the plane-wave DFT methodology implemented in NWChem. Through a careful optimization of tall-skinny matrix products, which are at the heart of the Lagrange multiplier and nonlocal pseudopotential kernels, as well as 3D FFTs, our OpenMP implementation delivers excellent strong scaling on the latest Intel Knights Landing (KNL) processor. We assess the efficiency of our Lagrange multiplier kernels by building a Roofline model of the platform, and verify that our implementation is close to the roofline for various problem sizes. Finally, we present strong scaling results for the complete AIMD simulation on a 64-water-molecule test case, which scales up to all 68 cores of the Knights Landing processor.
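The roofline reasoning used to assess such kernels can be reproduced in a few lines: attainable performance is min(peak, arithmetic intensity x bandwidth). The peak and bandwidth figures below are assumed, ballpark KNL-class numbers for illustration, not measured values from the paper.

```python
def roofline_gflops(ai_flops_per_byte, peak_gflops, mem_bw_gbs):
    """Attainable performance = min(peak compute, arithmetic intensity * bandwidth)."""
    return min(peak_gflops, ai_flops_per_byte * mem_bw_gbs)

# Illustrative KNL-class numbers (assumptions, not vendor-published figures):
peak, bw = 2600.0, 450.0   # GF/s double precision, GB/s high-bandwidth memory

# Tall-skinny product C(k,k) = A(n,k)^T B(n,k): 2*n*k^2 flops over ~8*(2*n*k) bytes read.
n, k = 10**6, 32
flops = 2 * n * k * k
bytes_moved = 8 * 2 * n * k
ai = flops / bytes_moved   # = k/8 flop/byte: memory-bound for small k
print(f"AI = {ai:.1f} flop/byte -> attainable ~ {roofline_gflops(ai, peak, bw):.0f} GF/s")
```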
The source of dual-task limitations: Serial or parallel processing of multiple response selections?
Marois, René
2014-01-01
Although it is generally recognized that the concurrent performance of two tasks incurs costs, the sources of these dual-task costs remain controversial. The serial bottleneck model suggests that serial postponement of task performance in dual-task conditions results from a central stage of response selection that can only process one task at a time. Cognitive-control models, by contrast, propose that multiple response selections can proceed in parallel, but that serial processing of task performance is predominantly adopted because its processing efficiency is higher than that of parallel processing. In the present study, we empirically tested this proposition by examining whether parallel processing would occur when it was more efficient and financially rewarded. The results indicated that even when parallel processing was more efficient and was incentivized by financial reward, participants still failed to process tasks in parallel. We conclude that central information processing is limited by a serial bottleneck. PMID:23864266
X-Band, 17-Watt Solid-State Power Amplifier
NASA Technical Reports Server (NTRS)
Mittskus, Anthony; Stone, Ernest; Boger, William; Burgess, David; Honda, Richard; Nuckolls, Carl
2005-01-01
An advanced solid-state power amplifier that can generate an output power of as much as 17 W at a design operating frequency of 8.4 GHz has been designed and constructed as a smaller, lighter, less expensive alternative to traveling-wave-tube X-band amplifiers and to prior solid-state X-band power amplifiers of equivalent output power. This amplifier comprises a monolithic microwave integrated circuit (MMIC) amplifier module and a power-converter module integrated into a compact package (see Figure 1). The amplifier module contains an input variable-gain amplifier (VGA), an intermediate driver stage, a final power stage, and input and output power monitors (see Figure 2). The VGA and the driver amplifier are 0.5-μm GaAs-based metal semiconductor field-effect transistors (MESFETs). The final power stage contains four parallel high-efficiency, GaAs-based pseudomorphic high-electron-mobility transistors (PHEMTs). The gain of the VGA is voltage-variable over a range of 10 to 24 dB. To provide for temperature compensation of the overall amplifier gain, the gain-control voltage is generated by an operational-amplifier circuit that includes a resistor/thermistor temperature-sensing network. The driver amplifier provides a gain of 14 dB to an output power of 27 dBm to drive the four parallel output PHEMTs, each of which is nominally capable of putting out as much as 5 W. The driver output is sent to the input terminals of the four parallel PHEMTs through microstrip power dividers; the outputs of these PHEMTs are combined by microstrip power combiners (which are similar to the microstrip power dividers) to obtain the final output power of 17 W.
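The combining arithmetic is easy to check: four roughly 5 W devices combined with a fraction of a dB of combiner loss land near the quoted 17 W. The 0.7 dB insertion loss below is an assumed figure for illustration.

```python
import math

def combined_output(per_device_w, n_devices, combiner_loss_db):
    """Ideal N-way power combining minus combiner insertion loss."""
    p_dbm = 10 * math.log10(per_device_w * 1e3)            # per-device power in dBm
    total_dbm = p_dbm + 10 * math.log10(n_devices) - combiner_loss_db
    return total_dbm, 10 ** (total_dbm / 10) / 1e3         # (dBm, watts)

# Four ~5 W PHEMTs with an assumed ~0.7 dB combining loss gives roughly 17 W.
dbm, watts = combined_output(5.0, 4, 0.7)
print(f"{dbm:.1f} dBm ~ {watts:.1f} W")
```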
Parallel Activation in Bilingual Phonological Processing
ERIC Educational Resources Information Center
Lee, Su-Yeon
2011-01-01
In bilingual language processing, the parallel activation hypothesis suggests that bilinguals activate their two languages simultaneously during language processing. Support for the parallel activation mainly comes from studies of lexical (word-form) processing, with relatively less attention to phonological (sound) processing. According to…
Staircase Quantum Dots Configuration in Nanowires for Optimized Thermoelectric Power
Li, Lijie; Jiang, Jian-Hua
2016-01-01
The performance of thermoelectric energy harvesters can be improved by nanostructures that exploit inelastic transport processes. One prototype is the three-terminal hopping thermoelectric device, where electron hopping between quantum dots is driven by hot phonons. Such three-terminal hopping thermoelectric devices have the potential to achieve high efficiency or power via inelastic transport, without relying on heavy elements or toxic compounds. We show in this work how the output power of the device can be optimized by tuning the number and energy configuration of the quantum dots embedded in parallel nanowires. We find that a staircase energy configuration with a constant energy step can improve the power factor over a serial connection of a single pair of quantum dots. Moreover, for a fixed energy step, there is an optimal length for the nanowire. Similarly, for a fixed number of quantum dots there is an optimal energy step for the output power. Our results are important for future developments of high-performance nanostructured thermoelectric devices. PMID:27550093
Modular 5-kW Power-Processing Unit Being Developed for the Next-Generation Ion Engine
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Bond, Thomas H.; Okada, Don; Phelps, Keith; Pyter, Janusz; Wiseman, Steve
2001-01-01
The NASA Glenn Research Center is developing a 5- to 10-kW ion engine for a broad range of mission applications. Simultaneously, a 5-kW breadboard power-processing unit (PPU) is being designed and fabricated by Boeing Electron Dynamic Devices, Torrance, California, under contract with Glenn. The beam supply, which processes up to 90 percent of the power in this unit, consists of four 1.1-kW power modules connected in parallel, equally sharing the output current. The modular design allows scalability to higher powers as well as the possibility of implementing an N + 1 redundant beam supply. A novel phase-shifted/pulse-width-modulated, dual full-bridge topology was chosen for this module design for its efficient switching characteristics. A breadboard version of the beam power supply module was assembled. Efficiencies ranging between 91.6 and 96.9 percent were measured for an input voltage range of 80 to 160 V, an output voltage range of 800 to 1500 V, and output powers from 0.3 to 1.0 kW. This beam supply could result in a PPU with a total efficiency between 93 and 95 percent at a nominal input voltage of 100 V. This is up to a 4-percent improvement over the state-of-the-art PPU used for the Deep Space 1 mission. A flight-packaged PPU is expected to weigh no more than 15 kg, which represents a 50-percent reduction in specific mass from the Deep Space 1 design. This will make 5-kW ion propulsion very attractive for many planetary missions.
Trajectory Optimization for Missions to Small Bodies with a Focus on Scientific Merit.
Englander, Jacob A; Vavrina, Matthew A; Lim, Lucy F; McFadden, Lucy A; Rhoden, Alyssa R; Noll, Keith S
2017-01-01
Trajectory design for missions to small bodies is tightly coupled both with the selection of targets for a mission and with the choice of spacecraft power, propulsion, and other hardware. Traditional methods of trajectory optimization have focused on finding the optimal trajectory for an a priori selection of destinations and spacecraft parameters. Recent research has expanded the field of trajectory optimization to multidisciplinary systems optimization that includes spacecraft parameters. The logical next step is to extend the optimization process to include target selection based not only on engineering figures of merit but also on scientific value. This paper presents a new technique to solve the multidisciplinary mission optimization problem for small-body missions, including classical trajectory design, the choice of spacecraft power and propulsion systems, and the scientific value of the targets. This technique, when combined with modern parallel computers, enables a holistic view of the small-body mission design process that previously required iteration among several different design processes.
NASA Astrophysics Data System (ADS)
Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.
2011-12-01
With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware, including faster CPUs, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third-party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
NASA Astrophysics Data System (ADS)
Shi, X.
2015-12-01
As NSF indicated - "Theory and experimentation have for centuries been regarded as two fundamental pillars of science. It is now widely recognized that computational and data-enabled science forms a critical third pillar." Geocomputation is the third pillar of GIScience and geosciences. With the exponential growth of geodata, the challenge of scalable and high-performance computing for big data analytics becomes urgent, because many research activities are constrained by software or tools that simply cannot complete the computation process. Heterogeneous geodata integration and analytics obviously magnify the complexity and the operational time frame. Many large-scale geospatial problems may not be processable at all if the computer system does not have sufficient memory or computational power. Emerging computer architectures, such as Intel's Many Integrated Core (MIC) architecture and the Graphics Processing Unit (GPU), and advanced computing technologies provide promising solutions that employ massive parallelism and hardware resources to achieve scalability and high performance for data-intensive computing over large spatiotemporal and social media data. Exploring novel algorithms and deploying the solutions in massively parallel computing environments to achieve scalable data processing and analytics over large-scale, complex, and heterogeneous geodata with consistent quality and high performance has been the central theme of our research team in the Department of Geosciences at the University of Arkansas (UARK). New multi-core architectures combined with application accelerators hold the promise of achieving scalability and high performance by exploiting task- and data-level parallelism that are not supported by conventional computing systems. Such a parallel or distributed computing environment is particularly suitable for large-scale geocomputation over big data, as proved by our prior works, while the potential of such advanced infrastructure remains unexplored in this domain. In this presentation, our prior and on-going initiatives will be summarized to exemplify how we exploit multicore CPUs, GPUs, and MICs, and clusters of CPUs, GPUs, and MICs, to accelerate geocomputation in different applications.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
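A toy sketch of performance-aware co-scheduling: each task is placed on whichever device (CPU or GPU) would finish it earliest, given per-task GPU speedups assumed to be profiled offline. Task names, costs, and speedups are invented for illustration; the paper's scheduler is more sophisticated.

```python
def co_schedule(tasks, gpu_speedup):
    """Greedy performance-aware scheduling: each task runs on whichever device
    (CPU or GPU) would finish it earliest, visiting GPU-friendly tasks first."""
    free = {"cpu": 0.0, "gpu": 0.0}                 # time each device becomes idle
    plan = []
    for name, cpu_cost in sorted(tasks, key=lambda t: -gpu_speedup[t[0]]):
        finish = {d: free[d] + cpu_cost / (gpu_speedup[name] if d == "gpu" else 1.0)
                  for d in free}
        dev = min(finish, key=finish.get)           # device completing this task first
        plan.append((name, dev, free[dev], finish[dev]))
        free[dev] = finish[dev]
    return plan, max(free.values())

# Hypothetical feature-computation stages: CPU costs (s) and profiled GPU speedups.
tasks = [("morphometry", 8.0), ("texture", 6.0), ("io-prep", 3.0), ("stats", 2.0)]
speedups = {"morphometry": 12.0, "texture": 7.0, "io-prep": 1.2, "stats": 1.5}
plan, makespan = co_schedule(tasks, speedups)
for name, dev, t0, t1 in plan:
    print(f"{name:12s} -> {dev}  [{t0:.2f}, {t1:.2f}]")
print("makespan:", round(makespan, 2))
```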
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
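The O(m) bit-plane idea can be mimicked in software: store bit i of every operand as one boolean array, then run a single full-adder pass per bit position, operating on all words at once. A minimal numpy sketch, an emulation of the associative-processor style rather than the proposed hardware:

```python
import numpy as np

def add_bitplanes(a_planes, b_planes):
    """Word-parallel, bit-plane addition: one full-adder pass per bit -> O(m) steps.

    a_planes[i] is a boolean array holding bit i of every operand in the array."""
    m = len(a_planes)
    carry = np.zeros_like(a_planes[0])
    out = []
    for i in range(m):                       # O(m) cycles, all words in parallel
        s = a_planes[i] ^ b_planes[i] ^ carry
        carry = (a_planes[i] & b_planes[i]) | (carry & (a_planes[i] ^ b_planes[i]))
        out.append(s)
    out.append(carry)                        # final carry becomes bit m
    return out

def to_planes(values, m):
    v = np.asarray(values)
    return [((v >> i) & 1).astype(bool) for i in range(m)]

def from_planes(planes):
    return sum((p.astype(int) << i) for i, p in enumerate(planes))

a, b = [3, 100, 255], [5, 27, 1]
print(from_planes(add_bitplanes(to_planes(a, 8), to_planes(b, 8))))  # [  8 127 256]
```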
Trace: a high-throughput tomographic reconstruction engine for large-scale datasets.
Bicer, Tekin; Gürsoy, Doğa; Andrade, Vincent De; Kettimuthu, Rajkumar; Scullin, William; Carlo, Francesco De; Foster, Ian T
2017-01-01
Modern synchrotron light sources and detectors produce data at such scale and complexity that large-scale computation is required to unleash their full power. One of the widely used imaging techniques that generates data at tens of gigabytes per second is computed tomography (CT). Although CT experiments result in rapid data generation, the analysis and reconstruction of the collected data may require hours or even days of computation time with a medium-sized workstation, which hinders the scientific progress that relies on the results of analysis. We present Trace, a data-intensive computing engine that we have developed to enable high-performance implementation of iterative tomographic reconstruction algorithms for parallel computers. Trace provides fine-grained reconstruction of tomography datasets using both (thread-level) shared memory and (process-level) distributed memory parallelization. Trace utilizes a special data structure called replicated reconstruction object to maximize application performance. We also present the optimizations that we apply to the replicated reconstruction objects and evaluate them using tomography datasets collected at the Advanced Photon Source. Our experimental evaluations show that our optimizations and parallelization techniques can provide 158× speedup using 32 compute nodes (384 cores) over a single-core configuration and decrease the end-to-end processing time of a large sinogram (with 4501 × 1 × 22,400 dimensions) from 12.5 h to <5 min per iteration. The proposed tomographic reconstruction engine can efficiently process large-scale tomographic data using many compute nodes and minimize reconstruction times.
A numerical investigation of a thermodielectric power generation system
NASA Astrophysics Data System (ADS)
Sklar, Akiva A.
The performance of a novel micro-thermodielectric power generation system was investigated in order to determine if thermodielectric power generation can be practically employed and if its performance can compete with current portable power generation technologies. Thermodielectric power generation is a direct energy conversion technology that converts heat directly into high voltage direct current. It requires dielectric (i.e., capacitive) materials whose charge storing capabilities are a function of temperature. This property can be exploited by heating these materials after they are charged; as their temperature increases, their charge storage capability decreases, forcing them to eject a portion of their surface charge. This ejected charge can then be supplied to an appropriate electronic storage device. There are several advantages associated with thermodielectric energy conversion; first, it requires heat addition at relatively low temperatures by conventional power generation standards, i.e., less than 600 K, and second, devices that utilize it have the potential for excellent power density and device reliability. The predominant disadvantage of using this power generation technique is that the device must operate in an unsteady manner; this can lead to substantial heat transfer losses that limit the device's thermal efficiency. The studied power generation system was designed so that the power generating components of the system (i.e., the thermodielectric materials) are integrated within a micro-scale heat exchange apparatus designed specifically to provide the thermodielectric materials with the unsteady heating and cooling necessary for efficient power generation. This apparatus is designed to utilize a liquid as a working fluid in order to maximize its heat transfer capabilities, minimize the size of the heat exchanger, and maximize the power density of the power generation system. The thermodielectric materials are operated through a power generation cycle that consists of four processes; the first process is a charging process, during which an electric field is applied to a thermodielectric material, causing it to acquire electrical charge on its surface (this process is analogous to the isentropic compression process of a Brayton cycle). The second process is a heating process in which the temperature of the dielectric material is increased via heat transfer from an external source. During this process, the thermodielectric material is forced to eject a portion of its surface charge because its charge storing capability decreases as the temperature increases; the ejected charge is intended for capture by external circuitry connected to the thermodielectric material, where it can be routed to an electrochemical storage device or an electromechanical device requiring high voltage direct current. The third process is a discharging process, during which the applied electric field is reduced to its initial strength (analogous to the isentropic expansion process of a Brayton cycle). The final process is a cooling process in which the temperature of the dielectric material is decreased via heat transfer from an external source, returning it to its initial temperature. Previously, predicting the performance of a thermodielectric power generator was hindered by a poor understanding of the material's thermodynamic properties and the effect unsteady heat transfer losses have on system performance.
In order to improve predictive capabilities in this study, a thermodielectric equation of state was developed that relates the strength of the applied electric field, the amount of surface charge stored by the thermodielectric material, and its temperature. This state equation was then used to derive expressions for the material's thermodynamic states (internal energy, entropy), which were subsequently used to determine the optimum material properties for power generation. Next, a numerical simulation code was developed to determine the heat transfer capabilities of a micro-scale parallel plate heat recuperator (MPPHR), a device designed specifically to (a) provide the unsteady heating and cooling necessary for thermodielectric power generation and (b) minimize the unsteady heat transfer losses of the system. The simulation code was used to find the optimum heat transfer and heat recuperation regimes of the MPPHR. The previously derived thermodynamic equations that describe the behavior of the thermodielectric materials were then incorporated into the model for the walls of the parallel plate channel in the numerical simulation code, creating a tool capable of determining the thermodynamic performance of a micro-thermodielectric power generator (MTDPG) in terms of thermal efficiency, percent Carnot efficiency, and energy/power density. A detailed parameterization of the MTDPG with the simulation code yielded the critical non-dimensional numbers that determine the relationship between the heat exchange/recuperation abilities of the flow and the power generation capabilities of the thermodielectric materials. These relationships were subsequently used to optimize the performance of an MTDPG with an operating temperature range of 300-500 K. The optimization predicted that the MTDPG could provide a thermal efficiency of 29.7 percent with the potential to reach 34 percent. These thermal efficiencies correspond to 74.2 and 85 percent of the Carnot efficiency, respectively. The power density of this MTDPG depends on the operating frequency and can exceed 1,000,000 W/m³.
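The quoted efficiency figures are mutually consistent and easy to verify against the 300-500 K operating range:

```python
t_cold, t_hot = 300.0, 500.0            # operating range quoted in the text (K)
eta_carnot = 1.0 - t_cold / t_hot       # Carnot limit = 0.40

for eta in (0.297, 0.34):
    print(f"thermal efficiency {eta:.1%} -> {eta / eta_carnot:.1%} of Carnot")
# prints 74.2% and 85.0% of Carnot, matching the figures quoted above
```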
Analysis of turbine-grid interaction of grid-connected wind turbine using HHT
NASA Astrophysics Data System (ADS)
Chen, A.; Wu, W.; Miao, J.; Xie, D.
2018-05-01
This paper processes the output power of a grid-connected wind turbine with a denoising and feature-extraction method based on the Hilbert-Huang transform (HHT) to study turbine-grid interaction. First, Empirical Mode Decomposition (EMD) and the Hilbert Transform (HT) are introduced in detail. Then, after decomposing the output power of the grid-connected wind turbine into a series of Intrinsic Mode Functions (IMFs), the energy ratio and power volatility are calculated to detect the unessential components. Meanwhile, combined with the vibration function of turbine-grid interaction, data fitting of the instantaneous amplitude and phase of each IMF is performed to extract the characteristic parameters of the different interactions. Finally, using measured data from actual parallel-operated wind turbines in China, this work accurately obtains the characteristic parameters of the turbine-grid interaction of a grid-connected wind turbine.
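The instantaneous amplitude and phase extraction applied to each IMF can be illustrated with SciPy's Hilbert transform; the synthetic modulated signal below stands in for one IMF of measured turbine output power, and the EMD step itself is omitted.

```python
import numpy as np
from scipy.signal import hilbert

fs = 200.0
t = np.arange(0, 10, 1 / fs)
# Stand-in for one IMF of the turbine output power: a slowly modulated oscillation.
imf = (1.0 + 0.2 * np.sin(2 * np.pi * 0.1 * t)) * np.cos(2 * np.pi * 3.0 * t)

analytic = hilbert(imf)                        # analytic signal via Hilbert transform
amplitude = np.abs(analytic)                   # instantaneous amplitude envelope
phase = np.unwrap(np.angle(analytic))          # instantaneous phase
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # instantaneous frequency (Hz)

print(f"mean instantaneous frequency ~ {inst_freq.mean():.2f} Hz")  # ~ 3 Hz
```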
Mahmood, Zohaib; McDaniel, Patrick; Guérin, Bastien; Keil, Boris; Vester, Markus; Adalsteinsson, Elfar; Wald, Lawrence L; Daniel, Luca
2016-07-01
In a coupled parallel transmit (pTx) array, the power delivered to a channel is partially distributed to the other channels because of coupling. This power is dissipated in circulators, resulting in a significant reduction in power efficiency. In this study, a technique for designing robust decoupling matrices interfaced between the RF amplifiers and the coils is proposed. The decoupling matrices ensure that most forward power is delivered to the load without loss of the encoding capabilities of the pTx array. The decoupling condition requires that the impedance matrix seen by the power amplifiers be a diagonal matrix whose entries match the characteristic impedance of the power amplifiers. In this work, the impedance matrix of the coupled coils is diagonalized by successive multiplication by its eigenvectors. A general design procedure and software are developed to automatically generate the hardware that implements the diagonalization using passive components. The general design method is demonstrated by decoupling two example parallel transmit arrays. The decoupling matrices achieve better than -20 dB decoupling in both cases. A robust framework for designing decoupling matrices for pTx arrays is presented and validated. The proposed decoupling strategy theoretically scales to an arbitrary number of channels. Magn Reson Med 76:329-339, 2016. © 2015 Wiley Periodicals, Inc.
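The core linear-algebra step, that an eigenvector basis renders a symmetric impedance matrix diagonal, can be checked numerically. The 4-channel impedance values below are invented for illustration, and the remaining step of matching the eigen-impedances to the amplifiers' characteristic impedance is not shown.

```python
import numpy as np

# Illustrative 4-channel coupled-coil impedance matrix (symmetric, ohms).
Z = np.array([[50.0, 12.0,  5.0,  2.0],
              [12.0, 50.0, 12.0,  5.0],
              [ 5.0, 12.0, 50.0, 12.0],
              [ 2.0,  5.0, 12.0, 50.0]])

# Eigenvectors of a symmetric Z form an orthogonal matrix Q with Q^T Z Q diagonal;
# a lossless network realizing Q would present decoupled ports to the amplifiers.
eigvals, Q = np.linalg.eigh(Z)
Z_decoupled = Q.T @ Z @ Q

off_diag = Z_decoupled - np.diag(np.diag(Z_decoupled))
print("eigen-impedances (ohm):", np.round(eigvals, 2))
print("max off-diagonal after decoupling:", np.abs(off_diag).max())
```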
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
Towards green high capacity optical networks
NASA Astrophysics Data System (ADS)
Glesk, I.; Mohd Warip, M. N.; Idris, S. K.; Osadola, T. B.; Andonovic, I.
2011-09-01
The demand for fast, secure, energy-efficient, high-capacity networks is growing. It is fuelled by transmission bandwidth needs which will support, among other things, the rapid penetration of multimedia applications empowering smart consumer electronics and E-businesses. All of the above trigger unparalleled needs for networking solutions which must offer not only high-speed, low-cost, "on demand" mobile connectivity but should also be ecologically friendly and have a low carbon footprint. The first answer to address the bandwidth needs was the deployment of fibre optic technologies into transport networks. After this it quickly became obvious that the inferior bandwidth of electronics (compared to optical fibre) would continue to hold the upper hand over the maximum implementable serial data rates. A new solution was found by introducing parallelism into data transport in the form of Wavelength Division Multiplexing (WDM), which has helped dramatically to improve the aggregate throughput of optical networks. However, with these advancements a new bottleneck has emerged at fibre endpoints, where data routers must process the incoming and outgoing traffic. Here, even with massive and power-hungry electronic parallelism, today's routers (still relying upon bandwidth-limiting electronics) do not offer the processing speeds that networks demand. In this paper we discuss some novel, unconventional approaches to address network scalability leading to energy savings via advanced optical signal processing. We also investigate energy savings based on advanced network management through node hibernation proposed for optical IP networks. Hibernation reduces the network's overall power consumption by forming virtual network reconfigurations through selective node groupings and by link segmentation and partitioning.
Development of parallel algorithms for electrical power management in space applications
NASA Technical Reports Server (NTRS)
Berry, Frederick C.
1989-01-01
The application of parallel techniques to electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used together with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements using the Newton-Raphson load flow method. The independent local problems produce results for voltage and power, which are then passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine whether any correction is needed in the local problems. The coordinator problem is also solved by an iterative method, much like the local problems; the iterative method for the coordination problem is likewise the Newton-Raphson method. Therefore, each iteration at the coordination level results in new values for the local problems, which must then be solved again, along with the coordinator problem, until the convergence conditions are met.
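A minimal Newton-Raphson load flow for a single two-bus "local problem" is sketched below; the per-unit line data and loads are illustrative, and the Jacobian is built by finite differences rather than the usual analytic polar-form terms.

```python
import numpy as np

# Two-bus "local problem": slack bus 0 (V = 1.0 p.u., angle 0) feeds a PQ load
# bus 1 through one line (illustrative per-unit impedance).
y = 1.0 / (0.01 + 0.05j)                  # line admittance
Y = np.array([[y, -y], [-y, y]])          # bus admittance matrix
P_load, Q_load = 0.8, 0.4                 # load drawn at bus 1 (p.u.)

def injection(th1, V1):
    """Complex power injected at bus 1 for a given voltage guess."""
    V = np.array([1.0, V1 * np.exp(1j * th1)])
    return V[1] * np.conj(Y[1] @ V)

th1, V1 = 0.0, 1.0                        # flat start
for it in range(20):
    S1 = injection(th1, V1)
    mis = np.array([-P_load - S1.real, -Q_load - S1.imag])   # power mismatches
    if np.abs(mis).max() < 1e-10:
        break
    # Jacobian dS/d(th1, V1) by finite differences (stand-in for analytic terms).
    eps = 1e-7
    J = np.empty((2, 2))
    for j, (e_th, e_V) in enumerate([(eps, 0.0), (0.0, eps)]):
        Sp = injection(th1 + e_th, V1 + e_V)
        J[0, j] = (Sp.real - S1.real) / eps
        J[1, j] = (Sp.imag - S1.imag) / eps
    step = np.linalg.solve(J, mis)        # Newton step toward zero mismatch
    th1 += step[0]
    V1 += step[1]

print(f"converged in {it} iterations: V1 = {V1:.4f} p.u., "
      f"angle = {np.degrees(th1):.2f} deg")
```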
Parallel computing on Unix workstation arrays
NASA Astrophysics Data System (ADS)
Reale, F.; Bocchino, F.; Sciortino, S.
1994-12-01
We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular, we have solved numerically a demanding test problem with a 2D hydrodynamic code, originally developed to study astrophysical flows, by executing it on arrays either of DECstations 5000/200 on an Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on an FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed at our Institute and easily extended to work on Unix workstation arrays using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1) equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that their convenience depends essentially on the size of the computational domain relative to the processor power and network bandwidth. We point out that, for future perspectives, parallel development of processor and network technology is important, and that the software still offers great opportunities for improvement, especially in terms of latency in the message-passing protocols. In conditions of significant speedup gain, such workstation arrays represent a cost-effective approach to massive parallel computations.
HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing
Karimi, Ramin; Hajdu, Andras
2016-01-01
Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel with this effort, fast and cost-effective methods and applications have been developed to accelerate sequence analysis, for which there is a strong need. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly desirable. As an alignment-free approach, DNA signatures provide new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and nontarget databases. Hadoop and MapReduce are used in this pipeline as parallel and distributed computing tools on commodity hardware. This approach brings the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process, not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678
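The map/reduce structure of the k-mer counting stage can be sketched without Hadoop, using Python multiprocessing as a stand-in; random sequences replace real genomes, and k and the singleton criterion are illustrative.

```python
from collections import Counter
from multiprocessing import Pool

K = 12

def count_kmers(seq):
    """Map step: k-mer counts for one sequence chunk."""
    return Counter(seq[i:i + K] for i in range(len(seq) - K + 1))

def merge(counters):
    """Reduce step: combine per-chunk counts into a global table."""
    total = Counter()
    for c in counters:
        total.update(c)
    return total

if __name__ == "__main__":
    import random
    random.seed(1)
    genomes = ["".join(random.choice("ACGT") for _ in range(5000)) for _ in range(8)]
    with Pool() as pool:
        total = merge(pool.map(count_kmers, genomes))
    # Candidate unique signatures: k-mers seen exactly once across the database.
    uniques = [k for k, n in total.items() if n == 1]
    print(f"{len(total)} distinct {K}-mers, {len(uniques)} singletons")
```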
García-Calvo, Raúl; Guisado, JL; Diaz-del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco
2018-01-01
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge number of rule combinations and the inherently nonlinear nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes on conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes (master-slave, island, cellular, and hybrid models) and various individual selection methods (roulette, elitist) is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces the best results (from both the performance and the genetic algorithm fitness perspectives) is simulating a few thousand individuals grouped in a few islands using elitist selection. This model combines two powerful factors for discovering the best solutions: finding good individuals in a small number of generations, and introducing genetic diversity via relatively frequent and numerous migrations. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on a medium-class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics on how to take advantage of parallelization on massively parallel devices and GPUs to apply novel nature-inspired metaheuristic algorithms to real-world applications (like the method to solve the temporal dynamics of GRNs). PMID:29662297
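A compact, CPU-only sketch of the island model with elitist selection and ring migration, the combination the study found most effective; the all-ones toy fitness stands in for a GRN rule-fit score, and all parameters are illustrative.

```python
import random

def fitness(bits):                 # toy objective (stand-in for a GRN rule-fit score)
    return sum(bits)

def evolve_island(pop, gens, elite=2, mut=0.02):
    """Elitist GA on one island: keep the best, breed the rest with mutation."""
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:elite]                       # elitist selection
        while len(next_pop) < len(pop):
            a, b = random.sample(pop[: len(pop) // 2], 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]                # one-point crossover
            child = [g ^ (random.random() < mut) for g in child]
            next_pop.append(child)
        pop = next_pop
    return pop

def island_ga(n_islands=4, pop_size=50, genome=64, epochs=10, gens=20, migrants=3):
    islands = [[[random.randint(0, 1) for _ in range(genome)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        # Islands are independent here; the CUDA version runs them concurrently.
        islands = [evolve_island(p, gens) for p in islands]
        for i, isle in enumerate(islands):           # ring migration of elites
            dest = islands[(i + 1) % n_islands]
            dest.sort(key=fitness)
            dest[:migrants] = sorted(isle, key=fitness, reverse=True)[:migrants]
    return max((ind for isle in islands for ind in isle), key=fitness)

random.seed(0)
print("best fitness:", fitness(island_ga()))
```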
Thread concept for automatic task parallelization in image analysis
NASA Astrophysics Data System (ADS)
Lueckenhaus, Maximilian; Eckstein, Wolfgang
1998-09-01
Parallel processing of image analysis tasks is an essential method to speed up image processing and helps to exploit the full capacity of distributed systems. However, writing parallel code is a difficult and time-consuming process and often leads to architecture-dependent programs that have to be re-implemented when the hardware changes. It is therefore highly desirable to perform the parallelization automatically. For this purpose we have developed a special kind of thread concept for image analysis tasks. Threads derived from one subtask may share objects and run in the same context while processing different threads of execution and working on different data in parallel. In this paper we describe the basics of our thread concept and show how it can be used as the basis of an automatic task parallelization to speed up image processing. We further illustrate the design and implementation of an agent-based system that uses image analysis threads for generating and processing parallel programs while taking the available hardware into account. The tests made with our system prototype show that the thread concept combined with the agent paradigm is suitable for speeding up image processing by an automatic parallelization of image analysis tasks.
Accelerating image recognition on mobile devices using GPGPU
NASA Astrophysics Data System (ADS)
Bordallo López, Miguel; Nykänen, Henri; Hannuksela, Jari; Silvén, Olli; Vehviläinen, Markku
2011-01-01
The future multi-modal user interfaces of battery-powered mobile devices are expected to require computationally costly image analysis techniques. Graphics Processing Units are very well suited to parallel processing, and the addition of programmable stages and high-precision arithmetic provides opportunities to implement complete, energy-efficient algorithms. At the moment the first mobile graphics accelerators with programmable pipelines are available, enabling the GPGPU implementation of several image processing algorithms. In this context, we consider a face tracking approach that uses efficient gray-scale-invariant texture features and boosting. The solution is based on Local Binary Pattern (LBP) features and makes use of the GPU in the pre-processing and feature extraction phases. We have implemented a series of image processing techniques in the shader language of OpenGL ES 2.0, compiled them for a mobile graphics processing unit, and performed tests on a mobile application processor platform (OMAP3530). In our contribution, we describe the challenges of designing on a mobile platform, present the performance achieved, and provide measurement results for the actual power consumption in comparison to using the CPU (ARM) on the same platform.
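The LBP feature extraction stage can be expressed in a few vectorized lines; this numpy sketch computes basic 8-neighbor LBP codes and a histogram. The GPU shader version in the paper works per-fragment, but the logic is analogous.

```python
import numpy as np

def lbp_8neighbors(img):
    """Basic 3x3 Local Binary Pattern codes for a grayscale image (uint8 array).

    Each interior pixel is compared against its 8 neighbors; the bits form a code."""
    c = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.uint8) << bit
    return code

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64), dtype=np.uint8)
codes = lbp_8neighbors(img)
hist = np.bincount(codes.ravel(), minlength=256)   # LBP histogram = texture feature
print(codes.shape, hist[:8])
```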
Two-stage Electron Acceleration by 3D Collisionless Guide-field Magnetic Reconnection
NASA Astrophysics Data System (ADS)
Buechner, J.; Munoz, P.
2017-12-01
We discuss a two-stage process of electron acceleration near X-lines of 3D collisionless guide-field magnetic reconnection. Non-relativistic electrons are first pre-accelerated by magnetic-field-aligned (parallel) electric fields. At the nonlinear stage of 3D guide-field magnetic reconnection electric and magnetic fields become filamentary structured due to streaming instabilities. This causes an additional curvature-driven electron acceleration in the guide-field direction. The resulting spectrum of the accelerated electrons follows a power law.
Plasma-enhanced chemical vapor deposition of multiwalled carbon nanofibers.
Matthews, Kristopher; Cruden, Brett A; Chen, Bin; Meyyappan, M; Delzeit, Lance
2002-10-01
Plasma-enhanced chemical vapor deposition is used to grow vertically aligned multiwalled carbon nanofibers (MWNFs). The graphite basal planes in these nanofibers are not parallel as in nanotubes; instead they exhibit a small angle resembling a stacked cone arrangement. A parametric study with varying process parameters such as growth temperature, feedstock composition, and substrate power has been conducted, and these parameters are found to influence the growth rate, diameter, and morphology. The well-aligned MWNFs are suitable for fabricating electrode systems in sensor and device development.
Studies in optical parallel processing. [All optical and electro-optic approaches
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray-scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic (OPAL) devices were studied as approaches to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out on each image element of these rows in the IOC substrate, and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Smart photodetector arrays for error control in page-oriented optical memory
NASA Astrophysics Data System (ADS)
Schaffer, Maureen Elizabeth
1998-12-01
Page-oriented optical memories (POMs) have been proposed to meet high speed, high capacity storage requirements for input/output intensive computer applications. This technology offers the capability for storage and retrieval of optical data in two-dimensional pages resulting in high throughput data rates. Since currently measured raw bit error rates for these systems fall several orders of magnitude short of industry requirements for binary data storage, powerful error control codes must be adopted. These codes must be designed to take advantage of the two-dimensional memory output. In addition, POMs require an optoelectronic interface to transfer the optical data pages to one or more electronic host systems. Conventional charge coupled device (CCD) arrays can receive optical data in parallel, but the relatively slow serial electronic output of these devices creates a system bottleneck thereby eliminating the POM advantage of high transfer rates. Also, CCD arrays are "unintelligent" interfaces in that they offer little data processing capabilities. The optical data page can be received by two-dimensional arrays of "smart" photo-detector elements that replace conventional CCD arrays. These smart photodetector arrays (SPAs) can perform fast parallel data decoding and error control, thereby providing an efficient optoelectronic interface between the memory and the electronic computer. This approach optimizes the computer memory system by combining the massive parallelism and high speed of optics with the diverse functionality, low cost, and local interconnection efficiency of electronics. In this dissertation we examine the design of smart photodetector arrays for use as the optoelectronic interface for page-oriented optical memory. We review options and technologies for SPA fabrication, develop SPA requirements, and determine SPA scalability constraints with respect to pixel complexity, electrical power dissipation, and optical power limits. Next, we examine data modulation and error correction coding for the purpose of error control in the POM system. These techniques are adapted, where possible, for 2D data and evaluated as to their suitability for a SPA implementation in terms of BER, code rate, decoder time and pixel complexity. Our analysis shows that differential data modulation combined with relatively simple block codes known as array codes provides a powerful means to achieve the desired data transfer rates while reducing error rates to industry requirements. Finally, we demonstrate the first smart photodetector array designed to perform parallel error correction on an entire page of data and satisfy the sustained data rates of page-oriented optical memories. Our implementation integrates a monolithic PN photodiode array and differential input receiver for optoelectronic signal conversion with a cluster error correction code using 0.35-μm CMOS. This approach provides high sensitivity, low electrical power dissipation, and fast parallel correction of 2 x 2-bit cluster errors in an 8 x 8-bit code block to achieve corrected output data rates scalable to 102 Gbps in the current technology, increasing to 1.88 Tbps in 0.1-μm CMOS.
Reduced model simulations of the scrape-off-layer heat-flux width and comparison with experiment
Myra, J. R.; Russell, D. A.; D’Ippolito, D. A.; ...
2011-01-01
Reduced model simulations of turbulence in the edge and scrape-off-layer (SOL) region of a spherical torus or tokamak plasma are employed to address the physics of the scrape-off-layer heat flux width. The simulation model is an electrostatic two-dimensional fluid turbulence model, applied in the plane perpendicular to the magnetic field at the outboard midplane of the torus. The model contains curvature-driven-interchange modes, sheath losses, and both perpendicular turbulent diffusive and convective (blob) transport. These transport processes compete with classical parallel transport to set the SOL width. Midplane SOL profiles of density, temperature and parallel heat flux are obtained from the simulation and compared with experimental results from the National Spherical Torus Experiment (NSTX) to study the scaling of the heat flux width with power and plasma current. It is concluded that midplane turbulence is the main contributor to the SOL heat flux width for the low power H-mode discharges studied, while additional physics is required to explain the plasma current scaling of the SOL heat flux width observed experimentally in higher power discharges. Intermittent separatrix spanning convective cells are found to be the main mechanism that sets the near-SOL width in the simulations. The roles of sheared flows and blob trapping vs. emission are discussed.
Parallel workflow tools to facilitate human brain MRI post-processing
Cui, Zaixu; Zhao, Chenxi; Gong, Gaolang
2015-01-01
Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues. PMID:26029043
Radial electric field and ion parallel flow in the quasi-symmetric and Mirror configurations of HSX
Kumar, S. T. A.; Dobbins, T. J.; Talmadge, J. N.; ...
2018-03-07
In this paper, the radial electric field and the ion mean parallel flow are obtained in the helically symmetric experiment stellarator from toroidal flow measurements of the C6+ ion at two locations on a flux surface, using the Pfirsch–Schlüter effect. Results from the standard quasi-helically symmetric magnetic configuration are compared with those from the Mirror configuration where the quasi-symmetry is deliberately degraded using auxiliary coils. For similar injected power, the quasi-symmetric configuration is observed to have significantly lower flows while the experimental observations from the Mirror geometry are in better agreement with neoclassical calculations. Finally, indications are that the radial electric field near the core of the quasi-symmetric configuration may be governed by non-neoclassical processes.
Accelerating electron tomography reconstruction algorithm ICON with GPU.
Chen, Yu; Wang, Zihao; Zhang, Jingrong; Li, Lun; Wan, Xiaohua; Sun, Fei; Zhang, Fa
2017-01-01
Electron tomography (ET) plays an important role in studying in situ cell ultrastructure in three-dimensional space. Due to limited tilt angles, ET reconstruction always suffers from the "missing wedge" problem. With a validation procedure, iterative compressed-sensing optimized NUFFT reconstruction (ICON) demonstrates its power in the restoration of validated missing information for low-SNR biological ET datasets. However, the huge computational demand has become a major obstacle to the application of ICON. In this work, we analyzed the framework of ICON and classified the operations of the major steps of ICON reconstruction into three types. Accordingly, we designed parallel strategies and implemented them on graphics processing units (GPU) to produce a parallel program, ICON-GPU. With high accuracy, ICON-GPU achieves a great acceleration compared to its CPU version, up to 83.7×, greatly relieving ICON's dependence on computing resources.
Effect of Cooling Units on the Performance of an Automotive Exhaust-Based Thermoelectric Generator
NASA Astrophysics Data System (ADS)
Su, C. Q.; Zhu, D. C.; Deng, Y. D.; Wang, Y. P.; Liu, X.
2017-05-01
Currently, automotive exhaust-based thermoelectric generators (AETEGs) are a hot topic in energy recovery. In order to investigate the influence of coolant flow rate, coolant flow direction and cooling unit arrangement in the AETEG, a thermoelectric generator (TEG) model and a related test bench are constructed. Water cooling is adopted in this study. Due to the non-uniformity of the surface temperature of the heat source, the coolant flow direction affects the output performance of the TEG. Changing the volumetric flow rate of coolant can increase the output power of multiple modules connected in series and/or parallel, as it improves the temperature uniformity of the cooling unit. Since the temperature uniformity of the cooling unit has a strong influence on the output power, two cooling units are connected in series or in parallel to investigate the effect of cooling unit arrangement on the maximum output power of the TEG. Experimental and theoretical analyses reveal that, in the two-unit cooling system, the net output power is generally higher with the cooling units connected in parallel than in series.
Cooperative storage of shared files in a parallel computing system with dynamic block size
Bent, John M.; Faibish, Sorin; Grider, Gary
2015-11-10
Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
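As a rough illustration of the block-size rule stated in the claim (total data volume divided by the number of parallel processes), here is a minimal mpi4py sketch; the buffer contents are synthetic, the surplus/deficit exchange between neighbors is only indicated, and all names are illustrative rather than taken from the patent.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Each process produces a different amount of data (synthetic here).
local = np.arange(100 + 10 * rank, dtype=np.uint8)

# Dynamic block size, per the patent's example rule: total bytes to
# store divided by the number of parallel processes.
total = comm.allreduce(local.size, op=MPI.SUM)
block = total // nprocs

# A full implementation would now shuffle the surplus/deficit between
# neighboring processes so that every rank holds exactly `block`
# bytes, then write its aligned block to the shared object (e.g., on
# a PLFS-style log-structured virtual parallel file system).
print(f"rank {rank}: holds {local.size} bytes, target block {block}")
```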
2. LOOKING DOWN THE LINED POWER CANAL AS IT WINDS ITS WAY TOWARD THE CEMENT MILL Photographer: Walter J. Lubken, November 19, 1907 - Roosevelt Power Canal & Diversion Dam, Parallels Salt River, Roosevelt, Gila County, AZ
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brian B; Purba, Victor; Jafarpour, Saber
Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.
Estrada-Arriaga, Edson Baltazar; Hernández-Romano, Jesús; García-Sánchez, Liliana; Guillén Garcés, Rosa Angélica; Bahena-Bahena, Erick Obed; Guadarrama-Pérez, Oscar; Moeller Chavez, Gabriela Eleonora
2018-05-15
In this study, a continuous flow stack consisting of 40 individual air-cathode MFC units was used to determine the performance of a stacked MFC during domestic wastewater treatment, operated with the individual MFC units unconnected and in series and parallel configurations. The voltages obtained from individual MFC units ranged from 0.08 to 1.1 V at open circuit, while in the series connection the maximum power and current density were 2500 mW/m2 and 500 mA/m2 (4.9 V), respectively. In the parallel connection, the maximum power and current density were 5.8 mW/m2 and 24 mA/m2, respectively. When the MFC units were not connected to each other, the main bacterial species found in the anode biofilms were Bacillus and Lysinibacillus. After switching from unconnected to series and parallel connections, the most abundant species in the stacked MFC were Pseudomonas aeruginosa, followed by different Bacilli classes. This study demonstrated that when the stacked MFC was switched from unconnected to series and parallel connections, the pollutant removal, electricity generation performance and microbial community changed significantly. Voltage drops were observed in the stacked MFC, which was mainly limited by the cathodes. These voltage losses indicated high resistances within the stacked MFC, generating a parasitic cross current. Copyright © 2018 Elsevier Ltd. All rights reserved.
Genten: Software for Generalized Tensor Decompositions v. 1.0.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Phipps, Eric T.; Kolda, Tamara G.; Dunlavy, Daniel
Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of the parallelism present in emerging computer architectures such as multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs to enable efficient processing of large tensors.
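Genten itself is a C++ code aimed at large sparse tensors, but the underlying operation, CP (CANDECOMP/PARAFAC) decomposition, can be illustrated with a minimal dense alternating-least-squares sketch; this is a generic textbook formulation, not Genten's implementation.

```python
import numpy as np

def khatri_rao(a, b):
    # Column-wise Khatri-Rao product: rows indexed (i, j) -> i*J + j.
    return np.einsum('ir,jr->ijr', a, b).reshape(-1, a.shape[1])

def cp_als(x, rank, iters=100):
    """Minimal dense CP decomposition of a 3-way tensor by ALS."""
    I, J, K = x.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    # Mode-n unfoldings consistent with the Khatri-Rao row ordering.
    x1 = x.reshape(I, J * K)
    x2 = np.moveaxis(x, 1, 0).reshape(J, I * K)
    x3 = np.moveaxis(x, 2, 0).reshape(K, I * J)
    for _ in range(iters):
        A = x1 @ np.linalg.pinv(khatri_rao(B, C).T)   # solve X1 = A (B kr C)^T
        B = x2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = x3 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# Sanity check: recover an exactly rank-3 tensor.
A0, B0, C0 = (np.random.rand(n, 3) for n in (4, 5, 6))
x = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(x, 3)
print(np.linalg.norm(x - np.einsum('ir,jr,kr->ijk', A, B, C)))
```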
Analysis of DE-1 PWI electric field data
NASA Technical Reports Server (NTRS)
Weimer, Daniel
1994-01-01
The measurement of low frequency electric field oscillations may be accomplished with the Plasma Wave Instrument (PWI) on DE 1. Oscillations at a frequency around 1 Hz are below the range of the conventional plasma wave receivers, but they can be detected by using a special processing of the quasi-static electric field data. With this processing it is also possible to determine if the electric field oscillations are predominantly parallel or perpendicular to the ambient magnetic field. The quasi-static electric field in the DE 1 spin/orbit plane is measured with a long-wire 'double probe'. This antenna is perpendicular to the satellite spin axis, which in turn is approximately perpendicular to the geomagnetic field in the polar magnetosphere. The electric field data are digitally sampled at a frequency of 16 Hz. The measured electric field signal, which has had phase reversals introduced by the rotating antenna, is multiplied by the sine of the rotation angle between the antenna and the magnetic field. This is called the 'perpendicular' signal. The measured time series is also multiplied with the cosine of the angle to produce a separate 'parallel' signal. These two separate time series are then processed to determine the frequency power spectrum.
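A hedged Python sketch of the demodulation just described, using synthetic data and an assumed 6 s spin period; the actual DE 1 spin rate and antenna-to-field angle come from spacecraft attitude data.

```python
import numpy as np
from scipy.signal import welch

fs = 16.0                        # PWI quasi-static E-field sample rate, Hz
t = np.arange(0, 600, 1 / fs)    # ten minutes of samples (synthetic)
spin = 2 * np.pi * t / 6.0       # assumed 6 s antenna rotation period

# Synthetic measurement: a 1 Hz field oscillation parallel to B,
# amplitude-modulated by the rotating antenna, plus noise.
rng = np.random.default_rng(1)
e_meas = np.cos(spin) * 0.5 * np.sin(2 * np.pi * 1.0 * t) \
         + 0.05 * rng.standard_normal(t.size)

# Demodulation as described: multiply the measured signal by the sine
# and cosine of the angle between the antenna and the magnetic field.
e_perp = e_meas * np.sin(spin)   # 'perpendicular' time series
e_par = e_meas * np.cos(spin)    # 'parallel' time series

# Frequency power spectrum of each demodulated series.
for name, sig in (("parallel", e_par), ("perpendicular", e_perp)):
    f, pxx = welch(sig, fs=fs, nperseg=1024)
    print(name, "spectral peak at", f[np.argmax(pxx)], "Hz")
```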
Parallel Quantum Circuit in a Tunnel Junction
NASA Astrophysics Data System (ADS)
Faizy Namarvar, Omid; Dridi, Ghassen; Joachim, Christian
2016-07-01
Spectral analysis of 1- and 2-state-per-line quantum buses is normally sufficient to determine the effective Vab(N) electronic coupling between the emitter and receiver states through the bus as a function of the number N of parallel lines. When Vab(N) is difficult to determine, a Heisenberg-Rabi time dependent quantum exchange process must be triggered through the bus to capture the secular oscillation frequency Ωab(N) between those states. Two different regimes, linear and quadratic in N, are demonstrated for Ωab(N) as a function of N. When the initial preparation is replaced by coupling of the quantum bus to semi-infinite electrodes, the resulting quantum transduction process does not faithfully follow the Ωab(N) variations. Because of the electronic transparency normalisation to unity and of the low pass filter character of this transduction, large Ωab(N) cannot be captured by the tunnel junction. The broadly used concept of electrical contact between a metallic nanopad and a molecular device must be better described as a quantum transduction process. At small coupling and when N is small enough not to compensate for this small coupling, an N2 power law is preserved for Ωab(N) and for Vab(N).
Accelerating sino-atrium computer simulations with graphic processing units.
Zhang, Hong; Xiao, Zheng; Lin, Shien-fong
2015-01-01
Sino-atrial node cells (SANCs) play a significant role in rhythmic firing. To investigate their role in arrhythmia and interactions with the atrium, computer simulations based on cellular dynamic mathematical models are generally used. However, the large-scale computation usually makes research difficult, given the limited computational power of Central Processing Units (CPUs). In this paper, an accelerating approach with Graphic Processing Units (GPUs) is proposed in a simulation consisting of the SAN tissue and the adjoining atrium. By using the operator splitting method, the computational task was made parallel. Three parallelization strategies were then put forward. The strategy with the shortest running time was further optimized by considering block size, data transfer and partition. The results showed that for a simulation with 500 SANCs and 30 atrial cells, the execution time taken by the non-optimized program decreased by 62% with respect to a serial program running on CPU. The execution time decreased by 80% after the program was optimized. The larger the tissue was, the more significant the acceleration became. The results demonstrated the effectiveness of the proposed GPU-accelerating methods and their promising applications in more complicated biological simulations.
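The operator-splitting structure is what makes the per-cell computation independent and hence GPU-friendly. The following Python sketch, with placeholder cubic kinetics rather than the actual SANC/atrial ionic models, shows the split between the local reaction update (one thread per cell on a GPU) and the diffusion coupling.

```python
import numpy as np

# Operator splitting on a 1-D excitable cable: within each time step,
# the reaction update of every cell is independent (embarrassingly
# parallel), while the diffusion coupling is a cheap stencil update.
n, dt, dx, D = 530, 0.01, 0.1, 0.05   # e.g., 500 SAN + 30 atrial cells
v = -80.0 * np.ones(n)                # membrane potentials (mV), synthetic
v[:5] = 20.0                          # stimulate one end of the cable

def reaction(v, dt):
    # Placeholder cubic kinetics with rest, threshold, and excited
    # states; a real model evaluates the ionic currents cell by cell.
    return v + dt * (-0.1 * (v + 80.0) * (v + 20.0) * (v - 20.0) / 400.0)

for _ in range(1000):
    v = reaction(v, dt)                              # parallel per cell
    lap = np.zeros_like(v)
    lap[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx**2
    v = v + dt * D * lap                             # diffusion step
```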
A low cost, disposable cable-shaped Al-air battery for portable biosensors
NASA Astrophysics Data System (ADS)
Fotouhi, Gareth; Ogier, Caleb; Kim, Jong-Hoon; Kim, Sooyeun; Cao, Guozhong; Shen, Amy Q.; Kramlich, John; Chung, Jae-Hyun
2016-05-01
A disposable cable-shaped flexible battery is presented using a simple, low cost manufacturing process. The working principle of an aluminum-air galvanic cell is used for the cable-shaped battery to power portable and point-of-care medical devices. The battery is catalyzed with a carbon nanotube (CNT)-paper matrix. A scalable manufacturing process using a lathe is developed to wrap a paper layer and a CNT-paper matrix on an aluminum wire. The matrix is then wrapped with a silver-plated copper wire to form the battery cell. The battery is activated through absorption of electrolytes including phosphate-buffered saline, NaOH, urine, saliva, and blood into the CNT-paper matrix. The maximum electric power using a 10 mm-long battery cell is over 1.5 mW. As a demonstration, an LED is powered using two series-connected groups, each consisting of four batteries in parallel. Considering the material composition and the cable-shaped configuration, the battery is fully disposable, flexible, and potentially compatible with portable biosensors through activation by either reagents or biological fluids.
Development and Buildup of a Stirling Radioisotope Generator Electrical Simulator
NASA Technical Reports Server (NTRS)
Prokop, Norman F.; Krasowski, Michael J.; Greer, Lawrence C.; Flatico, Joseph M.; Spina, Dan C.
2008-01-01
This paper describes the development of a Stirling Radioisotope Generator (SRG) Simulator for use in a prototype lunar robotic rover. The SRG developed at NASA Glenn Research Center (GRC) is a promising power source for the robotic exploration of the sunless areas of the moon. The simulator provides a power output similar to the SRG output of 5.7 A at 28 Vdc, while using ac wall power as the input power source. The designed electrical simulator provides rover developers the physical and electrical constraints of the SRG, supporting parallel development of the SRG and rover. Parallel development allows the rover design team to embrace the SRG's unique constraints while development of the SRG is continued to a flight qualified version.
Efficient multitasking: parallel versus serial processing of multiple tasks
Fischer, Rico; Plessow, Franziska
2015-01-01
In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling. PMID:26441742
Breaking Barriers to Low-Cost Modular Inverter Production & Use
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bogdan Borowy; Leo Casey; Jerry Foshage
2005-05-31
The goal of this cost share contract is to advance key technologies to reduce size, weight and cost while enhancing performance and reliability of the Modular Inverter Product for Distributed Energy Resources (DER). Efforts address technology development to meet the technical needs of the DER market: protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for Photovoltaic, Fuel Cell, and Energy Storage applications. Efforts focused on four technical areas: capacitors, cooling, voltage sensing, and control of parallel inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several types and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid cooled AIPM module to an air-cooled unit using augmented fin, impingement flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several application-dependent configurations and enable voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load types (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling individual devices that dramatically improves the reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.
Active Flash: Performance-Energy Tradeoffs for Out-of-Core Processing on Non-Volatile Memory Devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boboila, Simona; Kim, Youngjae; Vazhkudai, Sudharshan S
2012-01-01
In this abstract, we study the performance and energy tradeoffs involved in migrating data analysis into the flash device, a process we refer to as Active Flash. The Active Flash paradigm is similar to 'active disks', which have received considerable attention. Active Flash allows us to move processing closer to data, thereby minimizing data movement costs and reducing power consumption. It enables true out-of-core computation. The conventional definition of out-of-core solvers refers to an approach to process data that is too large to fit in the main memory and, consequently, requires access to disk. However, in Active Flash, processing outside the host CPU literally frees the core and achieves real 'out-of-core' analysis. Moving analysis to data has long been desirable, not just at this level, but at all levels of the system hierarchy. However, this requires a detailed study on the tradeoffs involved in achieving analysis turnaround under an acceptable energy envelope. To this end, we first need to evaluate if there is enough computing power on the flash device to warrant such an exploration. Flash processors require decent computing power to run the internal logic pertaining to the Flash Translation Layer (FTL), which is responsible for operations such as address translation, garbage collection (GC) and wear-leveling. Modern SSDs are composed of multiple packages and several flash chips within a package. The packages are connected using multiple I/O channels to offer high I/O bandwidth. SSD computing power is also expected to be high enough to exploit such inherent internal parallelism within the drive to increase the bandwidth and to handle fast I/O requests. More recently, SSD devices are being equipped with powerful processing units and are even embedded with multicore CPUs (e.g. the ARM Cortex-A9 embedded processor is advertised to reach 2GHz frequency and deliver 5000 DMIPS; the OCZ RevoDrive X2 SSD has 4 SandForce controllers, each with a 780MHz max frequency Tensilica core). Efforts that take advantage of the available computing cycles on the processors on SSDs to run auxiliary tasks other than actual I/O requests are beginning to emerge. Kim et al. investigate database scan operations in the context of processing on the SSDs, and propose dedicated hardware logic to speed up scans. Also, cluster architectures have been explored, which consist of low-power embedded CPUs coupled with small local flash to achieve fast, parallel access to data. Processor utilization on SSDs is highly dependent on workloads and, therefore, the processors can be idle during periods with no I/O accesses. We propose to use the available processing capability on the SSD to run tasks that can be offloaded from the host. This paper makes the following contributions: (1) We have investigated Active Flash and its potential to optimize the total energy cost, including power consumption on the host and the flash device; (2) We have developed analytical models to analyze the performance-energy tradeoffs for Active Flash, by treating the SSD as a blackbox; this is particularly valuable due to the proprietary nature of the SSD internal hardware; and (3) We have enhanced a well-known SSD simulator (from MSR) to implement 'on-the-fly' data compression using Active Flash. Our results provide a window into striking a balance between energy consumption and application performance.
Wilson, J Adam; Williams, Justin C
2009-01-01
The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
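A condensed Python sketch of the processing chain described above (spatial filter as a matrix-matrix product, then per-channel spectral power). Welch's method stands in here for the paper's autoregressive spectral estimator, and all sizes, the common-average-reference filter, and the feature band are illustrative.

```python
import numpy as np
from scipy.signal import welch

fs, n_ch, n_samp = 1200, 64, 300             # 250 ms buffer, synthetic sizes
rng = np.random.default_rng(0)
x = rng.standard_normal((n_ch, n_samp))      # one buffer of recorded data

# Step 1: spatial filter as a matrix-matrix multiplication; here a
# common average reference expressed as a filter matrix.
w = np.eye(n_ch) - np.ones((n_ch, n_ch)) / n_ch
y = w @ x                                     # (channels x samples)

# Step 2: power spectral density on every channel (the paper uses an
# autoregressive method; Welch is a stand-in).
f, pxx = welch(y, fs=fs, nperseg=128, axis=1)

# Step 3: band-power features (e.g., high gamma) for the classifier
# that produces the cursor-control signal.
band = (f >= 70) & (f <= 110)
features = pxx[:, band].mean(axis=1)          # one feature per channel
```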
Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong
2017-07-01
Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolvement in multi-step drinking water treatment systems, and the results provide insight to control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.
NASA Technical Reports Server (NTRS)
Mclyman, W. T. (Inventor)
1981-01-01
In a push-pull converter, switching transistors are protected from peak power stresses by a separate snubber circuit in parallel with each, comprising a capacitor and an inductor in series, and a diode in parallel with the inductor. The diode is connected to conduct current of the same polarity as the base-emitter junction of the transistor, so that energy stored in the capacitor while the transistor is switched off, to protect it against peak power stress, discharges through the inductor when the transistor is turned on and, after the capacitor is discharged, through the diode. To return this energy to the power supply, or to utilize this energy in some external circuit, the inductor may be replaced by a transformer having its secondary winding connected to the power supply or to the external circuit.
NASA Astrophysics Data System (ADS)
Ma, Yitao; Miura, Sadahiko; Honjo, Hiroaki; Ikeda, Shoji; Hanyu, Takahiro; Ohno, Hideo; Endoh, Tetsuo
2017-04-01
A high-density nonvolatile associative memory (NV-AM) based on spin transfer torque magnetoresistive random access memory (STT-MRAM), which achieves highly concurrent and ultralow-power nearest neighbor search with full adaptivity of the template data format, has been proposed and fabricated using the 90 nm CMOS/70 nm perpendicular-magnetic-tunnel-junction hybrid process. A truly compact current-mode circuitry is developed to realize flexibly controllable and highly parallel similarity evaluation, which makes the NV-AM adaptable to any dimensionality and component-bit of template data. A compact dual-stage time-domain minimum searching circuit is also developed, which can freely extend the system for more template data by connecting multiple NV-AM cores without additional circuits for integrated processing. Both the embedded STT-MRAM module and the computing circuit modules in this NV-AM chip are synchronously power-gated to completely eliminate standby power and maximally reduce operation power by only activating the currently accessed circuit blocks. The operations of a prototype chip at 40 MHz are demonstrated by measurement. The average operation power is only 130 µW, and the circuit density is less than 11 µm2/bit. Compared with the latest conventional works in both volatile and nonvolatile approaches, a circuit area reduction of more than 31.3% and a power improvement of 99.2% are achieved, respectively. Further power performance analyses are discussed, which verify the special superiority of the proposed NV-AM in low-power and large-memory-based VLSIs.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nash, T.; Atac, R.; Cook, A.
1989-03-06
The ACPMAPS multiprocessor is a highly cost effective, local memory parallel computer with a hypercube or compound hypercube architecture. Communication requires the attention of only the two communicating nodes. The design is aimed at floating point intensive, grid like problems, particularly those with extreme computing requirements. The processing nodes of the system are single board array processors, each with a peak power of 20 Mflops, supported by 8 Mbytes of data and 2 Mbytes of instruction memory. The system currently being assembled has a peak power of 5 Gflops. The nodes are based on the Weitek XL chip set. The system delivers performance at approximately $300/Mflop. 8 refs., 4 figs.
Frequency domain model for analysis of paralleled, series-output-connected Mapham inverters
NASA Technical Reports Server (NTRS)
Brush, Andrew S.; Sundberg, Richard C.; Button, Robert M.
1989-01-01
The Mapham resonant inverter is characterized as a two-port network driven by a selected periodic voltage. The two-port model is then used to model a pair of Mapham inverters connected in series and employing phasor voltage regulation. It is shown that the model is useful for predicting power output in paralleled inverter units, and for predicting harmonic current output of inverter pairs, using standard power flow techniques. Some sample results are compared to data obtained from testing hardware inverters.
GPU accelerated study of heat transfer and fluid flow by lattice Boltzmann method on CUDA
NASA Astrophysics Data System (ADS)
Ren, Qinlong
The lattice Boltzmann method (LBM) has been developed as a powerful numerical approach to simulate complex fluid flow and heat transfer phenomena during the past two decades. As a mesoscale method based on kinetic theory, LBM has several advantages over traditional numerical methods, such as the physical representation of microscopic interactions, the ease of handling complex geometries, and its highly parallel nature. The lattice Boltzmann method has been applied to various fluid behaviors and heat transfer processes, including conjugate heat transfer, magnetic and electric fields, diffusion and mixing, chemical reactions, multiphase flow, phase change, non-isothermal flow in porous media, microfluidics, and fluid-structure interactions in biological systems. In addition, as a non-body-conformal grid method, the immersed boundary method (IBM) can be applied to handle complex or moving geometries in the domain. The immersed boundary method can be coupled with the lattice Boltzmann method to study heat transfer and fluid flow problems: heat transfer and fluid flow are solved on Euler nodes by LBM, while the complex solid geometries are captured by Lagrangian nodes using the immersed boundary method. Parallel computing has been a popular topic for many decades as a way to accelerate computation in engineering and scientific fields. Today, almost all laptops and desktops have central processing units (CPUs) with multiple cores that can be used for parallel computing. However, the cost of CPUs with hundreds of cores is still high, which limits their use for high-performance computing on personal computers. Graphics processing units (GPUs), originally used in computer video cards, have emerged as among the most powerful high-performance computing devices in recent years. Unlike CPUs, GPUs with thousands of cores are inexpensive; for example, the GPU used in the current work (GeForce GTX TITAN) has 2688 cores and costs only about 1,000 US dollars. The release of NVIDIA's CUDA architecture, which includes both hardware and a programming environment, in 2007 made GPU computing attractive. Owing to its highly parallel nature, the lattice Boltzmann method has been successfully ported to GPUs with substantial performance benefits in recent years. In the current work, LBM CUDA code is developed for different fluid flow and heat transfer problems. In this dissertation, the lattice Boltzmann method and the immersed boundary method are used to study natural convection in an enclosure with an array of conducting obstacles, double-diffusive convection in a vertical cavity with Soret and Dufour effects, the PCM melting process in a latent heat thermal energy storage system with internal fins, mixed convection in a lid-driven cavity with a sinusoidal cylinder, and AC electrothermal pumping in microfluidic systems on a CUDA computational platform. It is demonstrated that LBM is an efficient method to simulate complex heat transfer problems using GPUs on CUDA.
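To make the "highly parallel nature" concrete, here is a minimal D2Q9 BGK collision-and-streaming step in NumPy; on CUDA the collision loop becomes one thread per lattice node. This is a generic LBM sketch under illustrative parameters, not the dissertation's code.

```python
import numpy as np

# D2Q9 lattice: weights and discrete velocities.
w = np.array([4/9] + [1/9]*4 + [1/36]*4)
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
nx, ny, tau = 64, 64, 0.6                       # grid and relaxation time

f = np.ones((9, nx, ny)) * w[:, None, None]     # start at rest
f[:, nx//2, ny//2] *= 1.1                       # density bump to start flow

def step(f):
    rho = f.sum(axis=0)                          # macroscopic density
    u = np.einsum('qi,qxy->ixy', c, f) / rho     # macroscopic velocity
    cu = np.einsum('qi,ixy->qxy', c, u)          # c_q . u per direction
    usq = (u**2).sum(axis=0)
    feq = w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    f = f - (f - feq) / tau                      # BGK collision (local!)
    for q in range(9):                           # streaming, periodic box
        f[q] = np.roll(f[q], (int(c[q, 0]), int(c[q, 1])), axis=(0, 1))
    return f

for _ in range(100):
    f = step(f)
```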
NASA Technical Reports Server (NTRS)
Wilson, T. G.; Lee, F. C. Y.; Burns, W. W., III; Owen, H. A., Jr.
1975-01-01
It recently has been shown in the literature that many dc-to-square-wave parallel inverters which are widely used in power-conditioning applications can be grouped into one of two families. Each family is characterized by an equivalent RLC network. Based on this approach, a classification procedure is presented for self-oscillating parallel inverters which makes evident natural relationships which exist between various inverter configurations. By utilizing concepts from the basic theory of negative resistance oscillators and the principle of duality as applied to nonlinear networks, a chain of relationships is established which enables a methodical transfer of knowledge gained about one family of inverters to any of the other families in the classification array.
Tankam, Patrice; Santhanam, Anand P.; Lee, Kye-Sung; Won, Jungeun; Canavesi, Cristina; Rolland, Jannick P.
2014-01-01
Gabor-domain optical coherence microscopy (GD-OCM) is a volumetric high-resolution technique capable of acquiring three-dimensional (3-D) skin images with histological resolution. Real-time image processing is needed to enable GD-OCM imaging in a clinical setting. We present a parallelized and scalable multi-graphics processing unit (GPU) computing framework for real-time GD-OCM image processing. A parallelized control mechanism was developed to individually assign computation tasks to each of the GPUs. For each GPU, the optimal number of amplitude-scans (A-scans) to be processed in parallel was selected to maximize GPU memory usage and core throughput. We investigated five computing architectures for computational speed-up in processing 1000×1000 A-scans. The proposed parallelized multi-GPU computing framework enables processing at a computational speed faster than the GD-OCM image acquisition, thereby facilitating high-speed GD-OCM imaging in a clinical setting. Using two parallelized GPUs, the image processing of a 1×1×0.6 mm3 skin sample was performed in about 13 s, and the performance was benchmarked at 6.5 s with four GPUs. This work thus demonstrates that 3-D GD-OCM data may be displayed in real-time to the examiner using parallelized GPU processing. PMID:24695868
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-04
... the powerhouse to Sierra Pacific Power's Fort Churchill generating station, parallel to an existing... powerhouse to the PDCI and then parallel to the PDCI to the Fort Churchill generating station; and (9...
Ideologies of aid, practices of power: lessons for Medicaid managed care.
Nelson, Nancy L
2005-03-01
The articles in this special issue teach valuable lessons based on what happened in New Mexico with the shift to Medicaid managed care. By reframing these lessons in broader historical and cultural terms with reference to aid programs, we have the opportunity to learn a great deal more about the relationship between poverty, public policy, and ideology. Medicaid as a state and federal aid program in the United States and economic development programs as foreign aid provide useful analogies specifically because they exhibit a variety of parallel patterns. The increasing concatenation of corporate interests with state and nongovernmental interests in aid programs is ultimately producing a less centralized system of power and responsibility. This process of decentralization, however, is not undermining the sources of power behind aid efforts, although it does make the connections between intent, planning, and outcome less direct. Ultimately, the devolution of power produces many unintended consequences for aid policy. But it also reinforces the perspective that aid and the need for it are nonpolitical issues.
Integration Testing of a Modular Discharge Supply for NASA's High Voltage Hall Accelerator Thruster
NASA Technical Reports Server (NTRS)
Pinero, Luis R.; Kamhawi, Hani; Drummond, Geoff
2010-01-01
NASA's In-Space Propulsion Technology Program is developing a high performance Hall thruster that can fulfill the needs of future Discovery-class missions. The result of this effort is the High Voltage Hall Accelerator thruster, which can operate over a power range from 0.3 to 3.5 kW and a specific impulse from 1,000 to 2,800 sec, and process 300 kg of xenon propellant. Simultaneously, a 4.0 kW discharge power supply comprised of two parallel modules was developed. These power modules use an innovative three-phase resonant topology that can efficiently supply full power to the thruster at an output voltage range of 200 to 700 V at an input voltage range of 80 to 160 V. Efficiencies as high as 95.9 percent were measured during an integration test with the NASA103M.XL thruster. The accuracy of the master/slave current sharing circuit and various thruster ignition techniques were evaluated.
RF system calibration for global Q matrix determination.
Padormo, Francesco; Beqiri, Arian; Malik, Shaihan J; Hajnal, Joseph V
2016-06-01
The use of multiple transmission channels (known as Parallel Transmission, or PTx) provides increased control of the MRI signal formation process. This extra flexibility comes at the cost of uncertainty about the power deposited in the patient under examination: the electric fields produced by each transmitter can interfere in such a way as to lead to excessively high heating. Although it is not possible to determine local heating, the global Q matrix (which allows the whole-body Specific Absorption Rate (SAR) to be known for any PTx pulse) can be measured in-situ by monitoring the power incident upon and reflected by each transmit element during transmission. Recent observations have shown that measured global Q matrices can be corrupted by losses between the coil array and the location of power measurement. In this work we demonstrate that these losses can be accounted for, allowing accurate global Q matrix measurement independent of the location of the power measurement devices. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
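For reference, the global Q matrix enters as a quadratic form in the PTx drive vector; the following is the standard convention for an N-channel system, hedged as background rather than a quotation from the paper.

```latex
% Global SAR for an N-channel PTx pulse with complex channel
% amplitudes w(t), averaged over the pulse duration T:
\mathrm{SAR}_{\mathrm{global}}
  = \frac{1}{T}\int_{0}^{T} \mathbf{w}(t)^{H}\,\mathbf{Q}_{g}\,\mathbf{w}(t)\,\mathrm{d}t
  = \operatorname{tr}\!\left(\mathbf{Q}_{g}\,\boldsymbol{\Phi}\right),
\qquad
\boldsymbol{\Phi} = \frac{1}{T}\int_{0}^{T}\mathbf{w}(t)\,\mathbf{w}(t)^{H}\,\mathrm{d}t ,
% so any loss between the power monitors and the coil must be mapped
% out of the measured Q_g before it reflects power deposited in the
% patient -- the correction the paper demonstrates.
```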
An Improved Harmonic Current Detection Method Based on Parallel Active Power Filter
NASA Astrophysics Data System (ADS)
Zeng, Zhiwu; Xie, Yunxiang; Wang, Yingpin; Guan, Yuanpeng; Li, Lanfang; Zhang, Xiaoyu
2017-05-01
Harmonic detection technology plays an important role in the application of active power filters. The accuracy and real-time performance of harmonic detection are preconditions for ensuring the compensation performance of an Active Power Filter (APF). This paper proposes an improved instantaneous reactive power harmonic current detection algorithm, combining an improved ip-iq algorithm with a moving-average filter. The proposed ip-iq algorithm removes the αβ and dq coordinate transformations, decreasing the computational cost, simplifying the extraction of the fundamental components of the load currents, and improving the detection speed. The traditional low-pass filter is replaced by the moving-average filter, detecting the harmonic currents more precisely and quickly. Compared with the traditional algorithm, the THD (Total Harmonic Distortion) of the grid currents is reduced from 4.41% to 3.89% in the simulations and from 8.50% to 4.37% in the experiments after the improvement. The results show the proposed algorithm is more accurate and efficient.
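A single-phase Python sketch of the ip-iq extraction with the moving-average replacement for the low-pass filter; the sampling rate and harmonic content are illustrative. Averaging over exactly one fundamental period gives unity gain at DC and exact nulls at every integer harmonic, which is why it settles faster than the conventional low-pass filter.

```python
import numpy as np

fs, f1 = 12800, 50                   # sampling rate and fundamental (assumed)
n = fs // f1                         # samples per fundamental period (256)
t = np.arange(0, 0.2, 1 / fs)

# Synthetic distorted load current: fundamental + 5th + 7th harmonics.
i_load = (np.sin(2*np.pi*f1*t) + 0.2*np.sin(2*np.pi*5*f1*t)
          + 0.14*np.sin(2*np.pi*7*f1*t))

# ip-iq style demodulation against PLL-locked unit waveforms.
s, c_ = np.sin(2*np.pi*f1*t), np.cos(2*np.pi*f1*t)
ip, iq = i_load * s, i_load * c_

# Moving average over exactly one period extracts the DC components.
kernel = np.ones(n) / n
ip_dc = np.convolve(ip, kernel, mode='same')
iq_dc = np.convolve(iq, kernel, mode='same')

# Reconstructed fundamental and extracted harmonic reference for the APF.
i_fund = 2 * (ip_dc * s + iq_dc * c_)
i_harm = i_load - i_fund
```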
Xray: N-dimensional, labeled arrays for analyzing physical datasets in Python
NASA Astrophysics Data System (ADS)
Hoyer, S.
2015-12-01
Efficient analysis of geophysical datasets requires tools that both preserve and utilize metadata, and that transparently scale to process large datasets. Xray is such a tool, in the form of an open source Python library for analyzing the labeled, multi-dimensional array (tensor) datasets that are ubiquitous in the Earth sciences. Xray's approach pairs Python data structures based on the data model of the netCDF file format with the proven design and user interface of pandas, the popular Python data analysis library for labeled tabular data. On top of the NumPy array, xray adds labeled dimensions (e.g., "time") and coordinate values (e.g., "2015-04-10"), which it uses to enable a host of operations powered by these labels: selection, aggregation, alignment, broadcasting, split-apply-combine, interoperability with pandas and serialization to netCDF/HDF5. Many of these operations are enabled by xray's tight integration with pandas. Finally, to allow for easy parallelism and to enable its labeled data operations to scale to datasets that do not fit into memory, xray integrates with the parallel processing library dask.
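A short usage sketch of the labeled operations described above (xray is the pre-1.0 name of what is now xarray); the variable name, coordinate values, and output file are invented for illustration, and writing netCDF assumes a netCDF backend is installed.

```python
import numpy as np
import pandas as pd
import xray  # the pre-1.0 name of today's `xarray`

# A labeled 3-D air-temperature cube: dimensions carry names and
# coordinate values instead of bare integer axes.
temps = xray.DataArray(
    np.random.rand(365, 4, 5),
    dims=('time', 'lat', 'lon'),
    coords={'time': pd.date_range('2015-01-01', periods=365),
            'lat': [10, 20, 30, 40],
            'lon': [100, 110, 120, 130, 140]},
    name='t2m')

# Label-based selection and aggregation -- no axis bookkeeping.
april_mean = temps.sel(time='2015-04').mean(dim='time')
zonal = temps.mean(dim='lon')

# Serialization via the netCDF data model.
temps.to_dataset().to_netcdf('t2m.nc')
```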
Embedded Implementation of VHR Satellite Image Segmentation
Li, Chao; Balla-Arabé, Souleymane; Ginhac, Dominique; Yang, Fan
2016-01-01
Processing and analysis of Very High Resolution (VHR) satellite images provide a mass of crucial information, which can be used for urban planning, security issues or environmental monitoring. However, they are computationally expensive and, thus, time consuming, while some of the applications, such as natural disaster monitoring and prevention, require high efficiency performance. Fortunately, parallel computing techniques and embedded systems have made great progress in recent years, and a series of massively parallel image processing devices, such as digital signal processors or Field Programmable Gate Arrays (FPGAs), have been made available to engineers at a very convenient price and demonstrate significant advantages in terms of running cost, embeddability, power consumption, flexibility, etc. In this work, we designed a texture region segmentation method for very high resolution satellite images by using the level set algorithm and the multi-kernel theory in a high-abstraction C environment and realized its register-transfer level implementation with the help of a newly proposed high-level synthesis-based design flow. The evaluation experiments demonstrate that the proposed design can produce high quality image segmentation with a significant running-cost advantage. PMID:27240370
Atoche, Alejandro Castillo; Castillo, Javier Vázquez
2012-01-01
A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed in their parallel representation and then, they are mapped into an efficient high performance embedded computing (HPEC) architecture in reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such dual SSA core, drastically reduces the computational load of complex RS regularization techniques achieving the required real-time operational mode. PMID:22736964
NASA Astrophysics Data System (ADS)
Voter, Arthur
Many important materials processes take place on time scales that far exceed the roughly one microsecond accessible to molecular dynamics simulation. Typically, this long-time evolution is characterized by a succession of thermally activated infrequent events involving defects in the material. In the accelerated molecular dynamics (AMD) methodology, known characteristics of infrequent-event systems are exploited to make reactive events take place more frequently, in a dynamically correct way. For certain processes, this approach has been remarkably successful, offering a view of complex dynamical evolution on time scales of microseconds, milliseconds, and sometimes beyond. We have recently made advances in all three of the basic AMD methods (hyperdynamics, parallel replica dynamics, and temperature accelerated dynamics (TAD)), exploiting both algorithmic advances and novel parallelization approaches. I will describe these advances, present some examples of our latest results, and discuss what should be possible when exascale computing arrives in roughly five years. Funded by the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, and by the Los Alamos Laboratory Directed Research and Development program.
New 2D diffraction model and its applications to terahertz parallel-plate waveguide power splitters
Zhang, Fan; Song, Kaijun; Fan, Yong
2017-01-01
A two-dimensional (2D) diffraction model for the calculation of the diffraction field in 2D space and its applications to terahertz parallel-plate waveguide power splitters are proposed in this paper. Compared with the Huygens-Fresnel principle in three-dimensional (3D) space, the proposed model provides an approximate analytical expression to calculate the diffraction field in 2D space. The diffraction field is expressed as a superposition integral in 2D space. The calculated results obtained from the proposed diffraction model agree well with those computed by the software HFSS, which is based on the finite element method (FEM). Based on the proposed 2D diffraction model, two parallel-plate waveguide power splitters are presented. The splitters consist of a transmitting horn antenna, reflectors, and a receiving antenna array. The reflector is cylindrical parabolic with superimposed surface relief to efficiently couple the transmitted wave into the receiving antenna array. The reflectors are applied as computer-generated holograms to match the transformed field to the receiving antenna aperture field. The power splitters were optimized by a modified real-coded genetic algorithm. The computed results for the splitters, which agree well with those obtained from HFSS, verify the novel design method for power splitters and show the good application prospects of the proposed 2D diffraction model. PMID:28181514
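For orientation, a plausible form of the 2D superposition integral alluded to above, hedged as the standard 2D free-space kernel rather than the paper's exact expression (the paper's obliquity factor and normalization may differ).

```latex
% In 2D the line-source (cylindrical-wave) Green's function, a Hankel
% function, replaces the 3D spherical wavelet of Huygens-Fresnel:
U(\mathbf{r}) \;\propto\; \int_{\mathcal{A}} U(\mathbf{r}')\,
    H_{0}^{(1)}\!\bigl(k\,\lvert\mathbf{r}-\mathbf{r}'\rvert\bigr)\,\mathrm{d}\ell'
\;\approx\; \sqrt{\tfrac{2}{\pi k}}\, e^{-i\pi/4}
    \int_{\mathcal{A}} U(\mathbf{r}')\,
    \frac{e^{\,ik\lvert\mathbf{r}-\mathbf{r}'\rvert}}
         {\sqrt{\lvert\mathbf{r}-\mathbf{r}'\rvert}}\,\mathrm{d}\ell' ,
\qquad k\lvert\mathbf{r}-\mathbf{r}'\rvert \gg 1 ,
% i.e., the field at r is the superposition of cylindrical wavelets
% e^{ikr}/sqrt(r) launched from the aperture A.
```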
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next-generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to the signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to a 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. The third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of a high degree of parallelism that can be translated into very high performance, low-power-dissipation computing. The EnLight 256 is a small-form-factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and of applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision, approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz; at each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the product of 5 years of sustained, intensive R&D collaboration (involving over $400M of investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit-wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate the clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
Anatomically constrained neural network models for the categorization of facial expression
NASA Astrophysics Data System (ADS)
McMenamin, Brenton W.; Assadi, Amir H.
2004-12-01
The ability to recognize facial expression in humans is performed by the amygdala, which uses parallel processing streams to identify expressions quickly and accurately. Additionally, it is possible that a feedback mechanism plays a role in this process as well. Implementing a model with a similar parallel structure and feedback mechanisms could improve current facial recognition algorithms, for which varied expressions are a source of error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However, the use of parallel processing streams significantly improved accuracy over a similar network that did not have a parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.
Crosetto, D.B.
1996-12-31
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor to a plurality of slave processors to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer, a digital signal processor, a parallel transfer controller, and two three-port memory devices. A communication switch within each node connects it to a fast parallel hardware channel through which all high density data arrives or leaves the node. 6 figs.
Crosetto, Dario B.
1996-01-01
The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.
Bedez, Mathieu; Belhachmi, Zakaria; Haeberlé, Olivier; Greget, Renaud; Moussaoui, Saliha; Bouteiller, Jean-Marie; Bischoff, Serge
2016-01-15
Solving a model that describes the electrical activity of neural tissue and its propagation within the tissue is highly demanding in terms of computing time and requires substantial computing power to achieve good results. In this study, we present a method for solving a model describing electrical propagation in neuronal tissue using the parareal algorithm coupled with spatial parallelization using CUDA on a graphics processing unit (GPU). We applied the method to different dimensions of the geometry of our model (1-D, 2-D and 3-D). The GPU results are compared with simulations on a multi-core processor cluster using the message-passing interface (MPI), where the spatial scale was parallelized in order to reach a calculation time comparable to that of the presented GPU method. A gain of a factor of 100 in computational time between the sequential results and those obtained using the GPU was achieved in the case of the 3-D geometry. Given the structure of the GPU, this factor increases with the fineness of the geometry used in the computation. To the best of our knowledge, it is the first time such a method has been used in the field of neuroscience. Time parallelization coupled with GPU spatial parallelization drastically reduces the computational time needed for a fine-resolution model describing the propagation of the electrical signal in neuronal tissue.
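The parareal iteration itself is compact. Below is a minimal Python sketch under simple assumptions (forward Euler as both coarse and fine propagators, a scalar test ODE), not the authors' solver; the per-slice fine sweeps are the part that a GPU implementation would run in parallel, written here as an ordinary list comprehension.

```python
import numpy as np

def parareal(f, y0, t0, t1, n_slices, coarse_steps, fine_steps, n_iter):
    """Minimal parareal iteration for y' = f(t, y)."""
    def euler(y, ta, tb, n):
        h = (tb - ta) / n
        t = ta
        for _ in range(n):
            y = y + h * f(t, y)
            t += h
        return y

    T = np.linspace(t0, t1, n_slices + 1)
    # Initial guess: one serial coarse sweep.
    U = [y0]
    for i in range(n_slices):
        U.append(euler(U[-1], T[i], T[i + 1], coarse_steps))
    for _ in range(n_iter):
        # Fine sweeps over slices are independent -> parallelizable.
        F = [euler(U[i], T[i], T[i + 1], fine_steps) for i in range(n_slices)]
        G_old = [euler(U[i], T[i], T[i + 1], coarse_steps) for i in range(n_slices)]
        U_new = [y0]
        for i in range(n_slices):
            # Parareal correction: G(new) + F(old) - G(old).
            G_new = euler(U_new[-1], T[i], T[i + 1], coarse_steps)
            U_new.append(G_new + F[i] - G_old[i])
        U = U_new
    return np.array(U)

# Decaying-exponential test problem: compare against exp(-5).
sol = parareal(lambda t, y: -y, np.array([1.0]), 0.0, 5.0, 10, 2, 200, 3)
print(sol[-1], np.exp(-5.0))
```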
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Traditional Tracking with Kalman Filter on Parallel Architectures
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; MacNeill, Ian; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2015-05-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The most common track finding techniques in use today are however those based on the Kalman Filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. We report the results of our investigations into the potential and limitations of these algorithms on the new parallel hardware.
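For reference, one predict/update cycle of the linear Kalman filter underlying these track-finding techniques can be sketched in a few lines. This is a generic textbook formulation in Python with assumed toy matrices, not the vectorized implementation studied in the paper.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict state and covariance forward.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with measurement z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1D constant-velocity track: state [position, velocity], position measured.
dt = 1.0
F = np.array([[1, dt], [0, 1]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.25]])
x, P = np.zeros(2), np.eye(2)
for z in [1.1, 2.0, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
print(x)  # estimated position and velocity
```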
NASA Technical Reports Server (NTRS)
Hsia, T. C.; Lu, G. Z.; Han, W. H.
1987-01-01
In advanced robot control problems, on-line computation of the inverse Jacobian solution is frequently required. A parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be implemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Performance evaluation of canny edge detection on a tiled multicore architecture
NASA Astrophysics Data System (ADS)
Brethorst, Andrew Z.; Desai, Nehal; Enright, Douglas P.; Scrofano, Ronald
2011-01-01
In the last few years, a variety of multicore architectures have been used to parallelize image processing applications. In this paper, we focus on assessing the parallel speed-ups of different Canny edge detection parallelization strategies on the Tile64, a tiled multicore architecture developed by the Tilera Corporation. Included in these strategies are different ways Canny edge detection can be parallelized, as well as differences in data management. The two parallelization strategies examined were loop-level parallelism and domain decomposition. Loop-level parallelism is achieved through the use of OpenMP, and it is capable of parallelizing across the range of values over which a loop iterates. Domain decomposition is the process of breaking down an image into subimages, where each subimage is processed independently, in parallel. The results of the two strategies show that, for the same number of threads, the programmer-implemented domain decomposition exhibits higher speed-ups than the compiler-managed loop-level parallelism implemented with OpenMP.
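The domain-decomposition strategy can be mimicked on a commodity multicore CPU. The Python sketch below (multiprocessing, with OpenCV's Canny as a stand-in detector; the Tile64 version would of course differ) splits an image into row bands with a small halo so edges at band borders survive the split.

```python
import numpy as np
from multiprocessing import Pool
import cv2  # stand-in edge detector, not the paper's Tile64 implementation

def detect_strip(args):
    strip, lo, hi = args
    return cv2.Canny(strip, lo, hi)

def canny_by_rows(image, n_parts=4, lo=50, hi=150, halo=2):
    """Domain decomposition: process row bands in parallel, then stitch."""
    h = image.shape[0]
    bounds = np.linspace(0, h, n_parts + 1).astype(int)
    jobs = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        a2, b2 = max(0, a - halo), min(h, b + halo)   # extend by a halo
        jobs.append((image[a2:b2], lo, hi, a - a2, b - a2))
    with Pool(n_parts) as pool:
        edges = pool.map(detect_strip, [(j[0], j[1], j[2]) for j in jobs])
    # Trim the halos and stack the bands back into a full edge map.
    return np.vstack([e[j[3]:j[4]] for e, j in zip(edges, jobs)])

if __name__ == "__main__":
    img = (np.random.rand(480, 640) * 255).astype(np.uint8)
    print(canny_by_rows(img).shape)
```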
Use of parallel computing in mass processing of laser data
NASA Astrophysics Data System (ADS)
Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.
2015-12-01
The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
CONTAMINANT TRANSPORT IN PARALLEL FRACTURED MEDIA: SUDICKY AND FRIND REVISITED
This paper is concerned with a modified, nondimensional form of the parallel fracture, contaminant transport model of Sudicky and Frind (1982). The modifications include the boundary condition at the fracture wall, expressed by a parameter, and the power-law relationship between...
An Advanced Framework for Improving Situational Awareness in Electric Power Grid Operation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Huang, Zhenyu; Zhou, Ning
With the deployment of new smart grid technologies and the penetration of renewable energy in power systems, significant uncertainty and variability is being introduced into power grid operation. Traditionally, the Energy Management System (EMS) operates the power grid in a deterministic mode, and thus will not be sufficient for the future control center in a stochastic environment with faster dynamics. One of the main challenges is to improve situational awareness. This paper reviews the current status of power grid operation and presents a vision of improving wide-area situational awareness for a future control center. An advanced framework, consisting of parallel state estimation, state prediction, parallel contingency selection, parallel contingency analysis, and advanced visual analytics, is proposed to provide capabilities needed for better decision support by utilizing high performance computing (HPC) techniques and advanced visual analytic techniques. Research results are presented to support the proposed vision and framework.
NASA Technical Reports Server (NTRS)
Jeffries, K. S.; Renz, D. D.
1984-01-01
A parametric analysis was performed of transmission cables for transmitting electrical power at high voltage (up to 1000 V) and high frequency (10 to 30 kHz) for high-power (100 kW or more) space missions. Large-diameter (5 to 30 mm) hollow conductors were considered in closely spaced coaxial configurations and in parallel lines. Formulas were derived to calculate inductance and resistance for these conductors. Curves of cable conductance, mass, inductance, capacitance, resistance, power loss, and temperature were plotted for various conductor diameters, conductor thicknesses, and alternating current frequencies. An example 5 mm diameter coaxial cable with 0.5 mm conductor thickness was calculated to transmit 100 kW at 1000 Vac over 50 m with a power loss of 1900 W, an inductance of 1.45 μH, and a capacitance of 0.07 μF. The computer programs written for this analysis are listed in the appendix.
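For a sense of the magnitudes involved, the standard per-unit-length coaxial-line formulas can be evaluated directly. The Python sketch below uses textbook lossless-line expressions with an assumed geometry; the report's own formulas for hollow conductors at high frequency account for effects (e.g., skin depth and spacing) that are ignored here, so the numbers differ from the example above.

```python
import math

MU0 = 4e-7 * math.pi     # permeability of free space, H/m
EPS0 = 8.854e-12         # permittivity of free space, F/m

def coax_per_length(a, b):
    """Textbook per-unit-length inductance and capacitance of a lossless
    coaxial line: inner radius a, shield radius b, vacuum dielectric."""
    L = MU0 / (2 * math.pi) * math.log(b / a)    # H/m
    C = 2 * math.pi * EPS0 / math.log(b / a)     # F/m
    return L, C

# Assumed geometry loosely matching the ~5 mm closely spaced example.
L, C = coax_per_length(a=2.5e-3, b=3.0e-3)
print(L * 50, C * 50)    # totals over a 50 m run, in H and F
```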
Power, Patriarchy, and Punishment in Shakespeare's "Othello."
ERIC Educational Resources Information Center
Lynch, Kimberly
An informal survey revealed that graduate students presented with Shakespeare's works felt academically unfit and powerless. These student-teacher-text power relationships parallel the power relationships between the dominant patriarchy and the female characters in "Othello"--Desdemona, Emilia, and Bianca. However, "Antony and…
Experimental implementation of array-compressed parallel transmission at 7 tesla.
Yan, Xinqiang; Cao, Zhipeng; Grissom, William A
2016-06-01
To implement and validate a hardware-based array-compressed parallel transmission (acpTx) system. In array-compressed parallel transmission, a small number of transmit channels drive a larger number of transmit coils, which are connected via an array compression network that implements optimized coil-to-channel combinations. A two-channel-to-eight-coil array compression network was developed using power splitters, attenuators and phase shifters, and a simulation was performed to investigate the effects of coil coupling on power dissipation in a simplified network. An eight-coil transmit array was constructed using induced current elimination decoupling, and the coil and network were validated in benchtop measurements, B1+ mapping scans, and an accelerated spiral excitation experiment. The developed attenuators came within 0.08 dB of the desired attenuations, and reflection coefficients were -22 dB or better. The simulation demonstrated that up to 3× more power was dissipated in the network when coils were poorly isolated (-9.6 dB) versus well-isolated (-31 dB). Compared to split circularly-polarized coil combinations, the additional degrees of freedom provided by the array compression network led to 54% lower squared excitation error in the spiral experiment. Array-compressed parallel transmission was successfully implemented in a hardware system. Further work is needed to develop remote network tuning and to minimize network power dissipation. Magn Reson Med 75:2545-2552, 2016.
High-Efficiency Hall Thruster Discharge Power Converter
NASA Technical Reports Server (NTRS)
Jaquish, Thomas
2015-01-01
Busek Company, Inc., is designing, building, and testing a new printed circuit board converter. The new converter consists of two series or parallel boards (slices) intended to power a high-voltage Hall accelerator (HiVHAC) thruster or other similarly sized electric propulsion devices. The converter accepts 80- to 160-V input and generates 200- to 700-V isolated output while delivering continually adjustable 300-W to 3.5-kW power. Busek built and demonstrated one board that achieved nearly 94 percent efficiency the first time it was turned on, with projected efficiency exceeding 97 percent following timing software optimization. The board has a projected specific mass of 1.2 kg/kW, achieved through high-frequency switching. In Phase II, Busek optimized to exceed 97 percent efficiency and built a second prototype in a form factor more appropriate for flight. This converter then was integrated with a set of upgraded existing boards for powering magnets and the cathode. The program culminated with integrating the entire power processing unit and testing it on a Busek thruster and on NASA's HiVHAC thruster.
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Process Units
USDA-ARS?s Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
On-board landmark navigation and attitude reference parallel processor system
NASA Technical Reports Server (NTRS)
Gilbert, L. E.; Mahajan, D. T.
1978-01-01
An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.
Interconnect-free parallel logic circuits in a single mechanical resonator
Mahboob, I.; Flurin, E.; Nishiguchi, K.; Fujiwara, A.; Yamaguchi, H.
2011-01-01
In conventional computers, wiring between transistors is required to enable the execution of Boolean logic functions. This has resulted in processors in which billions of transistors are physically interconnected, which limits integration densities, gives rise to huge power consumption and restricts processing speeds. A method to eliminate wiring amongst transistors by condensing Boolean logic into a single active element is thus highly desirable. Here, we demonstrate a novel logic architecture using only a single electromechanical parametric resonator into which multiple channels of binary information are encoded as mechanical oscillations at different frequencies. The parametric resonator can mix these channels, resulting in new mechanical oscillation states that enable the construction of AND, OR and XOR logic gates as well as multibit logic circuits. Moreover, the mechanical logic gates and circuits can be executed simultaneously, giving rise to the prospect of a parallel logic processor in just a single mechanical resonator.
Shifts in information processing level: the speed theory of intelligence revisited.
Sircar, S S
2000-06-01
A hypothesis is proposed here to reconcile the inconsistencies observed in the IQ-P3 latency relation. The hypothesis stems from the observation that task-induced increase in P3 latency correlates positively with IQ scores. It is hypothesised that: (a) there are several parallel information processing pathways of varying complexity which are associated with the generation of P3 waves of varying latencies; (b) with increasing workload, there is a shift in the 'information processing level' through progressive recruitment of more complex polysynaptic pathways with greater processing power and inhibition of the oligosynaptic pathways; (c) high-IQ subjects have a greater reserve of higher level processing pathways; (d) a given 'task-load' imposes a greater 'mental workload' in subjects with lower IQ than in those with higher IQ. According to this hypothesis, a meaningful comparison of the P3 correlates of IQ is possible only when the information processing level is pushed to its limits.
Power transformation for enhancing responsiveness of quality of life questionnaire.
Zhou, YanYan Ange
2015-01-01
We investigate the effect of a power transformation of raw scores on the responsiveness of a quality of life survey. The procedure maximizes the paired t-test value on the power-transformed data to obtain an optimal power range. The parallel with the Box-Cox transformation is also investigated for the quality of life data.
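A minimal version of the procedure is a grid search over the power exponent that maximizes the paired t statistic. The Python sketch below illustrates the idea on synthetic scores; the grid, the generated data, and the clipping to positive values are assumptions for illustration, not the paper's protocol.

```python
import numpy as np
from scipy import stats

def best_power(pre, post, lambdas=np.linspace(0.1, 3.0, 59)):
    """Grid-search the power lambda that maximizes the paired t statistic
    of transformed scores, as a proxy for questionnaire responsiveness."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    best = max(
        (abs(stats.ttest_rel(post**lam, pre**lam).statistic), lam)
        for lam in lambdas
    )
    return best[1], best[0]   # optimal lambda and its |t| value

rng = np.random.default_rng(1)
pre = rng.uniform(20, 80, 40)                 # hypothetical raw QoL scores
post = np.clip(pre + rng.normal(5, 10, 40), 1, None)  # post-treatment scores
print(best_power(pre, post))
```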
Conceptual Design and Optimal Power Control Strategy for AN Eco-Friendly Hybrid Vehicle
NASA Astrophysics Data System (ADS)
Nasiri, N. Mir; Chieng, Frederick T. A.
2011-06-01
This paper presents a new concept for a hybrid vehicle using a torque- and speed-splitting technique. It is implemented by the newly developed controller in combination with a two-degree-of-freedom epicyclic gear transmission. This approach enables optimization of the power split between the less powerful electrical motor and the more powerful engine while driving a car load. The power split is fundamentally a dual-energy integration mechanism, as it is implemented using the epicyclic gear transmission, which has two inputs and one output for proper power distribution. The developed power-split control system manages the operation of both inputs to produce a known output under the condition of maintaining the optimum operating efficiency of the internal combustion engine and electrical motor. This system has huge potential, as it makes it possible to integrate all the features of hybrid vehicles known to date, such as the regenerative braking system, series hybrid, parallel hybrid, series/parallel hybrid, and even complex (bidirectional) hybrid. By using the new power-split system it is possible to further reduce fuel consumption and increase overall efficiency.
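The kinematics of a two-input epicyclic power split follow from the Willis equation, and the ideal torque split follows from tooth counts. The Python sketch below assumes a particular layout (motor on the sun gear, engine on the ring gear, load on the carrier) and lossless gearing; the paper's actual configuration and controller are not reproduced here.

```python
def epicyclic_split(w_sun, w_ring, z_sun, z_ring, t_out):
    """Kinematics and ideal (lossless) power split of a simple planetary
    gear used as a two-input device; layout and numbers are assumptions."""
    # Willis equation: w_sun*z_sun + w_ring*z_ring = w_carrier*(z_sun + z_ring)
    w_carrier = (w_sun * z_sun + w_ring * z_ring) / (z_sun + z_ring)
    # Ideal torque balance: input torques scale with tooth counts.
    t_sun = t_out * z_sun / (z_sun + z_ring)
    t_ring = t_out * z_ring / (z_sun + z_ring)
    return w_carrier, t_sun * w_sun, t_ring * w_ring   # speed, motor W, engine W

# Example: motor at 300 rad/s, engine at 150 rad/s, 200 N*m load torque.
w_c, p_motor, p_engine = epicyclic_split(300.0, 150.0, z_sun=30, z_ring=78,
                                         t_out=200.0)
print(w_c, p_motor, p_engine)   # note p_motor + p_engine == t_out * w_c
```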
Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Shuaiwen; Tallent, Nathan R.; Vishnu, Abhinav
2013-09-23
Future exascale systems must be optimized for both power and performance at scale in order to achieve DOE's goal of a sustained petaflop within 20 Megawatts by 2022 [1]. Massive parallelism of the future systems combined with complex memory hierarchies will form a barrier to efficient application and architecture design. These challenges are exacerbated with emerging complex architectures such as GPGPUs and Intel Xeon Phi, as parallelism increases by orders of magnitude and system power consumption can easily triple or quadruple. Therefore, we need techniques that can reduce the search space for optimization, isolate power-performance bottlenecks, identify root causes of software/hardware inefficiency, and effectively direct runtime scheduling.
Li-Ion Battery and Supercapacitor Hybrid Design for Long Extravehicular Activities
NASA Technical Reports Server (NTRS)
Jeevarajan, Judith
2013-01-01
With the need for long periods of extravehicular activities (EVAs) on the Moon or Mars or an asteroid, the need for long-performance batteries has increased significantly. The energy requirements for the EVA suit, as well as surface systems such as rovers, have increased significantly due to the number of applications they need to power at the same time. However, even with the best state-of-the-art Li-ion batteries, it is not possible to power the suit or the rovers for the extended period of performance. Carrying a charging system along with the batteries makes operations cumbersome and requires a self-contained power source for the charging system that is usually not available. An innovative method to charge and use the Li-ion batteries for long periods therefore seems necessary, and hence, with the advent of Li-ion supercapacitors, a method has been developed to extend the performance period of the Li-ion power system for future exploration applications. The Li-ion supercapacitors have a working voltage range of 3.8 to 2.5 V, unlike a traditional supercapacitor, which typically has a working voltage of 1 V. The innovation is to use this Li-ion supercapacitor to charge Li-ion battery systems on an as-needed basis. The supercapacitors are charged using solar arrays and have battery systems of low capacity in parallel, so that any one battery system can be charged while the others provide power to the application. Supercapacitors can safely take up fast charge since the electrochemical process involved is based on charge separation rather than the intercalation process seen in Li-ion batteries, thus preventing lithium metal deposition on the anodes. The lack of intercalation and the elimination of wear allow the supercapacitors to be charged and discharged safely for a few tens of thousands of cycles. The Li-ion supercapacitors can be charged from the solar cells during the day during an extended EVA. The Li-ion battery used can be half the capacity required for a nominal EVA, divided into two parallel modules with independent charging ports that allow the supercapacitors to charge one battery while the other is providing power to the rover or suit.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Grider, Gary A.; Poole, Stephen W.
2015-09-01
Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
schwimmbad: A uniform interface to parallel processing pools in Python
NASA Astrophysics Data System (ADS)
Price-Whelan, Adrian M.; Foreman-Mackey, Daniel
2017-09-01
Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
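A typical usage pattern, based on the schwimmbad documentation, looks like the following; the worker function and task list are placeholders.

```python
# The same worker code runs serially, on local cores, or under MPI simply
# by swapping the pool class (SerialPool, MultiPool, or MPIPool).
from schwimmbad import MultiPool

def worker(task):
    # A perfectly parallel task: no communication with other tasks.
    return task ** 2

if __name__ == "__main__":
    tasks = range(10000)
    with MultiPool() as pool:                 # local multiprocessing pool
        results = list(pool.map(worker, tasks))
    print(sum(results))
```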
Monocrystalline CVD-diamond optics for high-power laser applications
NASA Astrophysics Data System (ADS)
Holly, C.; Traub, M.; Hoffmann, D.; Widmann, C.; Brink, D.; Nebel, C.; Gotthardt, T.; Sözbir, M. C.; Wenzel, C.
2016-03-01
The potential of diamond as an optical material for high-power laser applications in the wavelength regime from the visible spectrum (VIS) to the near infrared (NIR) is investigated. Single-crystal diamonds with lateral dimensions up to 7×7 mm2 are grown with microwave plasma assisted chemical vapor deposition (MPACVD) in parallel with up to 60 substrates and are further processed to spherical optics for beam guidance and shaping. The synthetic diamonds offer superior thermal, mechanical and optical properties, including low birefringence, scattering and absorption, also around 1 μm wavelength. We present dielectric (AR and HR) coated single-crystal diamond optics which are tested under high laser power in the multi-kW regime. The thermally induced focal shift of the diamond substrates is compared to the focal shift of a standard collimating and focusing unit for laser cutting made of fused silica optics. Due to the high thermal conductivity and low absorption of the diamond substrates compared to the fused silica optics, no additional focal shift caused by a thermally induced refractive index change in the diamond is observed in our experiments. We present experimental results regarding the performance of the diamond substrates with and without dielectric coatings under high power and the influence of growth-induced birefringence on the optical quality. Finally, we discuss the potential of the presented diamond lenses for high-power applications in the field of laser materials processing.
An algorithm for power line detection and warning based on a millimeter-wave radar video.
Ma, Qirong; Goshi, Darren S; Shih, Yi-Chi; Sun, Ming-Ting
2011-12-01
Power-line-strike accidents are a major safety threat for low-flying aircraft such as helicopters, so an automatic warning system for power lines is highly desirable. In this paper we propose an algorithm for detecting power lines in radar videos from an active millimeter-wave sensor. The Hough Transform is employed to detect candidate lines. The major challenge is that the radar videos are very noisy due to ground return. The noise points can fall on the same line, which results in signal peaks after the Hough Transform similar to those of actual cable lines. To differentiate the cable lines from the noise lines, we train a Support Vector Machine to perform the classification. We exploit the Bragg pattern, which is due to the diffraction of the electromagnetic wave on the periodic surface of power lines, and propose a set of features to represent the Bragg pattern for the classifier. We also propose a slice-processing algorithm that supports parallel processing and improves the detection of cables in a cluttered background. Lastly, an adaptive algorithm is proposed to integrate the detection results from individual frames into a reliable video detection decision, in which the temporal correlation of the cable pattern across frames is used to make the detection more robust. Extensive experiments with real-world data validated the effectiveness of our cable detection algorithm.
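The candidate-line stage can be sketched with the standard Hough Transform from OpenCV. This stand-in omits the paper's SVM classification, Bragg features, slice processing, and temporal integration, and all thresholds are assumed values.

```python
import numpy as np
import cv2

def candidate_lines(frame, canny_lo=50, canny_hi=150, votes=120):
    """Detect candidate straight lines in an image frame with the Hough
    Transform; a downstream classifier would then separate true cables
    from noise lines (the paper trains an SVM on Bragg-pattern features)."""
    edges = cv2.Canny(frame, canny_lo, canny_hi)
    lines = cv2.HoughLines(edges, 1, np.pi / 180, votes)
    # Each detection is a (rho, theta) pair in the Hough parameter space.
    return [] if lines is None else [tuple(l[0]) for l in lines]

frame = np.zeros((256, 256), np.uint8)
cv2.line(frame, (0, 40), (255, 90), 255, 1)   # synthetic "cable"
print(candidate_lines(frame))
```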
High power parallel ultrashort pulse laser processing
NASA Astrophysics Data System (ADS)
Gillner, Arnold; Gretzki, Patrick; Büsing, Lasse
2016-03-01
Ultra-short-pulse (USP) laser sources are used whenever high-precision, high-quality material processing is demanded. These laser sources deliver pulse durations in the range of ps to fs and are characterized by high peak intensities, leading to direct vaporization of the material with minimal thermal damage. With the availability of industrial laser sources with an average power of up to 1000 W, the main challenge consists of effective energy distribution and deposition. Using lasers with high repetition rates in the MHz region can cause thermal issues such as overheating, melt production and low ablation quality. In this paper, we discuss different approaches to multibeam processing for the utilization of high pulse energies. The combination of diffractive optics and a conventional galvanometer scanner can be used for high-throughput laser ablation, but is limited in its optical qualities. We show which applications can benefit from this hybrid optic and which improvements in productivity are expected. In addition, the optical limitations of the system are compiled in order to evaluate the suitability of this approach for any given application.
Montagna, Fabio; Buiatti, Marco; Benatti, Simone; Rossi, Davide; Farella, Elisabetta; Benini, Luca
2017-10-01
EEG is a standard non-invasive technique used in neural disease diagnostics and the neurosciences. Frequency-tagging is an increasingly popular experimental paradigm that efficiently tests brain function by measuring EEG responses to periodic stimulation. Recently, frequency-tagging paradigms have proven successful with low stimulation frequencies (0.5-6 Hz), but the EEG signal is intrinsically noisy in this frequency range, requiring heavy signal processing and significant human intervention for response estimation. This limits the possibility of processing the EEG on resource-constrained systems and of designing smart EEG-based devices for automated diagnostics. We propose an algorithm for artifact removal and automated detection of frequency-tagging responses over a wide range of stimulation frequencies, which we test on a visual stimulation protocol. The algorithm is rooted in machine-learning-based pattern recognition techniques and is tailored to a new-generation parallel ultra-low-power processing platform (PULP), reaching more than 90% accuracy in frequency detection even for very low stimulation frequencies (<1 Hz) within a power budget of 56 mW.
Design and evaluation of cellular power converter architectures
NASA Astrophysics Data System (ADS)
Perreault, David John
Power electronic technology plays an important role in many energy conversion and storage applications, including machine drives, power supplies, frequency changers and UPS systems. Increases in performance and reductions in cost have been achieved through the development of higher-performance power semiconductor devices and integrated control devices with increased functionality. Manufacturing techniques, however, have changed little. High power is typically achieved by paralleling multiple die in a single package, producing the physical equivalent of a single large device. Consequently, both the device package and the converter in which the device is used continue to require large, complex mechanical structures and relatively sophisticated heat transfer systems. An alternative to this approach is the use of a cellular power converter architecture, which is based upon the parallel connection of a large number of quasi-autonomous converters, called cells, each of which is designed for a fraction of the system rating. The cell rating is chosen such that single-die devices in inexpensive packages can be used, and the cell can be fabricated with an automated assembly process. The use of quasi-autonomous cells means that system performance is not compromised by the failure of a cell. This thesis explores the design of cellular converter architectures with the objective of achieving improvements in performance, reliability, and cost over conventional converter designs. New approaches are developed and experimentally verified for highly distributed control of cellular converters, including methods for ripple cancellation and current-sharing control. The performance of these techniques is quantified, and their dynamics are analyzed. Cell topologies suited to the cellular architecture are investigated, and their use for systems in the 5-500 kVA range is explored. The design, construction, and experimental evaluation of a 6 kW cellular switched-mode rectifier are also addressed. This cellular system implements entirely distributed control and achieves performance levels unattainable with an equivalent single converter.
Analysis of the thermal balance characteristics for multiple-connected piezoelectric transformers.
Park, Joung-Hu; Cho, Bo-Hyung; Choi, Sung-Jin; Lee, Sang-Min
2009-08-01
Because the amount of power that a piezoelectric transformer (PT) can handle is limited, multiple connections of PTs are necessary for the power-capacity improvement of PT-applications. In the connection, thermal imbalance between the PTs should be prevented to avoid the thermal runaway of each PT. The thermal balance of the multiple-connected PTs is dominantly affected by the electrothermal characteristics of individual PTs. In this paper, the thermal balance of both parallel-parallel and parallel-series connections are analyzed by electrical model parameters. For quantitative analysis, the thermal-balance effects are estimated by the simulation of the mechanical loss ratio between the PTs. The analysis results show that with PTs of similar characteristics, the parallel-series connection has better thermal balance characteristics due to the reduced mechanical loss of the higher temperature PT. For experimental verification of the analysis, a hardware-prototype test of a Cs-Lp type 40 W adapter system with radial-vibration mode PTs has been performed.
Power semiconductor device with negative thermal feedback
NASA Technical Reports Server (NTRS)
Borky, J. M.; Thornton, R. D.
1970-01-01
Composite power semiconductor avoids second breakdown and provides stable operation. It consists of an array of parallel-connected integrated circuits fabricated in a single chip. The output power device and associated low-level amplifier are closely coupled thermally, so that they have a predetermined temperature relationship.
Biologically inspired collision avoidance system for unmanned vehicles
NASA Astrophysics Data System (ADS)
Ortiz, Fernando E.; Graham, Brett; Spagnoli, Kyle; Kelmelis, Eric J.
2009-05-01
In this project, we collaborate with researchers in the neuroscience department at the University of Delaware to develop a Field Programmable Gate Array (FPGA)-based embedded computer inspired by the brains of small vertebrates (fish). The mechanisms of object detection and avoidance in fish have been extensively studied by our Delaware collaborators. The midbrain optic tectum is a biological multimodal navigation controller capable of processing input from all senses that convey spatial information, including vision, audition, touch, and the lateral line (water current sensing in fish). Unfortunately, computational complexity makes these models too slow for use in real-time applications; the simulations are run offline on state-of-the-art desktop computers, presenting a gap between the application and the target platform: a low-power embedded device. EM Photonics has expertise in developing high-performance computers based on commodity platforms such as graphics cards (GPUs) and FPGAs. FPGAs offer (1) high computational power, low power consumption and a small footprint (in line with typical autonomous vehicle constraints), and (2) the ability to implement massively parallel computational architectures, which can be leveraged to closely emulate biological systems. Combining UD's brain modeling algorithms and the power of FPGAs, this computer enables autonomous navigation in complex environments, and further types of onboard neural processing in future applications.
Parallel Signal Processing and System Simulation using aCe
NASA Technical Reports Server (NTRS)
Dorband, John E.; Aburdene, Maurice F.
2003-01-01
Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation, since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language for architecture-adaptive programming, aCe C, allows programmers to implement algorithms and system simulation applications on parallel architectures while providing the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe C and present a signal processing application (FFT).
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Read, S J; Vanman, E J; Miller, L C
1997-01-01
We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.
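The core computational idea, iteratively settling a network of units connected by positive and negative constraint links, fits in a few lines. The Python sketch below is a generic toy network with an invented weight matrix and update rule, not any specific published model.

```python
import numpy as np

def settle(W, act, steps=50, decay=0.1, gain=0.1):
    """Iteratively settle a parallel constraint satisfaction network:
    units pass activation over weighted (supporting or conflicting)
    links until the activation pattern stabilizes."""
    for _ in range(steps):
        net = W @ act                               # summed constraint input
        act = np.clip((1 - decay) * act + gain * net, -1.0, 1.0)
    return act

# Toy impression-formation net: traits 0 and 1 support each other,
# and both conflict with trait 2 (a hypothetical constraint structure).
W = np.array([[0.0, 0.8, -0.6],
              [0.8, 0.0, -0.6],
              [-0.6, -0.6, 0.0]])
print(settle(W, np.array([0.5, 0.1, 0.4])))
```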
Parallel tempering for the traveling salesman problem
DOE Office of Scientific and Technical Information (OSTI.GOV)
Percus, Allon; Wang, Richard; Hyman, Jeffrey
We explore the potential of parallel tempering as a combinatorial optimization method, applying it to the traveling salesman problem. We compare simulation results of parallel tempering with a benchmark implementation of simulated annealing, and study how different choices of parameters affect the relative performance of the two methods. We find that a straightforward implementation of parallel tempering can outperform simulated annealing in several crucial respects. When parameters are chosen appropriately, both methods yield close approximation to the actual minimum distance for an instance with 200 nodes. However, parallel tempering yields more consistently accurate results when a series of independent simulations are performed. Our results suggest that parallel tempering might offer a simple but powerful alternative to simulated annealing for combinatorial optimization problems.
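The method is simple to prototype. Below is a minimal Python sketch of parallel tempering for the TSP with 2-city-swap Metropolis moves and periodic replica exchanges; the temperature ladder, move set, and instance size are illustrative choices, not the paper's settings.

```python
import math, random

def tour_len(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def parallel_tempering_tsp(dist, temps=(0.5, 1.0, 2.0, 4.0), sweeps=2000):
    """One replica per temperature; hot replicas explore, cold ones refine."""
    n = len(dist)
    reps = [list(range(n)) for _ in temps]
    for r in reps:
        random.shuffle(r)
    for sweep in range(sweeps):
        for r, T in zip(reps, temps):
            i, j = random.sample(range(n), 2)       # propose a 2-city swap
            old = tour_len(r, dist)
            r[i], r[j] = r[j], r[i]
            d = tour_len(r, dist) - old
            if d > 0 and random.random() >= math.exp(-d / T):
                r[i], r[j] = r[j], r[i]             # reject: undo the swap
        if sweep % 10 == 0:                          # neighbor replica exchange
            for k in range(len(temps) - 1):
                dE = tour_len(reps[k], dist) - tour_len(reps[k + 1], dist)
                dB = 1 / temps[k] - 1 / temps[k + 1]
                if random.random() < math.exp(min(0.0, dB * dE)) or dB * dE > 0:
                    reps[k], reps[k + 1] = reps[k + 1], reps[k]
    return min(reps, key=lambda r: tour_len(r, dist))

random.seed(2)
pts = [(random.random(), random.random()) for _ in range(20)]
dist = [[math.dist(a, b) for b in pts] for a in pts]
print(tour_len(parallel_tempering_tsp(dist), dist))
```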
Using Parallel Processing for Problem Solving.
1979-12-01
Activities are the basic parallel processing primitive. Different goals of the system can be pursued in parallel by placing them in separate activities. Language primitives are provided for manipulating running activities. Viewpoints are a generalization of context.
1988 IEEE Aerospace Applications Conference, Park City, UT, Feb. 7-12, 1988, Digest
NASA Astrophysics Data System (ADS)
The conference presents papers on microwave applications, data and signal processing applications, related aerospace applications, and advanced microelectronic products for the aerospace industry. Topics include a high-performance antenna measurement system, microwave power beaming from earth to space, the digital enhancement of microwave component performance, and a GaAs vector processor based on parallel RISC microprocessors. Consideration is also given to unique techniques for reliable SBNR architectures, a linear analysis subsystem for CSSL-IV, and a structured singular value approach to missile autopilot analysis.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis; as a consequence, many different approaches to image segmentation have been proposed. The watershed transform is a well-known image segmentation tool and a very data-intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper presents a survey of approaches for the parallel implementation of sequential watershed algorithms on multicore general-purpose CPUs: homogeneous multicore processors with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we compare various parallelizations of sequential watershed algorithms on shared-memory multicore architectures. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models, comparing OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Analysis of scalability of high-performance 3D image processing platform for virtual colonoscopy
NASA Astrophysics Data System (ADS)
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli
2014-03-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that need to be processed. For this purpose, we previously developed a software platform for high-performance 3D medical image processing, called the HPC 3D-MIP platform, which employs increasingly available and affordable commodity computing systems such as multicore, cluster, and cloud computing systems. To achieve scalable high-performance computing, the platform employed size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D-MIP algorithms, supported task scheduling for efficient load distribution and balancing, and consisted of layered parallel software libraries that allow image processing applications to share common functionalities. We evaluated the performance of the HPC 3D-MIP platform by applying it to computationally intensive processes in virtual colonoscopy. Experimental results showed a 12-fold performance improvement on a workstation with 12-core CPUs over the original sequential implementation of the processes, indicating the efficiency of the platform. Analysis of performance scalability based on Amdahl's law for symmetric multicore chips showed the potential for high performance scalability of the HPC 3D-MIP platform when a larger number of cores is available.
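The scalability analysis mentioned above rests on Amdahl's law, which is easy to evaluate directly. A small Python sketch with illustrative parallel fractions (not the paper's measured ones):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup on `cores` cores when `parallel_fraction`
    of the sequential runtime is perfectly parallelizable."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / cores)

# Only when nearly all of the work is parallelized does a 12-core machine
# approach the 12-fold improvement reported above (illustrative numbers).
for f in (0.90, 0.99, 0.997):
    print(f, round(amdahl_speedup(f, 12), 2))
```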
Parallel Calculations in LS-DYNA
NASA Astrophysics Data System (ADS)
Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey
2017-11-01
Nowadays, structural mechanics exhibits a trend towards numeric solutions for increasingly extensive and detailed tasks, which requires that the capacities of computing systems be enhanced. Such enhancement can be achieved by different means. For example, when the computing system is a workstation, its components can be replaced or extended (CPU, memory, etc.). In essence, such modification eventually entails replacement of the entire workstation, since replacement of certain components necessitates the exchange of others (faster CPUs and memory devices require buses with higher throughput, etc.). Special consideration must be given to the capabilities of modern video cards: they constitute powerful computing systems capable of running data processing in parallel, and, interestingly, the tools originally designed to render high-performance graphics can be applied to solving problems not immediately related to graphics (CUDA, OpenCL, shaders, etc.). However, not all software suites utilize video cards' capacities. Another way to increase the capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive growth: a quite powerful system can be obtained by combining not particularly powerful nodes, and separate nodes may possess different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with the LS-DYNA software. A mere 2-node cluster proved sufficient to establish a range of scaling dependencies.
Shitanda, Isao; Momiyama, Misaki; Watanabe, Naoto; Tanaka, Tomohiro; Tsujimura, Seiya; Hoshi, Yoshinao; Itagaki, Masayuki
2017-10-01
A novel paper-based biofuel cell with a series/parallel array structure has been fabricated, in which the cell voltage and output power can easily be adjusted as required by printing. The output of the fabricated 4-series/4-parallel biofuel cell reached 0.97±0.02 mW at 1.4 V, which is the highest output power reported to date for a paper-based biofuel cell. This work contributes to the development of flexible, wearable energy storage devices.
NASA Astrophysics Data System (ADS)
Sheynin, Yuriy; Shutenko, Felix; Suvorova, Elena; Yablokov, Evgenej
2008-04-01
High-rate interconnections are important subsystems in modern data processing and control systems of many classes. They are especially important in prospective embedded and on-board systems, which tend to be multicomponent systems with parallel or distributed architecture [1]. Modular architecture systems of previous generations were based on parallel busses that were widely used and standardised: VME, PCI, CompactPCI, etc. Bus evolution proceeded by improving bus protocol efficiency (burst transactions, split transactions, etc.) and by increasing operating frequencies. However, due to the multi-drop nature of busses and multi-wire skew problems, parallel bus speedup became more and more limited. For embedded and on-board systems, an additional reason for this trend lay in the weight, size and power constraints on an interconnection and its components. Parallel interfaces have become technologically more challenging as their clock frequencies have increased to keep pace with the bandwidth requirements of their attached storage devices. Since each interface uses a data clock to gate and validate the parallel data (normally 8 or 16 bits wide), the clock frequency need only be equivalent to the byte rate or word rate being transmitted. In other words, for a given transmission frequency, the wider the data bus, the slower the clock. As the clock frequency increases, more high-frequency energy is available in each of the data lines, and a portion of this energy is dissipated as radiation. Each data line not only transmits this energy but also receives some from its neighbours. This form of mutual interference is commonly called "cross-talk," and the signal distortion it produces can become another major contributor to loss of data integrity unless compensated by appropriate cable designs. Other transmission problems, such as frequency-dependent attenuation and signal reflections, while also applicable to serial interfaces, are more troublesome in parallel interfaces due to the number of additional cable conductors involved. To compensate for these drawbacks, higher quality cables, shorter cable runs and fewer devices on the bus have been the norm. Finally, the physical bulk of parallel cables makes them more difficult to route inside an enclosure, hinders cooling airflow and is incompatible with the trend toward smaller form-factor devices. Parallel busses have served systems over the past 20 years, but the accumulated problems dictate the need for change, and the technology is available to spur the transition. The general trend in high-rate interconnections has turned from parallel bussing to scalable interconnections with a network architecture and high-rate point-to-point links. Analysis showed that data links with serial information transfer could achieve higher throughput and efficiency, and this was confirmed in various research and practical designs. Serial interfaces offer an improvement over older parallel interfaces: better performance, better scalability, and also better reliability, as parallel interfaces are at their limits of speed for reliable data transfer. This trend is reflected in the evolution of major standards families: e.g., from PCI/PCI-X parallel bussing to the PCI Express interconnection architecture with serial lines, and from the CompactPCI parallel bus to the ATCA (Advanced Telecommunications Computing Architecture) specification with serial links and network topologies, etc.
In this article we consider a general set of characteristics and features of serial interconnections and give a brief overview of serial interconnection specifications. In more detail we present the SpaceWire interconnection technology. Having been developed for space on-board system applications, SpaceWire has important features and characteristics that make it a prospective interconnection for a wide range of embedded systems.
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25 Abstract: This study developed a set of low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. ...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than
Ion heating and flows in a high power helicon source
NASA Astrophysics Data System (ADS)
Thompson, Derek S.; Agnello, Riccardo; Furno, Ivo; Howling, Alan; Jacquier, Rémy; Plyushchev, Gennady; Scime, Earl E.
2017-06-01
We report experimental measurements of ion temperatures and flows in a high power, linear, magnetized, helicon plasma device, the Resonant Antenna Ion Device (RAID). Parallel and perpendicular ion temperatures on the order of 0.6 eV are observed for an rf power of 4 kW, suggesting that higher power helicon sources should attain ion temperatures in excess of 1 eV. The unique RAID antenna design produces broad, uniform plasma density and perpendicular ion temperature radial profiles. Measurements of the azimuthal flow indicate rigid body rotation of the plasma column at a few kHz. When configured with an expanding magnetic field, modest parallel ion flows are observed in the expansion region. The ion flows and temperatures are derived from laser induced fluorescence measurements of the Doppler resolved velocity distribution functions of argon ions.
Learning and Parallelization Boost Constraint Search
ERIC Educational Resources Information Center
Yun, Xi
2013-01-01
Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…
Toward an automated parallel computing environment for geosciences
NASA Astrophysics Data System (ADS)
Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping
2007-08-01
Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.
2012-12-01
...photovoltaic (PV) system to use a maximum power point tracker (MPPT) to increase the power output of the solar array. Currently, most military... A maximum power point tracker (MPPT) is an optimizing circuit that is used in conjunction with photovoltaic (PV) arrays to achieve the maximum delivery of power from the array
NASA Astrophysics Data System (ADS)
Kim, Hyo-Seob; Dharmaiah, Peyala; Hong, Soon-Jik
2018-06-01
In this study, p-type (GeTe)x(AgSbTe2)100−x (TAGS-x, where x = 75, 80, 85, and 90) thermoelectric materials were fabricated by a combination of gas atomization and a hot-extrusion process, and the effects of chemical composition on the microstructure and on the thermoelectric and mechanical properties were investigated. The extruded samples exhibited high relative densities (>99%), and a significant degree of orientation parallel to the extrusion direction with a fine and homogeneous microstructure was observed. The hardness of the extruded samples was around 200-260 kgf/mm², indicating much better mechanical properties than most other thermoelectric materials. The power factor of the extruded samples showed excellent values; the maximum power factor achieved was 3.81 × 10⁻³ W/mK² for TAGS-90 at 723 K, due to an effective combination of the Seebeck coefficient and electrical conductivity.
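For reference, the power factor quoted above is PF = S²σ, the square of the Seebeck coefficient times the electrical conductivity. A quick check of the arithmetic, where the S and σ values are illustrative assumptions chosen to reproduce the reported order of magnitude, not measurements from the paper:

```python
# Thermoelectric power factor PF = S^2 * sigma.
# Units: S in V/K, sigma in S/m, so PF comes out in W/(m K^2).
S = 220e-6      # assumed Seebeck coefficient: 220 uV/K (illustrative)
sigma = 7.9e4   # assumed electrical conductivity in S/m (illustrative)
pf = S**2 * sigma
print(f"PF = {pf:.2e} W/(m K^2)")  # ~3.8e-3, the order reported for TAGS-90
```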
Sensor readout detector circuit
Chu, Dahlon D.; Thelen, Jr., Donald C.
1998-01-01
A sensor readout detector circuit is disclosed that is capable of detecting sensor signals down to a few nanoamperes or less in a high (microampere) background noise level. The circuit operates at a very low standby power level and is triggerable by a sensor event signal that is above a predetermined threshold level. A plurality of sensor readout detector circuits can be formed on a substrate as an integrated circuit (IC). These circuits can operate to process data from an array of sensors in parallel, with only data from active sensors being processed for digitization and analysis. This allows the IC to operate at a low power level with a high data throughput for the active sensors. The circuit may be used with many different types of sensors, including photodetectors, capacitance sensors, chemically-sensitive sensors or combinations thereof to provide a capability for recording transient events or for recording data for a predetermined period of time following an event trigger. The sensor readout detector circuit has applications for portable or satellite-based sensor systems.
Choi, Heejin; Tzeranis, Dimitrios S.; Cha, Jae Won; Clémenceau, Philippe; de Jong, Sander J. G.; van Geest, Lambertus K.; Moon, Joong Ho; Yannas, Ioannis V.; So, Peter T. C.
2012-01-01
Fluorescence and phosphorescence lifetime imaging are powerful techniques for studying intracellular protein interactions and for diagnosing tissue pathophysiology. While lifetime-resolved microscopy has long been in the repertoire of the biophotonics community, current implementations fall short in terms of simultaneously providing 3D resolution, high throughput, and good tissue penetration. This report describes a new highly efficient lifetime-resolved imaging method that combines temporal focusing wide-field multiphoton excitation and simultaneous acquisition of lifetime information in frequency domain using a nanosecond gated imager from a 3D-resolved plane. This approach is scalable allowing fast volumetric imaging limited only by the available laser peak power. The accuracy and performance of the proposed method is demonstrated in several imaging studies important for understanding peripheral nerve regeneration processes. Most importantly, the parallelism of this approach may enhance the imaging speed of long lifetime processes such as phosphorescence by several orders of magnitude. PMID:23187477
Phosphoric and electric utility fuel cell technology development
NASA Astrophysics Data System (ADS)
Breault, R. D.; Briggs, T. A.; Congdon, J. V.; Demarche, T. E.; Gelting, R. L.; Goller, G. J.; Luoma, W. I.; McCloskey, M. W.; Mientek, A. P.; Obrien, J. J.
1984-01-01
A program was initiated to advance electric utility cell stack technology and reduce cell stack cost. The cell stack has a nominal 10-ft² active area and operates at 120 psia/405°F. The program comprises six parallel phases, which culminate in a full-height, 10-ft² stack verification test: Phase 1 provides the information and services needed to manage the effort, including definition of the prototype commercial power plant; Phase 2 develops the technical base for long-term improvements to the cell stack; Phase 3 develops materials and processing techniques for cell stack components incorporating the best available technology; Phase 4 provides the design of hardware and conceptual processing layouts, and updates the power plant definition of Phase 1 to reflect the results of Phases 2 and 3; Phase 5 manufactures the hardware to verify the achievements of Phases 2 and 3, and analyzes the cost of this hardware; and Phase 6 tests the cell stacks assembled from the hardware of Phase 5 to assess the state of development.
A Versatile Planetary Radio Science Microreceiver
NASA Technical Reports Server (NTRS)
Fry, Craig D.; Rosenberg, T. J.
1999-01-01
We have developed a low-power, programmable radio "microreceiver" that combines the functionality of two science instruments: a Relative Ionospheric Opacity Meter (riometer) and a swept-frequency VTF/HF radio spectrometer. The radio receiver, calibration noise source, data acquisition and processing, and command and control functions are all contained on a single circuit board. This design is suitable for miniaturizing as a complete flight instrument. Several of the subsystems were implemented in a field-programmable gate array (FPGA), including the receiver detector, the control logic, and the data acquisition and processing blocks. Considerable effort was made to reduce the power consumption of the instrument and to eliminate or minimize RF noise and spurious emissions generated by the receiver's digital circuitry. A prototype instrument was deployed at McMurdo Station, Antarctica, and operated in parallel with a traditional riometer instrument for approximately three weeks. The attached paper (accepted for publication by Radio Science) describes in detail the microreceiver theory of operation, performance specifications and test results.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
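The abstract does not spell out the load-balancing strategy, but a common low-cost choice for batches of differently sized documents is the longest-processing-time (LPT) greedy heuristic: sort items by size and always hand the next one to the least-loaded worker. A minimal sketch under that assumption (document sizes and worker count are hypothetical):

```python
import heapq

def lpt_assign(doc_sizes, n_workers):
    """Greedy LPT load balancing: largest document to the least-loaded worker."""
    heap = [(0, w) for w in range(n_workers)]  # (current load, worker id)
    assignment = {w: [] for w in range(n_workers)}
    for doc, size in sorted(enumerate(doc_sizes), key=lambda x: -x[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(doc)
        heapq.heappush(heap, (load + size, w))
    return assignment

# Hypothetical document sizes in kilobytes, balanced across 4 workers.
print(lpt_assign([120, 30, 75, 200, 45, 90, 60, 150], 4))
```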
Evaluation of selected chemical processes for production of low-cost silicon, phase 3
NASA Technical Reports Server (NTRS)
Blocher, J. M., Jr.; Browning, M. F.; Seifert, D. A.
1981-01-01
A Process Development Unit (PDU), consisting of the four major units of the process, was designed, installed, and experimentally operated. The PDU was sized at 50 MT/yr. The deposition took place in a fluidized-bed reactor. As a consequence of the experiments, improvements in the design and operation of these units were undertaken and their experimental limitations were partially established. A parallel program of experimental work demonstrated that zinc can be vaporized for introduction into the fluidized-bed reactor by direct induction-coupled r.f. energy. Residual zinc in the product can be removed by heat treatment below the melting point of silicon. Current efficiencies of 94 percent and above, and power efficiencies around 40 percent, are achievable in the laboratory-scale electrolysis of ZnCl2.
Light Signaling in Bud Outgrowth and Branching in Plants
Leduc, Nathalie; Roman, Hanaé; Barbier, François; Péron, Thomas; Huché-Thélier, Lydie; Lothier, Jérémy; Demotes-Mainard, Sabine; Sakr, Soulaiman
2014-01-01
Branching determines the final shape of plants, which influences adaptation, survival and the visual quality of many species. It is an intricate process that includes bud outgrowth and shoot extension, and these in turn respond to environmental cues and light conditions. Light is a powerful environmental factor that impacts multiple processes throughout plant life. The molecular basis of the perception and transduction of the light signal within buds is poorly understood and undoubtedly requires further unravelling. This review is based on current knowledge on bud outgrowth-related mechanisms and light-mediated regulation of many physiological processes. It provides an extensive, though not exhaustive, overview of the findings related to this field. In parallel, it points to issues to be addressed in the near future. PMID:27135502
Relativistic Collisions of Highly-Charged Ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionescu, Dorin; Belkacem, Ali
1998-11-19
The physics of elementary atomic processes in relativistic collisions between highly-charged ions and atoms or other ions is briefly discussed, and some recent theoretical and experimental results in this field are summarized. They include excitation, capture, ionization, and electron-positron pair creation. The numerical solution of the two-center Dirac equation in momentum space is shown to be a powerful nonperturbative method for describing atomic processes in relativistic collisions involving heavy and highly-charged ions. By propagating negative-energy wave packets in time, the evolution of the QED vacuum around heavy ions in relativistic motion is investigated. Recent results obtained from numerical calculations using massively parallel processing on the Cray-T3E supercomputer of the National Energy Research Scientific Computing Center (NERSC) at Berkeley National Laboratory are presented.
Search asymmetries: parallel processing of uncertain sensory information.
Vincent, Benjamin T
2011-08-01
What mechanism underlies search phenomena such as search asymmetry? Two-stage models such as Feature Integration Theory and Guided Search propose parallel pre-attentive processing followed by serial post-attentive processing; they attribute search asymmetry effects to pairs of features, one processed in parallel, the other serially. An alternative proposal is that a single-stage parallel process is responsible, and that search asymmetries occur when one stimulus has greater internal uncertainty associated with it than another. While the latter account is simpler, only a few studies have set out to empirically test its quantitative predictions, and many researchers still subscribe to the two-stage account. This paper examines three separate parallel models (a Bayesian optimal observer, the max rule, and a heuristic decision rule). All three parallel models can account for search asymmetry effects, and I conclude that people either optimally utilise the uncertain sensory data available to them or are able to select heuristic decision rules that approximate optimal performance.
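Of the three models, the max rule is the easiest to sketch: each display item produces a noisy internal response, and the observer reports "target present" if the maximum response exceeds a criterion. An asymmetry falls out when the two stimulus roles carry different internal uncertainty. The snippet below is a toy illustration under assumed noise levels and parameters, not the paper's fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

def max_rule_hit_rate(target_sd, distractor_sd, n_items=8, signal=1.0,
                      criterion=2.0, trials=20_000):
    """P('present') on target-present trials under the max decision rule."""
    distractors = rng.normal(0.0, distractor_sd, (trials, n_items - 1))
    target = rng.normal(signal, target_sd, (trials, 1))
    responses = np.concatenate([distractors, target], axis=1)
    return (responses.max(axis=1) > criterion).mean()

# Asymmetry: an uncertain target among reliable distractors behaves
# differently from the reverse arrangement, all else held equal.
print(max_rule_hit_rate(target_sd=1.5, distractor_sd=0.5))
print(max_rule_hit_rate(target_sd=0.5, distractor_sd=1.5))
```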
NASA Technical Reports Server (NTRS)
Kimnach, Greg L.; Lebron, Ramon C.; Fox, David A.
1999-01-01
The John H. Glenn Research Center at Lewis Field (GRC) in Cleveland, OH, and the Sundstrand Corporation in Rockford, IL, have designed and developed an Engineering Model (EM) Electrical Power Control Unit (EPCU) for the Fluids Combustion Facility (FCF) experiments to be flown on the International Space Station (ISS). The EPCU will be used as the power interface to the ISS power distribution system for the FCF's space experiments' test and telemetry hardware; furthermore, it is proposed to be the common power interface for all experiments. The EPCU is a three-kilowatt 120-Vdc-to-28-Vdc converter utilizing three independent Power Converter Units (PCUs), each rated at 1 kWe (36 Adc @ 28 Vdc), which are paralleled and synchronized. Each converter may be fed from one of two ISS power channels. The 28-Vdc loads are connected to the EPCU output via 48 solid-state, current-limiting switches, rated at 4 Adc each. These switches may be paralleled to supply any given load up to the 108-Adc normal operational limit of the paralleled converters. The EPCU was designed in this manner to maximize allocated-power utilization, to shed loads autonomously, to provide fault tolerance, and to provide a flexible power converter and control module to meet various ISS load demands. Tests of the EPCU in the Power Systems Facility testbed at GRC reveal that the overall converted-power efficiency is approximately 89% with a nominal input voltage of 120 Vdc and a total load in the range of 40% to 110% of the rated 28-Vdc load (the PCUs alone have an efficiency of approximately 94.5%). Furthermore, the EM unit passed all flight-qualification-level (and beyond) vibration tests, passed ISS EMI (conducted, radiated, and susceptibility) requirements, successfully operated for extended periods in a thermal/vacuum chamber, was integrated with a proto-flight experiment, and passed all stability and functional requirements.
Shibata, Yoshiyuki; Imai, Shingo; Nobutomo, Tatsuya; Miyoshi, Tasuku; Yamamoto, Shin-Ichiroh
2010-01-01
The purpose of this study is to develop a body weight support gait training system for stroke and spinal cord injury. This system consists of a powered orthosis, a treadmill and body weight support equipment. The attachment of the powered orthosis can be adjusted to fit subjects of different body sizes. The powered orthosis is driven by pneumatic McKibben actuators, arranged as one pair of antagonistic bi-articular muscle models and two pairs of antagonistic mono-articular muscle models, mimicking the human musculoskeletal system. The body weight support equipment suspends the subject by a wire harness, and the subject's body weight is supported continuously by a counterweight. The powered orthosis is attached to the body weight support equipment by a parallel linkage, which limits its movement to the sagittal plane; the weight of the powered orthosis is compensated by the parallel linkage with a gas spring. In this study, we developed a system comprising an orthosis powered by pneumatic McKibben actuators and body weight support equipment, and we report the details of this body weight support gait training system.
Parallel processor for real-time structural control
NASA Astrophysics Data System (ADS)
Tise, Bert L.
1993-07-01
A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look-up tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
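The multiply/accumulate-intensive workload such a controller executes each sample period is the standard discrete state-space update, x[k+1] = A·x[k] + B·u[k] and y[k] = C·x[k] + D·u[k]. A minimal sketch of that inner loop (the matrices are arbitrary placeholders, not the controller from the paper):

```python
import numpy as np

# Discrete state-space controller update: the multiply/accumulate kernel
# executed once per sample period. Matrix values are arbitrary placeholders.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

x = np.zeros((2, 1))              # controller state
for k in range(5):
    u = np.array([[1.0]])         # sensor sample (A/D input), constant here
    y = C @ x + D @ u             # actuator command (D/A output)
    x = A @ x + B @ u             # state update
    print(k, y.item())
```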
NASA Astrophysics Data System (ADS)
Brächer, T.; Pirro, P.; Hillebrands, B.
2017-06-01
Magnonics and magnon spintronics aim at the utilization of spin waves and magnons, their quanta, for the construction of wave-based logic networks via the generation of pure all-magnon spin currents and their interfacing with electric charge transport. The promise of efficient parallel data processing and low power consumption renders this field one of the most promising research areas in spintronics. In this context, the process of parallel parametric amplification, i.e., the conversion of microwave photons into magnons at one half of the microwave frequency, has proven to be a versatile tool to excite and to manipulate spin waves. Its beneficial and unique properties such as frequency and mode-selectivity, the possibility to excite spin waves in a wide wavevector range and the creation of phase-correlated wave pairs, have enabled the achievement of important milestones like the magnon Bose-Einstein condensation and the cloning and trapping of spin-wave packets. Parallel parametric amplification, which allows for the selective amplification of magnons while conserving their phase is, thus, one of the key methods of spin-wave generation and amplification. The application of parallel parametric amplification to CMOS-compatible micro- and nano-structures is an important step towards the realization of magnonic networks. This is motivated not only by the fact that amplifiers are an important tool for the construction of any extended logic network but also by the unique properties of parallel parametric amplification. In particular, the creation of phase-correlated wave pairs allows for rewarding alternative logic operations such as a phase-dependent amplification of the incident waves. Recently, the successful application of parallel parametric amplification to metallic microstructures has been reported which constitutes an important milestone for the application of magnonics in practical devices. It has been demonstrated that parametric amplification provides an excellent tool to generate and to amplify spin waves in these systems in a wide wavevector range. In particular, the amplification greatly benefits from the discreteness of the spin-wave spectra since the size of the microstructures is comparable to the spin-wave wavelength. This opens up new, interesting routes of spin-wave amplification and manipulation. In this review, we will give an overview over the recent developments and achievements in this field.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-08-09
... Mississippi Department of Environmental Quality (MDEQ), on July 13, 2012, for parallel processing. ... Table of Contents: I. What is parallel processing? II. Background. III. What elements are required under... Executive Order Reviews. I. What is parallel processing? Consistent with EPA regulations found at 40 CFR Part...
Double Take: Parallel Processing by the Cerebral Hemispheres Reduces Attentional Blink
ERIC Educational Resources Information Center
Scalf, Paige E.; Banich, Marie T.; Kramer, Arthur F.; Narechania, Kunjan; Simon, Clarissa D.
2007-01-01
Recent data have shown that parallel processing by the cerebral hemispheres can expand the capacity of visual working memory for spatial locations (J. F. Delvenne, 2005) and attentional tracking (G. A. Alvarez & P. Cavanagh, 2005). Evidence that parallel processing by the cerebral hemispheres can improve item identification has remained elusive.…
Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin
2013-01-01
One of the key challenges in three-dimensional (3D) medical imaging is to enable fast turn-around times, which are often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth, due to the massive amount of data that needs to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multicore, cluster, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10-fold performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803
Paucke, Madlen; Oppermann, Frank; Koch, Iring; Jescheniak, Jörg D
2015-12-01
Previous dual-task picture-naming studies suggest that lexical processes require capacity-limited resources and prevent other tasks from being carried out in parallel. However, studies involving the processing of multiple pictures suggest that parallel lexical processing is possible. The present study investigated the specific costs that may arise when such parallel processing occurs. We used a novel dual-task paradigm, presenting 2 visual objects associated with different tasks and manipulating between-task similarity. With high similarity, a picture-naming task (T1) was combined with a phoneme-decision task (T2), so that lexical processes were shared across tasks. With low similarity, picture naming was combined with a size-decision T2 (nonshared lexical processes). In Experiment 1, we found that a manipulation of lexical processes (lexical frequency of the T1 object name) showed an additive propagation with low between-task similarity and an overadditive propagation with high between-task similarity. Experiment 2 replicated this differential forward propagation of the lexical effect and showed that it disappeared with longer stimulus onset asynchronies. Moreover, both experiments showed backward crosstalk, indexed as worse T1 performance with high between-task similarity compared with low similarity. Together, these findings suggest that conditions of high between-task similarity can lead to parallel lexical processing in both tasks, which, however, does not result in benefits but rather in extra performance costs. These costs can be attributed to crosstalk based on the dual-task binding problem arising from parallel processing. Hence, the present study reveals that capacity-limited lexical processing can run in parallel across dual tasks, but only at the expense of extraordinarily high costs.
Real-time computing platform for spiking neurons (RT-spike).
Ros, Eduardo; Ortigosa, Eva M; Agís, Rodrigo; Carrillo, Richard; Arnold, Michael
2006-07-01
A computing platform is described for simulating arbitrary networks of spiking neurons in real time. A hybrid computing scheme is adopted that uses both software and hardware components to manage the tradeoff between flexibility and computational power; the neuron model is implemented in hardware and the network model and the learning are implemented in software. The incremental transition of the software components into hardware is supported. We focus on a spike response model (SRM) for a neuron where the synapses are modeled as input-driven conductances. The temporal dynamics of the synaptic integration process are modeled with a synaptic time constant that results in a gradual injection of charge. This type of model is computationally expensive and is not easily amenable to existing software-based event-driven approaches. As an alternative we have designed an efficient time-based computing architecture in hardware, where the different stages of the neuron model are processed in parallel. Further improvements occur by computing multiple neurons in parallel using multiple processing units. This design is tested using reconfigurable hardware and its scalability and performance evaluated. Our overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so we evaluate the performance of the system on simulating a model of the cerebellum where the emulation of the temporal dynamics of the synaptic integration process is important.
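The computationally expensive part described here is the gradual charge injection: each synapse's conductance decays with a time constant and drives current into the membrane every time step, so the neuron must be updated continuously rather than only at spike events. A time-stepped sketch of that dynamic (all parameter values are illustrative, not the platform's):

```python
import math

# Time-stepped neuron with an exponentially decaying synaptic conductance,
# i.e. gradual charge injection. All parameters are illustrative.
dt, tau_syn, tau_mem = 0.1, 2.0, 10.0   # ms
v, g, v_thresh, e_syn = 0.0, 0.0, 1.0, 5.0
syn_decay = math.exp(-dt / tau_syn)
input_spikes = {5, 30, 32, 34}          # time-step indices of input spikes

for step in range(100):
    if step in input_spikes:
        g += 0.2                         # conductance jump on an input spike
    g *= syn_decay                       # gradual decay -> gradual charge injection
    i_syn = g * (e_syn - v)              # conductance-driven synaptic current
    v += dt * (-v / tau_mem + i_syn)     # leaky membrane integration
    if v >= v_thresh:
        print(f"output spike at t = {step * dt:.1f} ms")
        v = 0.0                          # reset after a spike
```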
Introducing concurrency in the Gaudi data processing framework
NASA Astrophysics Data System (ADS)
Clemencic, Marco; Hegner, Benedikt; Mato, Pere; Piparo, Danilo
2014-06-01
In the past, the increasing demands for HEP processing resources could be fulfilled by ever increasing clock frequencies and by distributing the work to more and more physical machines. Limitations in the power consumption of both CPUs and entire data centres are bringing an end to this era of easy scalability. To get the most CPU performance per watt, future hardware will be characterised by less and less memory per processor, as well as thinner, more specialized and more numerous cores per die, and rather heterogeneous resources. To fully exploit the potential of the many cores, HEP data processing frameworks need to allow for parallel execution of reconstruction or simulation algorithms on several events simultaneously. We describe our experience in introducing concurrency-related capabilities into Gaudi, a generic data processing software framework, which is currently being used by several HEP experiments, including the ATLAS and LHCb experiments at the LHC. After a description of the concurrent framework and the most relevant design choices driving its development, we describe the behaviour of the framework in a more realistic environment, using a subset of the real LHCb reconstruction workflow, and present our strategy and the tools used to validate the physics outcome of the parallel framework against the results of the present, purely sequential LHCb software. We then summarize the measurement of the code performance of the multithreaded application in terms of memory and CPU usage.
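The core idea, processing several events simultaneously while each event still runs its algorithms in order, can be sketched with a plain thread pool. This illustrates the scheduling concept only; it is not Gaudi's actual task scheduler or API, and the algorithms are stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

# Each event runs its algorithm sequence in order, but several events are
# in flight at once (inter-event parallelism). Algorithms are stand-ins.
def decode(evt):      return evt | {"raw": "decoded"}
def reconstruct(evt): return evt | {"tracks": 42}
def select(evt):      return evt | {"accepted": evt["id"] % 2 == 0}

ALGORITHM_SEQUENCE = [decode, reconstruct, select]

def process_event(evt):
    for alg in ALGORITHM_SEQUENCE:   # intra-event order is preserved
        evt = alg(evt)
    return evt

events = [{"id": i} for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(process_event, events):
        print(result["id"], result["accepted"])
```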
Graphical Representation of Parallel Algorithmic Processes
1990-12-01
interface with the AAARF main process. The source code for the AAARF class-common library is in the common subdirectory and consists of the following files... for public release; distribution unlimited. AFIT/GCE/ENG/90D-07, Graphical Representation of Parallel Algorithmic Processes. THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor
NASA Astrophysics Data System (ADS)
Bogdanov, Valery L.; Boyce-Jacino, Michael
1999-05-01
Confined arrays of biochemical probes deposited on a solid support surface (an analytical microarray or 'chip') provide an opportunity to analyze multiple reactions simultaneously. Microarrays are increasingly used in genetics, medicine and environmental scanning as research and analytical instruments. The power of microarray technology comes from its parallelism, which grows with array miniaturization, minimization of reagent volume per reaction site and reaction multiplexing. An optical detector of microarray signals should combine high sensitivity with spatial and spectral resolution; additionally, low cost and a high processing rate are needed to transfer microarray technology into biomedical practice. We designed an imager that provides confocal, complete-spectrum detection of an entire fluorescently labeled microarray in parallel. The imager uses a microlens array, a non-slit spectral decomposer, and a highly sensitive detector (cooled CCD). Two imaging channels provide simultaneous detection of localization, integrated and spectral intensities for each reaction site in the microarray. A dimensional matching between the microarray and the imager's optics eliminates all moving parts in the instrumentation, enabling highly informative, fast and low-cost microarray detection. We report the theory of confocal hyperspectral imaging with a microlens array, and experimental data for the implementation of the developed imager to detect fluorescently labeled microarrays with a density of approximately 10³ sites per cm².
Fully Parallel MHD Stability Analysis Tool
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2014-10-01
Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for the simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is achieved by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.
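The surface-by-surface matrix construction parallelizes naturally because each magnetic surface contributes an independent block of rows. A generic sketch of that load-splitting pattern with a process pool (this illustrates the distribution idea only; it is not MARS code, and the block assembly is a stand-in):

```python
from multiprocessing import Pool

import numpy as np

N_SURFACES, BLOCK = 64, 8     # illustrative problem size

def build_rows(surface_index):
    """Stand-in for assembling the matrix block of one magnetic surface."""
    rng = np.random.default_rng(surface_index)
    return surface_index, rng.standard_normal((BLOCK, N_SURFACES * BLOCK))

if __name__ == "__main__":
    matrix = np.empty((N_SURFACES * BLOCK, N_SURFACES * BLOCK))
    with Pool() as pool:
        for s, block in pool.imap_unordered(build_rows, range(N_SURFACES)):
            matrix[s * BLOCK:(s + 1) * BLOCK] = block
    print(matrix.shape)
```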
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Cascade model of gamma-ray bursts: Power-law and annihilation-line components
NASA Technical Reports Server (NTRS)
Harding, A. K.; Sturrock, P. A.; Daugherty, J. K.
1988-01-01
If, in a neutron star magnetosphere, an electron is accelerated to an energy of 10¹¹ or 10¹² eV by an electric field parallel to the magnetic field, motion of the electron along the curved field line leads to a cascade of gamma rays and electron-positron pairs. This process is believed to occur in radio pulsars and gamma-ray burst sources. Results are presented from numerical simulations of the radiation and photon-annihilation pair production processes, using a computer code previously developed for the study of radio pulsars. A range of values was considered for the initial energy of the primary electron, the initial injection position, and the magnetic dipole moment of the neutron star. The resulting spectra were found to exhibit complex forms that are typically power law over a substantial range of photon energy, and typically include a dip in the spectrum near the electron gyro-frequency at the injection point. The results of a number of models are compared with data for the 5 March 1979 gamma-ray burst. A good fit was found to the gamma-ray part of the spectrum, including the equivalent width of the annihilation line.
46 CFR 112.05-3 - Main-emergency bus-tie.
Code of Federal Regulations, 2010 CFR
2010-10-01
... emergency switchboard; (b) Be arranged to prevent parallel operation of an emergency power source with any other source of electric power, except for interlock systems for momentary transfer of loads; and (c) If arranged for feedback operation, open automatically upon overload of the emergency power source before the...
Dc-To-Dc Converter Uses Reverse Conduction Of MOSFET's
NASA Technical Reports Server (NTRS)
Gruber, Robert P.; Gott, Robert W.
1991-01-01
In modified high-power, phase-controlled, full-bridge, pulse-width-modulated dc-to-dc converters, the switching devices are power metal-oxide-semiconductor field-effect transistors (MOSFETs). The design decreases power dissipation during switching by eliminating the approximately 0.7-V forward voltage drop of the anti-parallel diodes. Energy-conversion efficiency is thereby increased.
Control and protection system for paralleled modular static inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1973-01-01
A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
Leung, Chung Ming; Wang, Ya; Chen, Wusi
2016-11-01
In this letter, an airfoil-based electromagnetic energy harvester containing parallel array motion between a moving coil and trajectory-matching multi-pole magnets was investigated. The magnets were aligned in an alternately magnetized formation of six magnets to explore enhanced power density. In particular, the magnet array was positioned parallel to the trajectory of the tip coil within its tip deflection span. Finite element simulations of the magnetic flux density and of the induced voltages under open-circuit conditions were studied to find the maximum number of alternately magnetized magnets required for the proposed energy harvester. Experimental results showed that the energy harvester with a pair of six alternately magnetized linear magnet arrays was able to generate an induced voltage (V_o) of 20 V under open-circuit conditions and 475 mW under a 30 Ω optimal resistance load, operating at a wind speed (U) of 7 m/s and a natural bending frequency of 3.54 Hz. Compared to a traditional electromagnetic energy harvester with a single magnet moving through a coil, the proposed energy harvester, containing multi-pole magnets and parallel array motion, enables the moving coil to accumulate a stronger magnetic flux in each period of the swinging motion. In addition, compared with an airfoil-based piezoelectric energy harvester of the same size, our proposed electromagnetic energy harvester generates 11 times more power output, which makes it more suitable for high-power-density energy harvesting applications in regions with low environmental frequency.
Parallel Optical Random Access Memory (PORAM)
NASA Technical Reports Server (NTRS)
Alphonse, G. A.
1989-01-01
It is shown that the need to minimize component count, power and size, and to maximize packing density, requires a parallel optical random access memory to be designed in a two-level hierarchy: a modular level and an interconnect level. Three module designs are proposed, in order of research and development requirements. The first uses state-of-the-art components, including individually addressed laser diode arrays, acousto-optic (AO) deflectors and a magneto-optic (MO) storage medium, aimed at moderate size, moderate power, and high packing density. The next design level uses an electron-trapping (ET) medium to reduce optical power requirements. The third design uses a beam-steering grating surface emitter (GSE) array to reduce size further and minimize the number of components.
Li, Xinying; Yu, Jianjun; Dong, Ze; Zhang, Junwen; Chi, Nan; Yu, Jianguo
2013-03-01
We experimentally investigate the interference in multiple-input multiple-output (MIMO) wireless transmission by adjusting the relative locations of horn antennas (HAs) in a 100 GHz optical wireless integration system, which can deliver a 50 Gb/s polarization-division-multiplexing quadrature-phase-shift-keying signal over 80 km of SMF-28 single-mode fiber and a 2×2 MIMO wireless link. For the parallel 2×2 MIMO wireless link, each receiver HA can only get wireless power from the corresponding transmitter HA, while for the crossover ones, the receiver HA can get wireless power from two transmitter HAs. At the wireless receiver, polarization demultiplexing is realized by the constant modulus algorithm (CMA) in the digital-signal-processing part. Compared to the parallel case, wireless interference causes about a 2 dB optical signal-to-noise ratio penalty at a bit-error ratio (BER) of 3.8×10⁻³ for the crossover cases if similar CMA tap lengths are employed. An increase in CMA tap length can reduce wireless interference and improve BER performance. Furthermore, more CMA taps should be adopted to overcome the severe wireless interference when the two pairs of transmitter and receiver HAs have different wireless distances.
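The CMA equalizer referred to here adapts a finite-impulse-response filter so that its output approaches a constant modulus, with tap update w ← w + μ·e·y·x*, where e = R₂ − |y|². A single-channel sketch of that update rule follows; the tap count, step size, channel and toy QPSK signal are assumptions, and the paper's equalizer is a 2×2 butterfly version of the same idea:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy QPSK stream through a mild FIR channel, equalized by single-channel CMA.
bits = rng.integers(0, 2, (2, 4000)) * 2 - 1
symbols = (bits[0] + 1j * bits[1]) / np.sqrt(2)
received = np.convolve(symbols, [1.0, 0.25 + 0.1j], mode="same")

n_taps, mu, r2 = 11, 1e-3, 1.0     # assumed tap count, step size, CMA radius
w = np.zeros(n_taps, complex)
w[n_taps // 2] = 1.0               # center-spike initialization

for k in range(n_taps, len(received)):
    x = received[k - n_taps:k][::-1]        # tap-delay-line contents
    y = np.dot(w, x)                        # equalizer output
    e = r2 - abs(y) ** 2                    # constant-modulus error
    w += mu * e * y * np.conj(x)            # CMA stochastic-gradient update
print("final error magnitude:", abs(e))
```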
Impact of parallel trade on pharmaceutical firm's profits: rise or fall?
Guo, Shen; Hu, Bin; Zhong, Hai
2013-04-01
Most existing studies on parallel trade conclude that it reduces pharmaceutical firms' profits. One special feature of the pharmaceutical industry is the presence of price regulation in most countries. Taking into account the impact of parallel trade on regulated pharmaceutical prices, [Pecorino, P.: J. Health Econ. 21, 699-708 (2002)] shows that a pharmaceutical firm's profit is greater in the presence of parallel trade. The present paper relaxes the assumption of identical demands among countries and takes into account transaction costs. The results of our model show that a firm's profits may increase or decrease in the presence of parallel trade, depending on its bargaining power in the price negotiation and the market size of the drug. Changes in social welfare due to the transition to a parallel trade regime are also considered.
Mei, Gang; Xu, Liangliang; Xu, Nengxiong
2017-01-01
This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms by using the graphics processing unit (GPU). The AIDW is an improved version of the standard IDW, which can adaptively determine the power parameter according to the data points’ spatial distribution pattern and achieve more accurate predictions than those predicted by IDW. In this paper, we first present two versions of the GPU-accelerated AIDW, i.e. the naive version without profiting from the shared memory and the tiled version taking advantage of the shared memory. We also implement the naive version and the tiled version using two data layouts, structure of arrays and array of aligned structures, on both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with its corresponding serial algorithm on three different machines equipped with the GPUs GT730M, M5000 and K40c. The experimental results indicate that: (i) there is no significant difference in the computational efficiency when different data layouts are employed; (ii) the tiled version is always slightly faster than the naive version; and (iii) on single precision the achieved speed-up can be up to 763 (on the GPU M5000), while on double precision the obtained highest speed-up is 197 (on the GPU K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available. PMID:28989754
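IDW predicts a query value as a weighted average of sample values with weights d⁻ᵖ; the "adaptive" part of AIDW chooses the power p per query point from how clustered the nearby samples are. The sketch below implements plain IDW plus a deliberately simplified density-based rule for p; the rule (including the 0.2·scale normalizer) is an illustrative assumption, whereas the paper's AIDW derives p from a nearest-neighbor statistic of the point pattern:

```python
import numpy as np

def aidw_predict(xy, values, queries, p_min=1.0, p_max=3.0, k=8):
    """IDW with a per-query adaptive power parameter (simplified AIDW sketch)."""
    scale = np.hypot(*(xy.max(0) - xy.min(0)))   # domain diagonal, for normalization
    out = np.empty(len(queries))
    for i, q in enumerate(queries):
        d = np.hypot(*(xy - q).T) + 1e-12        # distances to all samples
        # Simplified adaptation: sparse neighborhoods (large mean k-NN distance)
        # get a larger power p, concentrating weight on the nearest samples.
        sparsity = np.clip(np.sort(d)[:k].mean() / (0.2 * scale), 0.0, 1.0)
        p = p_min + (p_max - p_min) * sparsity
        w = d ** -p
        out[i] = np.dot(w, values) / w.sum()
    return out

rng = np.random.default_rng(0)
pts = rng.random((200, 2))
vals = np.sin(4 * pts[:, 0]) + np.cos(3 * pts[:, 1])   # synthetic field
print(aidw_predict(pts, vals, rng.random((5, 2))))
```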
Model-Based Systems Engineering in the Execution of Search and Rescue Operations
2015-09-01
OSC can fulfill the duties of an ACO, but it may make sense to split the duties if there are no communication links between the OSC and participating...parallel mode. This mode is the most powerful option because it creates sequence diagrams that generate parallel “swim lanes” for each asset...greater flexibility is desired, sequence mode generates diagrams based purely on sequential action and activity diagrams without the parallel “swim lanes”
Fast parallel algorithm for slicing STL based on pipeline
NASA Astrophysics Data System (ADS)
Ma, Xulong; Lin, Feng; Yao, Bo
2016-05-01
In the field of Additive Manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. To improve efficiency and reduce slicing time, a parallel algorithm has great advantages. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of thread count and layer count are investigated in a series of experiments. The experimental results show that thread count and layer count are two significant factors for the speedup ratio. The trend of speedup versus thread count reveals a positive relationship that agrees well with Amdahl's law, and the trend of speedup versus layer count also shows a positive relationship, in agreement with Gustafson's law. The new algorithm uses topological information to compute contours with a parallel method of speedup. Another parallel algorithm, based on data parallelism, is used in the experiments to show that the pipeline parallel mode is more efficient. A final case study demonstrates the performance of the new parallel algorithm: compared with the serial slicing algorithm, the new pipeline-parallel algorithm makes full use of multi-core CPU hardware and accelerates the slicing process, and compared with the data-parallel slicing algorithm, it achieves a much higher speedup ratio and efficiency.
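The pipeline idea is that slicing decomposes into stages (e.g., read facets, compute layer contours, write toolpaths) that run concurrently on different layers, connected by queues. A generic sketch of such a pipeline follows; the stage bodies are stand-ins, not the paper's slicer:

```python
import queue
import threading

SENTINEL = object()

def stage(fn, q_in, q_out):
    """Run fn on every item from q_in and pass results downstream."""
    while (item := q_in.get()) is not SENTINEL:
        q_out.put(fn(item))
    q_out.put(SENTINEL)               # propagate shutdown to the next stage

load_q, slice_q, out_q = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda z: ("layer", z), load_q, slice_q)),
    threading.Thread(target=stage, args=(lambda l: (*l, "contours"), slice_q, out_q)),
]
for t in threads:
    t.start()
for z in range(10):                   # feed layer heights into the pipeline
    load_q.put(z)
load_q.put(SENTINEL)

while (result := out_q.get()) is not SENTINEL:
    print(result)
for t in threads:
    t.join()
```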
The characteristics and limitations of the MPS/MMS battery charging system
NASA Technical Reports Server (NTRS)
Ford, F. E.; Palandati, C. F.; Davis, J. F.; Tasevoli, C. M.
1980-01-01
A series of tests was conducted on two 12 ampere hour nickel cadmium batteries under a simulated cycle regime using the multiple voltage versus temperature levels designed into the modular power system (MPS). These tests included: battery recharge as a function of voltage control level; temperature imbalance between two parallel batteries; a shorted or partially shorted cell in one of the two parallel batteries; impedance imbalance of one of the parallel battery circuits; and disabling and enabling one of the batteries from the bus at various charge and discharge states. The results demonstrate that the eight commandable voltage versus temperature levels designed into the MPS provide a very flexible system that not only can accommodate a wide range of normal power system operation, but also provides a high degree of flexibility in responding to abnormal operating conditions.
Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.
Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry
2016-11-01
Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
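The key claim here is dividing a collective operation's data traffic across several endpoints belonging to one task. As a language-neutral illustration only (this is not the PAMI API, and `split_among_endpoints` is a hypothetical helper), a buffer can be partitioned into near-equal chunks, one per endpoint:

```python
def split_among_endpoints(buf, n_endpoints):
    """Divide a data buffer into near-equal chunks, one per endpoint,
    mirroring the claim's division of communications operations among
    a task's plurality of endpoints. Generic sketch, not the PAMI API."""
    q, r = divmod(len(buf), n_endpoints)
    chunks, start = [], 0
    for i in range(n_endpoints):
        end = start + q + (1 if i < r else 0)
        chunks.append(buf[start:end])
        start = end
    return chunks

# Example: 10 items across 4 endpoints -> chunk sizes 3, 3, 2, 2
print([len(c) for c in split_among_endpoints(list(range(10)), 4)])
```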
Distributed computing feasibility in a non-dedicated homogeneous distributed system
NASA Technical Reports Server (NTRS)
Leutenegger, Scott T.; Sun, Xian-He
1993-01-01
The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, it often utilizes wasted cycles to run parallel jobs. This work addresses the feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks. An analytical model is developed to predict parallel job response times. The model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, the task ratio, which relates the parallel task demand to the mean service demand of non-parallel workstation processes, is introduced. The task ratio is proposed as a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.
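As defined in the abstract, the task ratio is simply the parallel task demand divided by the mean service demand of the non-parallel workstation processes; a one-function sketch (names hypothetical):

```python
def task_ratio(parallel_task_demand, mean_workstation_demand):
    """Task ratio as defined in the abstract: parallel task demand
    relative to the mean service demand of owner processes. Larger
    ratios suggest the parallel job is large enough to use a
    non-dedicated system efficiently despite owner interference."""
    return parallel_task_demand / mean_workstation_demand
```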
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Vibration energy harvesting using a piezoelectric circular diaphragm array.
Wang, Wei; Yang, Tongqing; Chen, Xurui; Yao, Xi
2012-09-01
This paper presents a method for harvesting electric energy from mechanical vibration using a mechanically excited piezoelectric circular diaphragm array. The array consists of four plates with series and parallel connections, and its electrical characteristics are examined under dynamic conditions. With an optimal load resistor of 160 kΩ, an output power of 28 mW was generated from the array in series connection at 150 Hz under a prestress of 0.8 N and a vibration acceleration of 9.8 m/s(2), whereas a maximal output power of 27 mW was obtained from the array in parallel connection through a resistive load of 11 kΩ under the same frequency, prestress, and acceleration conditions. The results show that using a piezoelectric circular diaphragm array can significantly increase the energy output compared with the use of a single plate. By choosing an appropriate connection pattern (series or parallel) among the plates, the equivalent impedance of the energy harvesting device can be tailored to meet the matched load of different applications for maximal power output.
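The reported optimal loads follow elementary source-impedance scaling: n identical plates in series present roughly n² times the source impedance of the same plates in parallel, and maximum power transfer occurs into a matched load. The sketch below uses a hypothetical per-plate impedance of 40 kΩ and is an idealized illustration, not a model of the authors' device:

```python
def matched_load(n_plates, plate_impedance, series=True):
    """Equivalent source impedance of n identical plates; the resistive
    load that maximizes power transfer equals this impedance."""
    if series:
        return n_plates * plate_impedance
    return plate_impedance / n_plates

# Hypothetical 40 kOhm per plate, 4 plates: ~160 kOhm in series vs
# ~10 kOhm in parallel -- the same order as the 160 kOhm / 11 kOhm
# optimal loads reported in the abstract.
print(matched_load(4, 40e3, series=True), matched_load(4, 40e3, series=False))
```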
Toward a Model Framework of Generalized Parallel Componential Processing of Multi-Symbol Numbers
ERIC Educational Resources Information Center
Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph
2015-01-01
In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining…
NASA Astrophysics Data System (ADS)
Homburg, Oliver; Jarczynski, Manfred; Mitra, Thomas; Brüning, Stephan
2017-02-01
In the last decade, much improvement has been achieved in ultra-short pulse lasers with high repetition rates. This laser technology has matured to the point that it has recently entered a manifold of industrial applications, after mainly scientific use in the past. Compared to ns-pulse ablation, ultra-short pulses in the ps or even fs regime lead to still colder ablation and further reduced heat-affected zones. This is crucial for micro patterning as structure sizes become smaller and requirements grow stricter at the same time. An additional advantage of ultra-fast processing is its applicability to a large variety of materials, e.g. metals and several high-bandgap materials such as glass and ceramics. One challenge for ultra-fast micro machining is throughput. The operational capacity of these processes can be maximized by increasing the scan rate or the number of beams, i.e. parallel processing. This contribution focuses on process parallelism for ultra-short pulsed lasers with high repetition rates and individually addressable acousto-optical beam modulation. The core of the multi-beam generation is a smooth diffractive beam splitter component producing highly uniform spots with negligible loss, together with a prismatic array compressor to match beam size and pitch. The optical design and practical realization of an 8-beam processing head, in combination with a high-average-power single-mode ultra-short pulsed laser source, are presented, along with currently ongoing and promising laboratory research and micro machining results. Finally, an outlook on scaling the processing head to several tens of beams is given.
Triple voltage dc-to-dc converter and method
Su, Gui-Jia
2008-08-05
A circuit and method of providing three dc voltage buses and transforming power between a low voltage dc converter and a high voltage dc converter, by coupling a primary dc power circuit and a secondary dc power circuit through an isolation transformer; providing the gating signals to power semiconductor switches in the primary and secondary circuits to control power flow between the primary and secondary circuits and by controlling a phase shift between the primary voltage and the secondary voltage. The primary dc power circuit and the secondary dc power circuit each further comprising at least two tank capacitances arranged in series as a tank leg, at least two resonant switching devices arranged in series with each other and arranged in parallel with the tank leg, and at least one voltage source arranged in parallel with the tank leg and the resonant switching devices, said resonant switching devices including power semiconductor switches that are operated by gating signals. Additional embodiments having a center-tapped battery on the low voltage side and a plurality of modules on both the low voltage side and the high voltage side are also disclosed for the purpose of reducing ripple current and for reducing the size of the components.
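The patent describes a resonant topology, but the underlying control principle, setting power flow by the phase shift between primary and secondary bridge voltages, is shared with the textbook (non-resonant) dual active bridge, whose power relation is sketched below; all component values in the example are hypothetical:

```python
import math

def dab_power(v1, v2_reflected, phi, f_sw, l_leak):
    """Textbook dual-active-bridge phase-shift power transfer:
    P = V1 * V2' * phi * (pi - |phi|) / (2 * pi^2 * f_sw * L),
    with phi in radians and V2' the secondary voltage reflected to the
    primary. Positive phi transfers power primary -> secondary. This is
    the non-resonant relation, not the patent's resonant-tank analysis."""
    return v1 * v2_reflected * phi * (math.pi - abs(phi)) / (
        2 * math.pi**2 * f_sw * l_leak)

# Hypothetical: 12 V primary, 300 V bus through a 25:1 transformer,
# 100 kHz switching, 10 uH leakage inductance, 30 degrees of phase shift
print(dab_power(12, 300 / 25, math.pi / 6, 100e3, 10e-6))
```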
Future in biomolecular computation
NASA Astrophysics Data System (ADS)
Wimmer, E.
1988-01-01
Large-scale computations for biomolecules are dominated by three levels of theory: rigorous quantum mechanical calculations for molecules with up to about 30 atoms, semi-empirical quantum mechanical calculations for systems with up to several hundred atoms, and force-field molecular dynamics studies of biomacromolecules with 10,000 atoms and more, including surrounding solvent molecules. It can be anticipated that increased computational power will allow the treatment of larger systems of ever-growing complexity. Because of the way computational requirements scale with the number of atoms, force-field approaches will benefit the most from increased computational power. On the other hand, progress in methodologies such as density functional theory will enable us to treat larger systems on a fully quantum mechanical level, and a combination of molecular dynamics and quantum mechanics can be envisioned. One of the greatest challenges in biomolecular computation is the protein folding problem. It is unclear at this point whether an approach with current methodologies will lead to a satisfactory answer or whether unconventional, new approaches will be necessary. In any event, due to the complexity of biomolecular systems, a hierarchy of approaches will have to be established and used in order to capture the wide range of length scales and time scales involved in biological processes. In terms of hardware development, the speed and power of computers will increase while the price/performance ratio becomes more and more favorable. Parallelism can be anticipated to become an integral architectural feature in a range of computers. It is unclear at this point how quickly massively parallel systems will become easy enough to use that new methodological developments can be pursued on such computers. Current trends show that distributed processing, such as the combination of convenient graphics workstations and powerful general-purpose supercomputers, will lead to a new style of computing in which calculations are monitored and manipulated as they proceed. The combination of a numeric approach with artificial-intelligence approaches can be expected to open up entirely new possibilities. Ultimately, the most exciting aspect of the future of biomolecular computing will be the unexpected discoveries.
Ion Heating and Flows in a High Power Helicon Source
NASA Astrophysics Data System (ADS)
Scime, Earl; Agnello, Riccardo; Furno, Ivo; Howling, Alan; Jacquier, Remy; Plyushchev, Gennady; Thompson, Derek
2017-10-01
We report experimental measurements of ion temperatures and flows in a high power, linear, magnetized, helicon plasma device, the Resonant Antenna Ion Device (RAID). RAID is equipped with a high power helicon source. Parallel and perpendicular ion temperatures on the order of 0.6 eV are observed for an rf power of 4 kW, suggesting that higher power helicon sources should attain ion temperatures in excess of 1 eV. The unique RAID antenna design produces broad, uniform plasma density and perpendicular ion temperature radial profiles. Measurements of the azimuthal flow indicate rigid body rotation of the plasma column of a few kHz. When configured with an expanding magnetic field, modest parallel ion flows are observed in the expansion region. The ion flows and temperatures are derived from laser induced fluorescence measurements of the Doppler resolved velocity distribution functions of argon ions. This work supported by U.S. National Science Foundation Grant No. PHY-1360278.
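Ion temperatures from laser induced fluorescence are typically obtained from the Doppler width of the measured velocity distribution. Below is a minimal sketch of that conversion, assuming purely thermal Gaussian broadening and a hypothetical Ar II line near 668.61 nm; the paper's actual LIF scheme and measured line widths are not given in the abstract:

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
C = 2.99792458e8      # speed of light, m/s
AMU = 1.66053906660e-27

def ion_temperature_from_fwhm(fwhm_m, lambda0_m, mass_amu=40.0):
    """Ion temperature from the Doppler FWHM of a Gaussian line:
    T = m c^2 (dlambda)^2 / (8 ln2 k lambda0^2). Assumes purely thermal
    broadening; instrumental and Zeeman contributions are ignored."""
    m = mass_amu * AMU
    return m * C**2 * fwhm_m**2 / (8 * math.log(2) * K_B * lambda0_m**2)

# Hypothetical 0.005 nm FWHM on an Ar II line at 668.61 nm -> ~0.4 eV
t_k = ion_temperature_from_fwhm(0.005e-9, 668.61e-9)
print(t_k, "K =", t_k * K_B / 1.602176634e-19, "eV")
```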
Wijmans, Johannes G [Menlo Park, CA; Merkel, Timothy C [Menlo Park, CA; Baker, Richard W [Palo Alto, CA
2011-10-11
Disclosed herein are combustion systems and power plants that incorporate sweep-based membrane separation units to remove carbon dioxide from combustion gases. In its most basic embodiment, the invention is a combustion system that includes three discrete units: a combustion unit, a carbon dioxide capture unit, and a sweep-based membrane separation unit. In a preferred embodiment, the invention is a power plant including a combustion unit, a power generation system, a carbon dioxide capture unit, and a sweep-based membrane separation unit. In both of these embodiments, the carbon dioxide capture unit and the sweep-based membrane separation unit are configured to be operated in parallel, by which we mean that each unit is adapted to receive exhaust gases from the combustion unit without such gases first passing through the other unit.
A Triple-Loop Inductive Power Transmission System for Biomedical Applications.
Lee, Byunghun; Kiani, Mehdi; Ghovanloo, Maysam
2016-02-01
A triple-loop wireless power transmission (WPT) system equipped with closed-loop global power control, adaptive transmitter (Tx) resonance compensation (TRC), and automatic receiver (Rx) resonance tuning (ART) is presented. This system not only opposes coupling and load variations but also compensates for changes in the environment surrounding the inductive link to enhance power transfer efficiency (PTE) in applications such as implantable medical devices (IMDs). The Tx was built around a commercial off-the-shelf (COTS) radio-frequency identification (RFID) reader, operating at 13.56 MHz. A local Tx loop finds the optimal capacitance in parallel with the Tx coil by adjusting a varactor. A global power control loop maintains the received power at a desired level in the presence of changes in coupling distance, coil misalignments, and loading. Moreover, a local Rx loop is implemented inside a power management integrated circuit (PMIC) to avoid PTE degradation due to the Rx coil surrounding environment and process variations. The PMIC was fabricated in a 0.35- μm 4M2P standard CMOS process with 2.54 mm(2) active area. Measurement results show that the proposed triple-loop system improves the overall PTE by up to 10.5% and 4.7% compared to a similar open- and single closed-loop system, respectively, at nominal coil distance of 2 cm. The added TRC and ART loops contribute 2.3% and 1.4% to the overall PTE of 13.5%, respectively. This is the first WPT system to include three loops to dynamically compensate for environment and circuit variations and improve the overall power efficiency all the way from the driver output in Tx to the load in Rx.
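The local Tx loop's varactor search has a simple target: the capacitance that resonates the Tx coil at the 13.56 MHz carrier. A minimal sketch, with a hypothetical 1 µH coil inductance:

```python
import math

def resonant_capacitance(f_hz, l_henry):
    """Capacitance that resonates with inductance L at frequency f:
    C = 1 / ((2*pi*f)^2 * L)."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_henry)

# Hypothetical 1 uH Tx coil at the 13.56 MHz carrier used in the paper
print(resonant_capacitance(13.56e6, 1e-6))  # ~138 pF target for the varactor
```

The TRC loop effectively performs this search empirically, adjusting the varactor until the measured resonance condition is met rather than computing C from a known L.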
Fernández-Berni, Jorge; Carmona-Galán, Ricardo; del Río, Rocío; Kleihorst, Richard; Philips, Wilfried; Rodríguez-Vázquez, Ángel
2014-08-19
The capture, processing and distribution of visual information is one of the major challenges for the paradigm of the Internet of Things. Privacy emerges as a fundamental barrier to overcome. The idea of networked image sensors pervasively collecting data generates social rejection in the face of sensitive information being tampered by hackers or misused by legitimate users. Power consumption also constitutes a crucial aspect. Images contain a massive amount of data to be processed under strict timing requirements, demanding high-performance vision systems. In this paper, we describe a hardware-based strategy to concurrently address these two key issues. By conveying processing capabilities to the focal plane in addition to sensing, we can implement privacy protection measures just at the point where sensitive data are generated. Furthermore, such measures can be tailored for efficiently reducing the computational load of subsequent processing stages. As a proof of concept, a full-custom QVGA vision sensor chip is presented. It incorporates a mixed-signal focal-plane sensing-processing array providing programmable pixelation of multiple image regions in parallel. In addition to this functionality, the sensor exploits reconfigurability to implement other processing primitives, namely block-wise dynamic range adaptation, integral image computation and multi-resolution filtering. The proposed circuitry is also suitable to build a granular space, becoming the raw material for subsequent feature extraction and recognition of categorized objects.
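To make two of the focal-plane primitives concrete, here is a rough software analogue in Python/NumPy of programmable region pixelation and integral image computation; region coordinates and block size are arbitrary, and this runs on a host CPU rather than in the chip's mixed-signal array:

```python
import numpy as np

def pixelate_region(img, y0, y1, x0, x1, block=8):
    """Software analogue of the chip's focal-plane pixelation: replace
    each block x block tile of a region with its mean value, hiding
    detail (e.g., identity) while preserving coarse structure."""
    out = img.copy()
    region = out[y0:y1, x0:x1].astype(float)
    h, w = region.shape
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = region[by:by + block, bx:bx + block]
            tile[...] = tile.mean()
    out[y0:y1, x0:x1] = region.astype(img.dtype)
    return out

def integral_image(img):
    """Summed-area table, another primitive the sensor implements."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

# Example: anonymize a hypothetical region of a QVGA frame
frame = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
anon = pixelate_region(frame, 40, 120, 100, 180, block=16)
ii = integral_image(frame)
```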
A 0.13-µm implementation of 5 Gb/s and 3-mW folded parallel architecture for AES algorithm
NASA Astrophysics Data System (ADS)
Rahimunnisa, K.; Karthigaikumar, P.; Kirubavathy, J.; Jayakumar, J.; Kumar, S. Suresh
2014-02-01
A new architecture for encrypting and decrypting confidential data using the Advanced Encryption Standard (AES) algorithm is presented in this article. The structure combines a folded structure with a parallel architecture to increase throughput, achieving high throughput at low power. The proposed architecture is implemented in 0.13-µm complementary metal-oxide-semiconductor (CMOS) technology. It is compared with different existing structures, and the results show that it delivers higher throughput and lower power consumption than existing designs.
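As a sanity check on the headline figure, block-cipher datapath throughput is just block size times block completion rate; the clock and cycle count below are purely illustrative values chosen to reproduce 5 Gb/s, not figures from the article:

```python
def throughput_bps(clock_hz, cycles_per_block, block_bits=128):
    """Block-cipher datapath throughput: bits per block times the
    number of blocks completed per second."""
    return block_bits * clock_hz / cycles_per_block

# Illustrative only: one 128-bit block every 4 cycles at 156.25 MHz -> 5 Gb/s
print(throughput_bps(156.25e6, 4))
```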
NASA Technical Reports Server (NTRS)
Wilson, T. G.; Lee, F. C. Y.; Burns, W. W., III; Owen, H. A., Jr.
1974-01-01
A procedure is developed for classifying dc-to-square-wave two-transistor parallel inverters used in power conditioning applications. The inverters are reduced to equivalent RLC networks and are then grouped with other inverters sharing the same basic equivalent circuit. Distinction between inverter classes is based on the topology characteristics of the equivalent circuits. Information about one class can then be extended to another class using basic oscillation theory and the concept of duality. Oscillograms from test circuits confirm the validity of the procedure.
Molecular solid-state inverter-converter system
NASA Technical Reports Server (NTRS)
Birchenough, A. G.
1973-01-01
A modular approach for aerospace electrical systems has been developed, using lightweight high efficiency pulse width modulation techniques. With the modular approach, a required system is obtained by paralleling modules. The modular system includes the inverters and converters, a paralleling system, and an automatic control and fault-sensing protection system with a visual annunciator. The output is 150 V dc, or a low distortion three phase sine wave at 120 V, 400 Hz. Input power is unregulated 56 V dc. Each module is rated 2.5 kW or 3.6 kVA at 0.7 power factor.
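The module's two ratings are mutually consistent: 3.6 kVA at a 0.7 power factor corresponds to roughly 2.5 kW of real power. Trivially, in Python:

```python
def real_power_kw(apparent_kva, power_factor):
    """Real power from an apparent-power rating: P = S * pf."""
    return apparent_kva * power_factor

print(real_power_kw(3.6, 0.7))  # ~2.5 kW, matching the module rating
```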