soft-core processor study: Topics by Science.gov

Sample records for soft-core processor study

Soft-core processor study for node-based architectures.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Houten, Jonathan Roger; Jarosz, Jason P.; Welch, Benjamin James

2008-09-01

Node-based architecture (NBA) designs for future satellite projects hold the promise of decreasing system development time and costs, size, weight, and power and positioning the laboratory to address other emerging mission opportunities quickly. Reconfigurable Field Programmable Gate Array (FPGA) based modules will comprise the core of several of the NBA nodes. Microprocessing capabilities will be necessary with varying degrees of mission-specific performance requirements on these nodes. To enable the flexibility of these reconfigurable nodes, it is advantageous to incorporate the microprocessor into the FPGA itself, either as a hard-core processor built into the FPGA or as a soft-core processor built out of FPGA elements. This document describes the evaluation of three reconfigurable FPGA based processors for use in future NBA systems--two soft cores (MicroBlaze and non-fault-tolerant LEON) and one hard core (PowerPC 405). Two standard performance benchmark applications were developed for each processor. The first, Dhrystone, is a fixed-point operation metric. The second, Whetstone, is a floating-point operation metric. Several trials were run at varying code locations, loop counts, processor speeds, and cache configurations. FPGA resource utilization was recorded for each configuration. Cache configurations impacted the results greatly; for optimal processor efficiency it is necessary to enable caches on the processors. Processor caches carry a penalty; cache error mitigation is necessary when operating in a radiation environment.
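For readers comparing cores that run at different clock speeds, Dhrystone scores are conventionally normalized to DMIPS by dividing Dhrystones per second by 1757 (the VAX 11/780 reference score) and then dividing by the clock frequency in MHz. The sketch below shows only that normalization; the input numbers are hypothetical and are not results from the report.

```python
# Conventional reference: a VAX 11/780 scores 1757 Dhrystones per second (1 DMIPS).
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_sec: float) -> float:
    """Convert a raw Dhrystone score to DMIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Normalize DMIPS by clock frequency so cores at different speeds compare fairly."""
    return dmips(dhrystones_per_sec) / clock_mhz


# Hypothetical inputs for illustration only (not taken from the report).
example_score = dmips_per_mhz(dhrystones_per_sec=175_700, clock_mhz=100.0)
print(f"{example_score:.2f} DMIPS/MHz")
```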
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor

PubMed Central

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-01-01

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation. PMID:27983714
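The abstract does not say which inverse square root method or which Taylor expansion the authors used. As an illustration only, the sketch below assumes the widely known 32-bit bit-trick seed refined by Newton-Raphson steps, and a four-term Taylor polynomial for sine that is only accurate for small arguments. The function names are hypothetical.

```python
import struct


def fast_rsqrt(x: float, newton_steps: int = 2) -> float:
    """Approximate 1/sqrt(x) for x > 0 from a float32 bit-trick seed plus Newton steps."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Reinterpret the float32 bit pattern as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # The magic constant yields a rough first guess for 1/sqrt(x).
    bits = 0x5F3759DF - (bits >> 1)
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Each Newton-Raphson step roughly squares the relative error.
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def approx_sqrt(x: float) -> float:
    """sqrt(x) = x * (1/sqrt(x)); needs no division or hardware sqrt."""
    return 0.0 if x == 0.0 else x * fast_rsqrt(x)


def taylor_sin(x: float) -> float:
    """sin(x) ~ x - x^3/6 + x^5/120 - x^7/5040, in Horner form (small |x| only)."""
    x2 = x * x
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0 * (1.0 - x2 / 42.0)))


print(approx_sqrt(2.0), taylor_sin(0.5))
```

On a soft core with a floating-point multiplier but a slow divider, this style of approximation trades a small, bounded error for multiply-only arithmetic.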
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.

PubMed

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-12-15

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
PDSparc: A Drop-In Replacement for LEON3 Written Using Synopsys Processor Designer

DTIC Science & Technology

2015-09-24

Kate Thurmer MIT Lincoln Laboratory, Lexington, MA, USA Distribution A: Public Release ABSTRACT Microprocessors are the...enabled appliances has opened a significant new niche: the Application Specific Standard Product (ASSP) microprocessor. These processors usually start...out as soft-cores that are parameterized at design time to realize exclusively the specific needs of the application. The microprocessor is a small
Multi-Core Processor Memory Contention Benchmark Analysis Case Study

NASA Technical Reports Server (NTRS)

Simon, Tyler; McGalliard, James

2009-01-01

Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Efficiency of static core turn-off in a system-on-a-chip with variation

DOEpatents

Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong

2013-10-29

A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
Performance and advantages of a soft-core based parallel architecture for energy peak detection in the calorimeter Level 0 trigger for the NA62 experiment at CERN

NASA Astrophysics Data System (ADS)

Ammendola, R.; Barbanera, M.; Bizzarri, M.; Bonaiuto, V.; Ceccucci, A.; Checcucci, B.; De Simone, N.; Fantechi, R.; Federici, L.; Fucci, A.; Lupi, M.; Paoluzzi, G.; Papi, A.; Piccini, M.; Ryjov, V.; Salamon, A.; Salina, G.; Sargeni, F.; Venditti, S.

2017-03-01

The NA62 experiment at CERN SPS has started its data-taking. Its aim is to measure the branching ratio of the ultra-rare decay K+ → π+νν̅. In this context, rejecting the background is crucial. One of the main backgrounds to the measurement is the K+ → π+π0 decay. In the 1-8.5 mrad decay region this background is rejected by the calorimetric trigger processor (Cal-L0). In this work we present the performance of a soft-core based parallel architecture built on FPGAs for the energy peak reconstruction as an alternative to an implementation written entirely in VHDL.
Real-time machine vision system using FPGA and soft-core processor

NASA Astrophysics Data System (ADS)

Malik, Abdul Waheed; Thörnberg, Benny; Meng, Xiaozhou; Imran, Muhammad

2012-06-01

This paper presents a machine vision system for real-time computation of distance and angle of a camera from reference points in the environment. Image pre-processing, component labeling and feature extraction modules were modeled at Register Transfer (RT) level and synthesized for implementation on field programmable gate arrays (FPGA). The extracted image component features were sent from the hardware modules to a soft-core processor, MicroBlaze, for computation of distance and angle. A CMOS imaging sensor operating at a clock frequency of 27 MHz was used in our experiments to produce a video stream at the rate of 75 frames per second. Image component labeling and feature extraction modules were running in parallel having a total latency of 13 ms. The MicroBlaze was interfaced with the component labeling and feature extraction modules through Fast Simplex Link (FSL). The latency for computing distance and angle of camera from the reference points was measured to be 2 ms on the MicroBlaze, running at 100 MHz clock frequency. In this paper, we present the performance analysis, device utilization and power consumption for the designed system. The FPGA based machine vision system that we propose has high frame speed, low latency and a power consumption that is much lower compared to commercially available smart camera solutions.
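The timing figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated above. Treating the hardware and software stages as running back to back is an assumption made here for illustration; the abstract does not say how the stages overlap.

```python
# Figures quoted in the abstract.
PIXEL_CLOCK_HZ = 27_000_000        # CMOS sensor clock
FRAME_RATE_FPS = 75                # video stream rate
HW_LATENCY_MS = 13                 # labeling + feature extraction (run in parallel)
SW_LATENCY_MS = 2                  # distance/angle computation on the MicroBlaze
MICROBLAZE_CLOCK_HZ = 100_000_000  # MicroBlaze clock

# Time available per frame at 75 frames per second.
frame_period_ms = 1000 / FRAME_RATE_FPS

# Sensor clock cycles available per frame.
pixel_clocks_per_frame = PIXEL_CLOCK_HZ // FRAME_RATE_FPS

# Processor cycles consumed by the 2 ms software stage.
microblaze_cycles = SW_LATENCY_MS * MICROBLAZE_CLOCK_HZ // 1000

# The hardware stage completes within a single frame period.
hw_fits_in_frame = HW_LATENCY_MS < frame_period_ms

# Assumption: the stages run back to back, so their latencies simply add.
end_to_end_ms = HW_LATENCY_MS + SW_LATENCY_MS

print(f"frame period   : {frame_period_ms:.2f} ms")
print(f"clocks / frame : {pixel_clocks_per_frame}")
print(f"MicroBlaze work: {microblaze_cycles} cycles")
print(f"end to end     : {end_to_end_ms} ms (hardware fits in frame: {hw_fits_in_frame})")
```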
Ordering of guarded and unguarded stores for no-sync I/O

DOEpatents

Gara, Alan; Ohmacht, Martin

2013-06-25

A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in other local cache memory devices of other processor cores. The first processor core flushes only the first queue.
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

NASA Astrophysics Data System (ADS)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing clock speed to improve performance. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Implementation of kernels on the Maestro processor

NASA Astrophysics Data System (ADS)

Suh, Jinwoo; Kang, D. I. D.; Crago, S. P.

Currently, most microprocessors use multiple cores to increase performance while limiting power usage. Some processors use not just a few cores, but tens of cores or even 100 cores. One such many-core microprocessor is the Maestro processor, which is based on Tilera's TILE64 processor. The Maestro chip is a 49-core, general-purpose, radiation-hardened processor designed for space applications. The Maestro processor, unlike the TILE64, has a floating point unit (FPU) in each core for improved floating point performance. The Maestro processor runs at 342 MHz clock frequency. On the Maestro processor, we implemented several widely used kernels: matrix multiplication, vector add, FIR filter, and FFT. We measured and analyzed the performance of these kernels. The achieved performance was up to 5.7 GFLOPS, and the speedup compared to single tile was up to 49 using 49 tiles.
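The headline numbers in this abstract imply a per-core throughput and a parallel efficiency that are easy to derive. The sketch below assumes the 5.7 GFLOPS peak was measured with all 49 cores active, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
CORES = 49
CLOCK_HZ = 342e6
PEAK_MEASURED_FLOPS = 5.7e9   # "up to 5.7 GFLOPS"
BEST_SPEEDUP = 49             # "up to 49 using 49 tiles"

# Assumption: the peak figure was obtained with all 49 cores in use.
flops_per_cycle_per_core = PEAK_MEASURED_FLOPS / (CORES * CLOCK_HZ)

# Parallel efficiency = speedup / number of tiles used.
parallel_efficiency = BEST_SPEEDUP / CORES

print(f"{flops_per_cycle_per_core:.3f} FLOP per cycle per core")
print(f"{parallel_efficiency:.0%} parallel efficiency at best")
```

Under that assumption, each core sustains roughly one floating-point operation every three cycles in the best case, and the best-case kernel scales linearly across the tiles.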
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

NASA Astrophysics Data System (ADS)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a pipelined circuit architecture implemented in an FPGA, a very large scale integrated circuit (VLSI), which efficiently handles the real-time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this imaging system. Each processor undertakes its own system task and coordinates its work with the others. The system on programmable chip (SOPC) in the FPGA works steadily under a global clock frequency of 96 MHz. An adequate time allowance lets the FPGA perform the NUC image pre-processing algorithm with ease, which provides a sound basis for the post image processing in the DSP. In addition, this paper presents a hardware (HW) and software (SW) co-design in the FPGA. This architecture thus yields a multiprocessor image processing system and a smart solution that satisfies the system's performance requirements.
Testing and operating a multiprocessor chip with processor redundancy

DOEpatents

Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J

2014-10-21

A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
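The remapping idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the patented logic: the override bit is assumed to be set when the second test finds a failure the first test did not record, the second test's results are assumed to be cumulative, and logical IDs are assigned to passing cores in ascending physical order. All function names are hypothetical.

```python
def build_mapping(pass_flags: list[bool], logical_count: int) -> list[int]:
    """Return mapping[logical_id] = physical_id, skipping cores that failed."""
    good = [phys for phys, ok in enumerate(pass_flags) if ok]
    if len(good) < logical_count:
        raise ValueError("not enough working cores, even with redundancy")
    return good[:logical_count]


def select_mapping(on_chip_results: list[bool],
                   external_results: list[bool],
                   logical_count: int) -> tuple[bool, list[int]]:
    """Choose between the two encoded test results, mimicking the multiplexer."""
    # Assumption: override is set when the second test finds a new failure.
    override = any(first and not second
                   for first, second in zip(on_chip_results, external_results))
    chosen = external_results if override else on_chip_results
    return override, build_mapping(chosen, logical_count)


# Five physical cores (four primary, one redundant); four logical cores needed.
first_test = [True, True, False, True, True]    # core 2 failed at the first test
second_test = [True, True, False, True, True]   # nothing new failed later
print(select_mapping(first_test, second_test, 4))  # (False, [0, 1, 3, 4])
```

Software only ever sees logical IDs 0 to 3, so a chip with one bad core still presents a full complement of processors.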
Multiple core computer processor with globally-accessible local memories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shalf, John; Donofrio, David; Oliker, Leonid

A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.
Research based on the SoPC platform of feature-based image registration

NASA Astrophysics Data System (ADS)

Shi, Yue-dong; Wang, Zhi-hui

2015-12-01

This paper studies the implementation of feature-based image registration on a System on a Programmable Chip (SoPC) hardware platform. We implement the image registration algorithm on the FPGA chip, in which the embedded soft-core processor Nios II can speed up the image processing system. In this way, image registration no longer depends on a PC and, consequently, can be used much more widely. The experimental results indicate that our system shows stable performance, particularly in matching, where noise immunity is good. The feature points of the images also show a reasonable distribution.
Powerful conveyer belt real-time online detection system based on x-ray

NASA Astrophysics Data System (ADS)

Rong, Feng; Miao, Chang-yun; Meng, Wei

2009-07-01

Powerful conveyor belts are widely used in mines, docks, and similar settings. After long use, the internal steel ropes of a conveyor belt may fracture or rust, and its joints may shift, which can cause safety problems. A detection system based on x-ray is designed in this paper. A linear array detector (LDA) is used; LDAs are low in cost, fast in response, and technologically mature. The output charge of the LDA is transformed into a differential voltage signal by an amplifier. This kind of signal has good noise immunity and is suitable for long-distance transmission. The processor is an FPGA. An IP core controls a 4-channel A/D converter to collect the output data in parallel. A MicroBlaze soft-core processor, which handles the TCP/IP protocol, is embedded in the FPGA. Sampled data are transferred to a computer via Ethernet. To improve image quality, an algorithm that removes noise from the measurement result and applies gain normalization to pixel values is studied and designed. Experiments show that the system works well: it can inspect, online and in real time, a conveyor belt 2.0 m wide moving at 5 m/s without affecting production. The image is clear and makes it easy to judge the condition of the conveyor belt.
Soft electron processor for surface sterilization of food material

NASA Astrophysics Data System (ADS)

Baba, Takashi; Kaneko, Hiromi; Taniguchi, Shuichi

2004-09-01

As frozen or chilled foods have become popular, it has become very important to provide food processing companies with raw materials that have lower levels of microbial contamination. Consequently, the sterilization of food material is one of the major topics in food processing. Dried materials such as grains, beans and spices are not typically deeply contaminated by microorganisms, which reside on the surfaces of the materials, so low-energy electrons (below 300 keV) with small penetration power (Soft-Electrons) are a very useful sterilization method for such materials. Soft-Electrons were researched and named by Dr. Hayashi et al. This is a non-thermal method, so foods can be kept hygienic without serious deterioration. It is also a physical method, so it leaves no chemical residues in foods. Recently, Nissin-High Voltage Co., Ltd. has developed and manufactured equipment for commercial use of Soft-Electrons (Soft Electron Processor), which can process 500 kg/h of grains. This report introduces the Soft Electron Processor and shows the results of sterilization of wheat and brown rice by the equipment.
Scheduler for multiprocessor system switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael Karl; Salapura, Valentina

2015-01-06

System, method and computer program product for scheduling threads in a multiprocessing system with selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The method configures the selective pairing facility to use checking to provide one highly reliable thread for high reliability, and allocates threads indicating a need for hardware checking to the corresponding processor cores. The method configures the selective pairing facility to provide multiple independent cores, and allocates threads indicating inherent resilience to the corresponding processor cores.
Reconfigurable Hardware Adapts to Changing Mission Demands

NASA Technical Reports Server (NTRS)

2003-01-01

A new class of computing architectures and processing systems, which use reconfigurable hardware, is creating a revolutionary approach to implementing future spacecraft systems. With the increasing complexity of electronic components, engineers must design next-generation spacecraft systems with new technologies in both hardware and software. Derivation Systems, Inc., of Carlsbad, California, has been working through NASA's Small Business Innovation Research (SBIR) program to develop key technologies in reconfigurable computing and Intellectual Property (IP) soft cores. Founded in 1993, Derivation Systems has received several SBIR contracts from NASA's Langley Research Center and the U.S. Department of Defense Air Force Research Laboratories in support of its mission to develop hardware and software for high-assurance systems. Through these contracts, Derivation Systems began developing leading-edge technology in formal verification, embedded Java, and reconfigurable computing for its PF3100, Derivational Reasoning System (DRS), FormalCORE IP, FormalCORE PCI/32, FormalCORE DES, and LavaCORE Configurable Java Processor, which are designed for greater flexibility and security on all space missions.
Cochlear implant characteristics and speech perception skills of adolescents with long-term device use.

PubMed

Davidson, Lisa S; Geers, Ann E; Brenner, Christine

2010-10-01

Updated cochlear implant technology and optimized fitting can have a substantial impact on speech perception. The effects of upgrades in processor technology and aided thresholds on word recognition at soft input levels and sentence recognition in noise were examined. We hypothesized that updated speech processors and lower aided thresholds would allow improved recognition of soft speech without compromising performance in noise. 109 teenagers who had used a Nucleus 22-cochlear implant since preschool were tested with their current speech processor(s) (101 unilateral and 8 bilateral): 13 used the Spectra, 22 the ESPrit 22, 61 the ESPrit 3G, and 13 the Freedom. The Lexical Neighborhood Test (LNT) was administered at 70 and 50 dB SPL and the Bamford Kowal Bench sentences were administered in quiet and in noise. Aided thresholds were obtained for frequency-modulated tones from 250 to 4,000 Hz. Results were analyzed using repeated measures analysis of variance. Aided thresholds for the Freedom/3G group were significantly lower (better) than the Spectra/Sprint group. LNT scores at 50 dB were significantly higher for the Freedom/3G group. No significant differences between the 2 groups were found for the LNT at 70 or sentences in quiet or noise. Adolescents using updated processors that allowed for aided detection thresholds of 30 dB HL or better performed the best at soft levels. The BKB in noise results suggest that greater access to soft speech does not compromise listening in noise.

Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

DTIC Science & Technology

2013-09-30

STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the
FPGA-Based, Self-Checking, Fault-Tolerant Computers

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2004-01-01

A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors.
It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
Use of FPGA embedded processors for fast cluster reconstruction in the NA62 liquid krypton electromagnetic calorimeter

NASA Astrophysics Data System (ADS)

Badoni, D.; Bizzarri, M.; Bonaiuto, V.; Checcucci, B.; De Simone, N.; Federici, L.; Fucci, A.; Paoluzzi, G.; Papi, A.; Piccini, M.; Salamon, A.; Salina, G.; Santovetti, E.; Sargeni, F.; Venditti, S.

2014-01-01

The goal of the NA62 experiment at the CERN SPS is the measurement of the branching ratio of the very rare kaon decay K+ → π+νν̅ with a 10% accuracy by collecting 100 events in two years of data taking. An efficient photon veto system is needed to reject the K+ → π+π0 background and a liquid krypton electromagnetic calorimeter will be used for this purpose in the 1-10 mrad angular region. The L0 trigger system for the calorimeter consists of a peak reconstruction algorithm implemented on FPGA by using a mixed parallel architecture based on soft core Altera NIOS II embedded processors together with custom VHDL modules. This solution allows an efficient and flexible reconstruction of the energy-deposition peak. In total, the system will comprise 36 TEL62 boards, 108 mezzanine cards and 215 high-performance FPGAs. We describe the design, current status and the results of the first performance tests.
A customizable system for real-time image processing using the Blackfin DSProcessor and the MicroC/OS-II real-time kernel

NASA Astrophysics Data System (ADS)

Coffey, Stephen; Connell, Joseph

2005-06-01

This paper presents a development platform for real-time image processing based on the ADSP-BF533 Blackfin processor and the MicroC/OS-II real-time operating system (RTOS). MicroC/OS-II is a completely portable, ROMable, pre-emptive, real-time kernel. The Blackfin Digital Signal Processors (DSPs), incorporating the Analog Devices/Intel Micro Signal Architecture (MSA), are a broad family of 16-bit fixed-point products with a dual Multiply Accumulate (MAC) core. In addition, they have a rich instruction set with variable instruction length and both DSP and MCU functionality thus making them ideal for media based applications. Using the MicroC/OS-II for task scheduling and management, the proposed system can capture and process raw RGB data from any standard 8-bit greyscale image sensor in soft real-time and then display the processed result using a simple PC graphical user interface (GUI). Additionally, the GUI allows configuration of the image capture rate and the system and core DSP clock rates thereby allowing connectivity to a selection of image sensors and memory devices. The GUI also allows selection from a set of image processing algorithms based in the embedded operating system.
Parallelization of combinatorial search when solving knapsack optimization problem on computing systems based on multicore processors

NASA Astrophysics Data System (ADS)

Rahman, P. A.

2018-05-01

This paper deals with a model of the knapsack optimization problem and a method for solving it based on directed combinatorial search in the Boolean space. It also discusses the author's specialized mathematical model for decomposing the search-zone into separate search-spheres and the algorithm for distributing the search-spheres among the cores of a multi-core processor. The paper provides an example of decomposing the search-zone into several search-spheres and distributing them among the cores of a quad-core processor. Finally, the author gives a formula for estimating the theoretical maximum computational acceleration that can be achieved by parallelizing the search-zone into search-spheres on an unlimited number of processor cores.
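The abstract does not define how the search-zone is split into search-spheres. As a stand-in, the sketch below fixes the first few 0/1 decision variables to carve the Boolean space into independent subspaces and hands each one to a worker. This prefix-based split is an assumption for illustration, not the author's decomposition, and the item data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def search_subspace(prefix, weights, values, capacity):
    """Exhaustively search all selections that start with the fixed prefix."""
    best_value, best_choice = -1, None
    for tail in product((0, 1), repeat=len(weights) - len(prefix)):
        choice = prefix + tail
        weight = sum(w for w, x in zip(weights, choice) if x)
        if weight <= capacity:
            value = sum(v for v, x in zip(values, choice) if x)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


def solve_knapsack(weights, values, capacity, prefix_bits=2, workers=4):
    """Split the 2^n space into 2^prefix_bits subspaces and search them concurrently."""
    prefix_bits = min(prefix_bits, len(weights))
    prefixes = list(product((0, 1), repeat=prefix_bits))
    # Threads only illustrate the distribution of work; CPU-bound Python code
    # would need separate processes to gain real speedup.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(
            lambda p: search_subspace(p, weights, values, capacity), prefixes))
    return max(results, key=lambda r: r[0])


# Hypothetical instance for illustration.
weights = [12, 7, 11, 8, 9]
values = [24, 13, 23, 15, 16]
print(solve_knapsack(weights, values, capacity=26))  # (51, (0, 1, 1, 1, 0))
```

With two prefix bits there are four subspaces, which matches the quad-core example mentioned in the abstract in spirit: one subspace per core, with the global optimum taken as the best of the per-core results.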
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

PubMed Central

Kim, Deokho; Park, Karam; Ro, Won W.

2011-01-01

While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores would be used as processing nodes of the wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for the heterogeneous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors referred to as the Cell Broadband Engine. PMID:22164053
Developing infrared array controller with software real time operating system

NASA Astrophysics Data System (ADS)

Sako, Shigeyuki; Miyata, Takashi; Nakamura, Tomohiko; Motohara, Kentaro; Uchimoto, Yuka Katsuno; Onaka, Takashi; Kataza, Hirokazu

2008-07-01

Real-time capabilities are required for a controller of a large format array to reduce the dead time caused by readout and data transfer. Real-time processing has been achieved by dedicated processors including DSP, CPLD, and FPGA devices. However, the dedicated processors have problems with memory resources, inflexibility, and high cost. Meanwhile, a recent PC has sufficient CPU and memory resources to control the infrared array and to process a large amount of frame data in real-time. In this study, we have developed an infrared array controller with a software real-time operating system (RTOS) instead of the dedicated processors. A Linux PC equipped with an RTAI extension and a dual-core CPU is used as a main computer, and one of the CPU cores is allocated to the real-time processing. A digital I/O board with DMA functions is used for an I/O interface. The signal-processing cores are integrated in the OS kernel as a real-time driver module, which is composed of two virtual devices of the clock processor and the frame processor tasks. The array controller with the RTOS realizes complicated operations easily, flexibly, and at a low cost.
An evaluation of MPI message rate on hybrid-core processors

DOE PAGES

Barrett, Brian W.; Brightwell, Ron; Grant, Ryan; ...

2014-11-01

Power and energy concerns are motivating chip manufacturers to consider future hybrid-core processor designs that may combine a small number of traditional cores optimized for single-thread performance with a large number of simpler cores optimized for throughput performance. This trend is likely to impact the way in which compute resources for network protocol processing functions are allocated and managed. In particular, the performance of MPI match processing is critical to achieving high message throughput. In this paper, we analyze the ability of simple and more complex cores to perform MPI matching operations for various scenarios in order to gain insight into how MPI implementations for future hybrid-core processors should be designed.
State recovery and lockstep execution restart in a system with multiprocessor pairing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gara, Alan; Gschwind, Michael K; Salapura, Valentina

System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each pair of microprocessor or processor cores that provides one highly reliable thread connects with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus. Each selectively paired processor core includes a transactional execution facility, wherein the system is configured to enable processor rollback to a previous state and reinitialize lockstep execution in order to recover from an incorrect execution when an incorrect execution has been detected by the selective pairing facility.
The Results of a Laboratory Feasibility Study for the Biological Treatment of Umatilla Groundwater

DTIC Science & Technology

2012-01-01

high fructose corn syrup Kroger brand lactose Columbia River Processors, Boardman, OR cheese whey Columbia River Processors, Boardman, OR lactate...Processing Roy Dugan 541·481-3771 79588 Rippee Road 55 High Fructose Corn Syrup Malt Products Corp. Joanne McGuire 530-677-8282 #677 Blackstrap...communication with experts) tested in Run 1 were: • high - fructose corn sugar (based on promising results obtained using soft drink by-products
Managing Power Heterogeneity

NASA Astrophysics Data System (ADS)

Pruhs, Kirk

A particularly important emergent technology is heterogeneous processors (or cores), which many computer architects believe will be the dominant architectural design in the future. The main advantage of a heterogeneous architecture, relative to an architecture of identical processors, is that it allows for the inclusion of processors whose design is specialized for particular types of jobs, and for jobs to be assigned to the processor best suited for that job. Most notably, it is envisioned that these heterogeneous architectures will consist of a small number of high-power high-performance processors for critical jobs, and a larger number of lower-power lower-performance processors for less critical jobs. Naturally, the lower-power processors would be more energy efficient in terms of the computation performed per unit of energy expended, and would generate less heat per unit of computation. For a given area and power budget, heterogeneous designs can give significantly better performance for standard workloads. Moreover, even processors that were designed to be homogeneous are increasingly likely to be heterogeneous at run time: the dominant underlying cause is the increasing variability in the fabrication process as the feature size is scaled down (although run time faults will also play a role). Since manufacturing yields would be unacceptably low if every processor/core was required to be perfect, and since there would be significant performance loss from derating the entire chip to the functioning of the least functional processor (which is what would be required in order to attain processor homogeneity), some processor heterogeneity seems inevitable in chips with many processors/cores.
CoNNeCT Baseband Processor Module Boot Code SoftWare (BCSW)

NASA Technical Reports Server (NTRS)

Yamamoto, Clifford K.; Orozco, David S.; Byrne, D. J.; Allen, Steven J.; Sahasrabudhe, Adit; Lang, Minh

2012-01-01

This software provides essential startup and initialization routines for the CoNNeCT baseband processor module (BPM) hardware upon power-up. A command and data handling (C&DH) interface is provided via 1553 and diagnostic serial interfaces to invoke operational, reconfiguration, and test commands within the code. The BCSW has features unique to the hardware it is responsible for managing. In this case, the CoNNeCT BPM is configured with an updated CPU (Atmel AT697 SPARC processor) and a unique set of memory and I/O peripherals that require customized software to operate. These features include configuration of new AT697 registers, interfacing to a new HouseKeeper with a flash controller interface, a new dual Xilinx configuration/scrub interface, and an updated 1553 remote terminal (RT) core. The BCSW is intended to provide a "safe" mode for the BPM when initially powered on or when an unexpected trap occurs, causing the processor to reset. The BCSW allows the 1553 bus controller in the spacecraft or payload controller to operate the BPM over 1553 to upload code; upload Xilinx bit files; perform rudimentary tests; read, write, and copy the non-volatile flash memory; and configure the Xilinx interface. Commands also exist over 1553 to cause the CPU to jump or call a specified address to begin execution of user-supplied code. This may be in the form of a real-time operating system, test routine, or specific application code to run on the BPM.
Novel processor architecture for onboard infrared sensors

NASA Astrophysics Data System (ADS)

Hihara, Hiroki; Iwasaki, Akira; Tamagawa, Nobuo; Kuribayashi, Mitsunobu; Hashimoto, Masanori; Mitsuyama, Yukio; Ochi, Hiroyuki; Onodera, Hidetoshi; Kanbara, Hiroyuki; Wakabayashi, Kazutoshi; Tada, Munehiro

2016-09-01

Infrared sensor systems are a major concern for interplanetary missions that investigate the nature and the formation processes of planets and asteroids. An infrared sensor system requires signal preprocessing functions that compensate for the intensity of infrared image sensors to obtain high-quality data and a high compression ratio over the limited capacity of the transmission channels to ground stations. For those implementations, combinations of Field Programmable Gate Arrays (FPGAs) and microprocessors are employed by AKATSUKI, the Venus Climate Orbiter, and HAYABUSA2, the asteroid probe. On the other hand, much smaller size and lower power consumption are demanded for future missions to accommodate more sensors. To fulfill this future demand, we developed a novel processor architecture which consists of reconfigurable cluster cores and programmable-logic cells with complementary atom switches. The complementary atom switches enable hardware programming without configuration memories, and thus soft errors on logic circuit connections are completely eliminated. This is a noteworthy advantage for space applications which cannot be found in conventional re-writable FPGAs. Power consumption of almost one-tenth that of conventional re-writable FPGAs is expected because of the elimination of configuration memories. The proposed processor architecture can be reconfigured by behavioral synthesis from a higher-level language specification. Consequently, compensation functions are implemented in a single chip without the program memories that accompany conventional microprocessors, while maintaining comparable performance. This enables us to embed a processor element on each infrared signal detector output channel.
RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors

NASA Astrophysics Data System (ADS)

Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco

2015-09-01

This paper presents the final result of a European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in an SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This, however, comes at the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, an SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad-core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
A hybrid algorithm for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Mangiardi, Chris M.; Meyer, R.

2017-10-01

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

PubMed Central

Manolakos, Elias S.

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

PubMed

Sharma, Anuj; Manolakos, Elias S

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
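The efficiency figure quoted in this abstract can be checked with a line of arithmetic. The sketch below assumes the usual definition of parallel efficiency (speedup divided by core count); the abstract does not state which definition the authors used, and the function name is hypothetical.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency under the common definition: speedup / number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures quoted in the abstract: a speedup factor of 42 on the 48-core SCC.
eff = parallel_efficiency(42, 48)
print(f"{eff:.3f}")  # 0.875, which rounds to the quoted efficiency of 0.9
```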
Multiprocessor switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael K; Salapura, Valentina

2014-03-11

System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each pair of microprocessor or processor cores that provides one highly reliable thread connects with system components such as a memory "nest" (or memory hierarchy), an optional system controller, an optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus.
Exact diagonalization of quantum lattice models on coprocessors

NASA Astrophysics Data System (ADS)

Siro, T.; Harju, A.

2016-10-01

We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
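Since the benchmark above times a single Lanczos step, a minimal sketch of one such step may help. This is the textbook three-term recurrence in pure Python, not the authors' OpenMP or CUDA code; the function names and the small dense test matrix are assumptions for illustration (real lattice Hamiltonians are large and sparse).

```python
import math


def lanczos_step(matvec, v, v_prev, beta):
    """One Lanczos iteration for a symmetric operator.

    matvec : function applying the operator H to a vector (list of floats)
    v      : current normalized Lanczos vector v_j
    v_prev : previous Lanczos vector v_{j-1} (all zeros on the first step)
    beta   : off-diagonal coefficient beta_j (0.0 on the first step)
    Returns (alpha_j, beta_{j+1}, v_{j+1}); v_{j+1} is None on breakdown.
    """
    w = matvec(v)                                        # w = H v_j
    w = [wi - beta * pi for wi, pi in zip(w, v_prev)]    # w -= beta_j v_{j-1}
    alpha = sum(wi * vi for wi, vi in zip(w, v))         # alpha_j = <w, v_j>
    w = [wi - alpha * vi for wi, vi in zip(w, v)]        # w -= alpha_j v_j
    beta_next = math.sqrt(sum(wi * wi for wi in w))      # beta_{j+1} = ||w||
    if beta_next == 0.0:
        return alpha, 0.0, None                          # invariant subspace found
    return alpha, beta_next, [wi / beta_next for wi in w]


# Hypothetical 3x3 symmetric test matrix, stored densely for brevity.
H = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]


def dense_matvec(x):
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]


alpha, beta_next, v_next = lanczos_step(dense_matvec, [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0)
```

The matrix-vector product dominates the cost of each step, which is why it is the natural unit to time when comparing platforms.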
Case for a field-programmable gate array multicore hybrid machine for an image-processing application

NASA Astrophysics Data System (ADS)

Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

2011-01-01

General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.

Replication of Space-Shuttle Computers in FPGAs and ASICs

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.

2008-01-01

A document discusses the replication of the functionality of the onboard space-shuttle general-purpose computers (GPCs) in field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). The purpose of the replication effort is to enable utilization of proven space-shuttle flight software and software-development facilities to the extent possible during development of software for flight computers for a new generation of launch vehicles derived from the space shuttles. The replication involves specifying the instruction set of the central processing unit and the input/output processor (IOP) of the space-shuttle GPC in a hardware description language (HDL). The HDL is synthesized to form a "core" processor in an FPGA or, less preferably, in an ASIC. The core processor can be used to create a flight-control card to be inserted into a new avionics computer. The IOP of the GPC as implemented in the core processor could be designed to support data-bus protocols other than that of a multiplexer interface adapter (MIA) used in the space shuttle. Hence, a computer containing the core processor could be tailored to communicate via the space-shuttle GPC bus and/or one or more other buses.
Comparison of speech perception performance between Sprint/Esprit 3G and Freedom processors in children implanted with nucleus cochlear implants.

PubMed

Santarelli, Rosamaria; Magnavita, Vincenzo; De Filippi, Roberta; Ventura, Laura; Genovese, Elisabetta; Arslan, Edoardo

2009-04-01

To compare speech perception performance in children fitted with previous generation Nucleus sound processor, Sprint or Esprit 3G, and the Freedom, the most recently released system from the Cochlear Corporation that features a larger input dynamic range. Prospective intrasubject comparative study. University Medical Center. Seventeen prelingually deafened children who had received the Nucleus 24 cochlear implant and used the Sprint or Esprit 3G sound processor. Cochlear implantation with Cochlear device. Speech perception was evaluated at baseline (Sprint, n = 11; Esprit 3G, n = 6) and after 1 month's experience with the Freedom sound processor. Identification and recognition of disyllabic words and identification of vowels were performed via recorded voice in quiet (70 dB [A]), in the presence of background noise at various levels of signal-to-noise ratio (+10, +5, 0, -5) and at a soft presentation level (60 dB [A]). Consonant identification and recognition of disyllabic words, trisyllabic words, and sentences were evaluated in live voice. Frequency discrimination was measured in a subset of subjects (n = 5) by using an adaptive, 3-interval, 3-alternative, forced-choice procedure. Identification of disyllabic words administered at a soft presentation level showed a significant increase when switching to the Freedom compared with the previously worn processor in children using the Sprint or Esprit 3G. Identification and recognition of disyllabic words in the presence of background noise as well as consonant identification and sentence recognition increased significantly for the Freedom compared with the previously worn device only in children fitted with the Sprint. Frequency discrimination was significantly better when switching to the Freedom compared with the previously worn processor. 
Serial comparisons revealed that speech perception performance evaluated in children aged 5 to 15 years was superior with the Freedom than with previous generations of Nucleus sound processors. These differences are deemed to ensue from an increased input dynamic range, a feature that offers potentially enhanced phonemic discrimination.
Benchmarking NWP Kernels on Multi- and Many-core Processors

NASA Astrophysics Data System (ADS)

Michalakes, J.; Vachharajani, M.

2008-12-01

Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc., (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions

DTIC Science & Technology

2014-05-01

processor developed by IBM and other companies, incorporates the verb—POWER5— processor as the Power Processor Element (PPE), one of the early general...deliver an power efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA ...algorithms, including IBM’s Cell/B.E., GPUs from NVidia and AMD and many-core CPUs from Intel.27 The vast growth of digital video content has been a
The parallel algorithm for the 2D discrete wavelet transform

NASA Astrophysics Data System (ADS)

Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

2018-04-01

The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data must be exchanged. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluated on multi-core CPUs, it consistently outperforms the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
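To make the "steps" of a lifting scheme concrete, here is a minimal sketch of one level of a conventional 1D lifting transform, i.e. the baseline the abstract compares against, not the authors' rearranged scheme. The choice of the reversible integer CDF 5/3 wavelet, the symmetric boundary handling, and the function names are all assumptions; the abstract does not name a wavelet.

```python
def lifting_53_forward(x):
    """One level of the integer CDF 5/3 lifting transform on an even-length list.

    Returns (s, d): low-pass (approximation) and high-pass (detail) coefficients.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("this sketch expects an even length of at least 2")
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Step 1 (predict): every detail value needs its two even neighbours.
    d = [odd[i] - (even[i] + even[min(i + 1, half - 1)]) // 2 for i in range(half)]
    # Step 2 (update): every approximation value needs two detail values,
    # so step 1 must be complete (and its results exchanged) before step 2.
    s = [even[i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo lifting_53_forward by running the steps in reverse order."""
    half = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(half)]
    odd = [d[i] + (even[i] + even[min(i + 1, half - 1)]) // 2 for i in range(half)]
    x = [0] * (2 * half)
    x[0::2] = even
    x[1::2] = odd
    return x


signal = [3, 7, 1, 8, 2, 9, 4, 6]
s, d = lifting_53_forward(signal)
restored = lifting_53_inverse(s, d)
```

Each lifting step depends on the full output of the previous one, so in a parallel run every step boundary is a synchronization and data-exchange point between threads; reducing the number of such boundaries is the goal the abstract describes.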
Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

NASA Astrophysics Data System (ADS)

Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori

Heterogeneous multicores have been attracting much attention as a way to attain high performance while keeping power consumption low in a wide range of areas. However, heterogeneous multicores force very difficult programming on programmers. The long application program development period lowers product competitiveness. To overcome this situation, this paper proposes a compilation framework which bridges the gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on the OSCAR compiler. It realizes coarse-grain task parallel processing, data transfer using a DMA controller, and power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates the processing performance and the power reduction achieved by the proposed framework on a newly developed 15-core heterogeneous multicore chip named RP-X, which integrates 8 general purpose processor cores and 3 types of accelerator cores and was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups of up to 32x for an optical flow program with eight general purpose processor cores and four DRP (Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core, and an 80% power reduction for real-time AAC encoding.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Learn, Mark Walter

Sandia National Laboratories is currently developing new processing and data communication architectures for use in future satellite payloads. These architectures will leverage the flexibility and performance of state-of-the-art static-random-access-memory-based Field Programmable Gate Arrays (FPGAs). One such FPGA is the radiation-hardened version of the Virtex-5 being developed by Xilinx. However, not all features of this FPGA are being radiation-hardened by design and could still be susceptible to on-orbit upsets. One such feature is the embedded hard-core PPC440 processor. Since this processor is implemented in the FPGA as a hard-core, traditional mitigation approaches such as Triple Modular Redundancy (TMR) are not available to improve the processor's on-orbit reliability. The goal of this work is to investigate techniques that can help mitigate the embedded hard-core PPC440 processor within the Virtex-5 FPGA other than TMR. Implementing various mitigation schemes reliably within the PPC440 offers a powerful reconfigurable computing resource to these node-based processing architectures. This document summarizes the work done on the cache mitigation scheme for the embedded hard-core PPC440 processor within the Virtex-5 FPGAs, and describes in detail the design of the cache mitigation scheme and the testing conducted at the radiation effects facility on the Texas A&M campus.
Interactive high-resolution isosurface ray casting on multicore processors.

PubMed

Wang, Qin; JaJa, Joseph

2008-01-01

We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of the memory management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each consisting of a Quad-Core 1.86-GHz Intel Xeon Processor, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve an interactive isosurface rendering on a 1024² screen for all the datasets tested up to the maximum size of the main memory of our platform.
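The contrast between static screen partitioning and dynamic task allocation can be illustrated with a toy load model. The sketch below is an assumption-laden simplification: task costs are made-up numbers, the function names are hypothetical, and it ignores the spatial-locality constraint that the abstract says the real scheme maintains.

```python
import heapq


def static_makespan(costs, workers):
    """Finish time when the task list is cut into equal-count contiguous chunks."""
    chunk = -(-len(costs) // workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, workers):
    """Finish time when each task goes to whichever worker becomes free first."""
    free_at = [(0, w) for w in range(workers)]  # (time the worker is free, worker id)
    heapq.heapify(free_at)
    for cost in costs:
        t, w = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + cost, w))
    return max(t for t, _ in free_at)


# Hypothetical per-tile costs: expensive tiles cluster on one side of the screen.
tile_costs = [8, 8, 8, 8, 1, 1, 1, 1]
print(static_makespan(tile_costs, 2), dynamic_makespan(tile_costs, 2))  # 32 18
```

When costly tiles cluster in one screen region, a static split leaves one thread with most of the work, whereas pulling tasks on demand keeps the loads close to equal.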
Energy consumption estimation of an OMAP-based Android operating system

NASA Astrophysics Data System (ADS)

González, Gabriel; Juárez, Eduardo; Castro, Juan José; Sanz, César

2011-05-01

System-level energy optimization of battery-powered multimedia embedded systems has recently become a design goal. The poor operational time of multimedia terminals makes computationally demanding applications impractical in real scenarios. For instance, the so-called smart-phones are currently unable to remain in operation longer than several hours. The OMAP3530 processor basically consists of two processing cores, a General Purpose Processor (GPP) and a Digital Signal Processor (DSP). The former, an ARM Cortex-A8 processor, is intended to run a generic Operating System (OS) while the latter, a DSP core based on the C64x+, has an architecture optimized for video processing. The BeagleBoard, a commercial prototyping board based on the OMAP processor, has been used to test the Android Operating System and measure its performance. The board has 128 MB of SDRAM external memory, 256 MB of Flash external memory and several interfaces. Note that the clock frequency of the ARM and DSP OMAP cores is 600 MHz and 430 MHz, respectively. This paper describes the energy consumption estimation of the processes and multimedia applications of an Android v1.6 (Donut) OS on the OMAP3530-based BeagleBoard. In addition, tools for communication between the two processing cores have been employed. A test-bench to profile the OS resource usage has been developed. As far as the energy estimates are concerned, the OMAP processor energy consumption model provided by the manufacturer has been used. The model is basically divided into two energy components. The former, the baseline core energy, describes the energy consumption that is independent of any chip activity. The latter, the module active energy, describes the energy consumed by the active modules depending on resource usage.
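The two-component model described above (baseline core energy plus module active energy) can be sketched as follows. This is only an illustration of the structure: the linear dependence on utilization, the function and parameter names, and every power figure are assumptions, not values from the manufacturer's model.

```python
def estimate_energy_joules(duration_s, baseline_power_w, module_power_w, module_utilization):
    """Two-component energy estimate over a measurement window.

    duration_s         : length of the window in seconds
    baseline_power_w   : power drawn regardless of chip activity (hypothetical)
    module_power_w     : dict of module name -> power when fully active (hypothetical)
    module_utilization : dict of module name -> fraction of the window in use (0..1)
    """
    # Baseline core energy: independent of any chip activity.
    baseline = baseline_power_w * duration_s
    # Module active energy: assumed here to scale linearly with resource usage.
    active = sum(
        power * module_utilization.get(name, 0.0) * duration_s
        for name, power in module_power_w.items()
    )
    return baseline + active


# Made-up figures for a 10-second profiling window.
total = estimate_energy_joules(
    duration_s=10.0,
    baseline_power_w=0.1,
    module_power_w={"arm": 0.5, "dsp": 0.25},
    module_utilization={"arm": 0.5, "dsp": 0.2},
)
```

In a profiling test-bench of the kind the abstract mentions, the utilization fractions would come from measured OS resource usage rather than being supplied by hand.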
JPRS Report, Science & Technology, Europe.

DTIC Science & Technology

1991-04-30

processor in collaboration with Intel. The processor, christened Touchstone, will be used as the core of a parallel computer with 2,000 processors. One of...ELECTRONIQUE HEBDO in French 24 Jan 91 pp 14-15 [Article by Claire Remy: "Everything Set for Neural Signal Processors " first paragraph is ELECTRONIQUE...paving the way for neural signal processors in so doing. The principal advantage of this specific circuit over a neuromimetic software program is
Influence of rotational energy barriers to the conformational search of protein loops in molecular dynamics and ranking the conformations.

PubMed

Tappura, K

2001-08-15

An adjustable-barrier dihedral angle potential was added as an extension to a novel, previously presented soft-core potential to study its contribution to the efficacy of the search of the conformational space in molecular dynamics. As opposed to the conventional soft-core potential functions, the leading principle in the design of the new soft-core potential, as well as of its extension, the soft-core and adjustable-barrier dihedral angle potential (referred to as the SCADA potential), was to maintain the main equilibrium properties of the original force field. This qualifies the methods for a variety of a priori modeling problems without the need for the additional restraints typically required with the conventional soft-core potentials. In the present study, the different potential energy functions are applied to the problem of predicting loop conformations in proteins. Comparison of the performance of the soft-core and SCADA potentials showed that the main hurdles for the efficient sampling of the conformational space of (loops in) proteins are related to the high-energy barriers caused by the Lennard-Jones and Coulombic energy terms, and not to the rotational barriers, although the conformational search can be further enhanced by lowering the rotational barriers of the dihedral angles. Finally, different evaluation methods were studied and a few promising criteria were found to distinguish the near-native loop conformations from the wrong ones.
An FPGA computing demo core for space charge simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Jinyuan; Huang, Yifei; /Fermilab

2009-01-01

In accelerator physics, space charge simulation requires a large amount of computing power. In a particle system, each calculation requires time- and resource-consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computed using a memory look-up table addressed with the nine to ten most significant non-zero bits. At a 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. Temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.
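The look-up-table idea in this abstract can be sketched in Python. This is only an illustration of the general technique, not the authors' 16-bit fixed-point design: the 9-bit address width, the midpoint sampling of the table, the function name `inv_r_cubed`, and the use of floating point for the final scaling are all assumptions made here.

```python
INDEX_BITS = 9  # assumption: the table is addressed with 9 bits below the leading one

# Table of m**-1.5 for m in [1, 2), sampled at the midpoint of each bin.
TABLE = [(1.0 + (i + 0.5) / 2**INDEX_BITS) ** -1.5 for i in range(2**INDEX_BITS)]


def inv_r_cubed(r2: int) -> float:
    """Approximate r**-3 = r2**-1.5 for a positive integer squared distance r2."""
    if r2 <= 0:
        raise ValueError("r2 must be positive")
    exponent = r2.bit_length() - 1  # position of the most significant non-zero bit
    # Align the value so the leading one sits at bit INDEX_BITS.
    if exponent >= INDEX_BITS:
        top = r2 >> (exponent - INDEX_BITS)
    else:
        top = r2 << (INDEX_BITS - exponent)
    index = top & (2**INDEX_BITS - 1)  # drop the leading one, keep the next bits
    # r2 = m * 2**exponent, so r2**-1.5 = m**-1.5 * 2**(-1.5 * exponent).
    return TABLE[index] * 2.0 ** (-1.5 * exponent)


for r2 in (1, 2, 100, 12345):
    exact = r2 ** -1.5
    approx = inv_r_cubed(r2)
    print(r2, approx, abs(approx - exact) / exact)
```

Replacing the division and square root with one table read is what lets a hardware pipeline accept a new particle pair on every clock cycle; the price is a bounded relative error set by the table address width.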
Reconfigurable lattice mesh designs for programmable photonic processors.

PubMed

Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A

2016-05-30

We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Tumeo, Antonino; Secchi, Simone

Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% of accuracy. Emulation is only from 25 to 200 times slower than real time.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G; Salapura, Valentina

2014-12-02

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor device units, each processor device for generating signals representing occurrences of events in the processor device, and a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Design of a dataway processor for a parallel image signal processing system

NASA Astrophysics Data System (ADS)

Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu

1995-04-01

Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates 8 bits in parallel in full-duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
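The link figures quoted in this abstract imply a raw bandwidth that can be worked out directly. The short Python sketch below assumes one 8-bit word is transferred per clock cycle in each direction with no protocol overhead; that assumption is made here for illustration and is not stated in the abstract.

```python
LINK_WIDTH_BITS = 8         # each Dataway is 8 bits wide (from the abstract)
LINK_CLOCK_HZ = 50_000_000  # 50 MHz link clock (from the abstract)
NUM_LINKS = 6               # six Dataways per processor (from the abstract)

# Assumption: one word per clock cycle, per direction, no protocol overhead.
per_link_bytes_per_s = LINK_WIDTH_BITS * LINK_CLOCK_HZ // 8
aggregate_one_way = per_link_bytes_per_s * NUM_LINKS
aggregate_full_duplex = 2 * aggregate_one_way

print(per_link_bytes_per_s / 1e6, "MB/s per link, per direction")
print(aggregate_one_way / 1e6, "MB/s over six links, one direction")
print(aggregate_full_duplex / 1e6, "MB/s over six links, both directions")
```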
Performance analysis of the GR712RC dual-core LEON3FT SPARC V8 processor in an asymmetric multi-processing environment

NASA Astrophysics Data System (ADS)

Giusi, Giovanni; Liu, Scige J.; Galli, Emanuele; Di Giorgio, Anna M.; Farina, Maria; Vertolli, Nello; Di Lellis, Andrea M.

2016-07-01

In this paper we present the results of a series of performance tests carried out on a prototype board mounting the Cobham Gaisler GR712RC Dual Core LEON3FT processor. The aim was to characterize the performance of the dual-core processor when used for executing a highly demanding lossless compression task, acting on data segments continuously copied from the static memory to the processor RAM. The selection of the compression activity to evaluate the performance was driven by the possibility of a comparison with previously executed tests on the Cobham/Aeroflex Gaisler UT699 LEON3FT SPARC™ V8. The results of the test activity have shown an improvement by a factor of 1.6 with respect to the previous tests, which can easily be improved further by adopting a faster onboard clock, and provided indications on the best size of the data chunks to be used in the compression activity.
The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor

DTIC Science & Technology

2015-06-13

The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor Christopher Celio David A...Synthesizable, Parameterized RISC-V Processor Christopher Celio, David Patterson, and Krste Asanović University of California, Berkeley, California 94720...Order Machine BOOM is a synthesizable, parameterized, superscalar out-of-order RISC-V core designed to serve as the prototypical baseline processor
Options for Parallelizing a Planning and Scheduling Algorithm

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

2011-01-01

Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Fault-Tolerant, Real-Time, Multi-Core Computer System

NASA Technical Reports Server (NTRS)

Gostelow, Kim P.

2012-01-01

A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.

MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today.
The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
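The peak-performance figure quoted for the optical core follows from the matrix size and the clock rate. The Python sketch below reproduces the arithmetic under the assumption, made here and not stated in the abstract, that each multiply-and-add is counted as two operations; that reading is consistent with the 128K and 16 TeraOPS figures.

```python
MATRIX_DIM = 256          # nominal matrix size is 256x256 (from the abstract)
CLOCK_HZ = 125_000_000    # 125 MHz system clock (from the abstract)
OPS_PER_MAC = 2           # assumption: one multiply plus one add per matrix element

macs_per_cycle = MATRIX_DIM * MATRIX_DIM      # one matrix-vector product per cycle
ops_per_cycle = macs_per_cycle * OPS_PER_MAC  # 131,072 = 128K operations
peak_ops_per_s = ops_per_cycle * CLOCK_HZ     # operations per second

print(ops_per_cycle // 1024, "K operations per clock cycle")
print(peak_ops_per_s / 1e12, "TeraOPS peak")
```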
Initial Performance Results on IBM POWER6

NASA Technical Reports Server (NTRS)

Saini, Subbash; Talcott, Dale; Jespersen, Dennis; Djomehri, Jahed; Jin, Haoqiang; Mehrotra, Piysuh

2008-01-01

The POWER5+ processor has a faster memory bus than that of the previous generation POWER5 processor (533 MHz vs. 400 MHz), but the measured per-core memory bandwidth of the latter is better than that of the former (5.7 GB/s vs. 4.3 GB/s). The reason for this is that in the POWER5+, the two cores on the chip share the L2 cache, L3 cache and memory bus. The memory controller is also on the chip and is shared by the two cores. This serializes the path to memory. For consistently good performance on a wide range of applications, the performance of the processor, the memory subsystem, and the interconnects (both latency and bandwidth) should be balanced. Recognizing this, IBM has designed the POWER6 processor so as to avoid the bottlenecks due to the L2 cache, memory controller and buffer chips of the POWER5+. Unlike the POWER5+, each core in the POWER6 has its own L2 cache (4 MB - double that of the POWER5+), memory controller and buffer chips. Each core in the POWER6 runs at 4.7 GHz instead of 1.9 GHz in POWER5+. In this paper, we evaluate the performance of a dual-core POWER6 based IBM p6-570 system, and we compare its performance with that of a dual-core POWER5+ based IBM p575+ system. In this evaluation, we have used the High-Performance Computing Challenge (HPCC) benchmarks, NAS Parallel Benchmarks (NPB), and four real-world applications--three from computational fluid dynamics and one from climate modeling.
Stream Processors

NASA Astrophysics Data System (ADS)

Erez, Mattan; Dally, William J.

Stream processors, like other multi-core architectures, partition their functional units and storage into multiple processing elements. In contrast to typical architectures, which contain symmetric general-purpose cores and a cache hierarchy, stream processors have a significantly leaner design. Stream processors are specifically designed for the stream execution model, in which applications have large amounts of explicit parallel computation, structured and predictable control, and memory accesses that can be performed at a coarse granularity. Applications in the streaming model are expressed in a gather-compute-scatter form, yielding programs with explicit control over transferring data to and from on-chip memory. Relying on these characteristics, which are common to many media processing and scientific computing applications, stream architectures redefine the boundary between software and hardware responsibilities, with software bearing much of the complexity required to manage concurrency, locality, and latency tolerance. Thus, stream processors have minimal control, consisting of fetching medium- and coarse-grained instructions and executing them directly on the many ALUs. Moreover, the on-chip storage hierarchy of stream processors is under explicit software control, as is all communication, eliminating the need for complex reactive hardware mechanisms.
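The gather-compute-scatter form mentioned in this abstract can be shown with a small Python sketch. The function names and the plain list standing in for off-chip memory are illustrative assumptions; a real stream processor would perform these phases as bulk transfers to and from on-chip storage rather than as element-wise Python loops.

```python
def gather(memory, indices):
    """Copy the requested elements into a local buffer (stand-in for on-chip memory)."""
    return [memory[i] for i in indices]


def compute(kernel, stream):
    """Apply the same kernel to every element of the local stream."""
    return [kernel(x) for x in stream]


def scatter(memory, indices, results):
    """Write the results back to their original locations."""
    for i, value in zip(indices, results):
        memory[i] = value


memory = list(range(10))   # stand-in for off-chip memory
indices = [1, 3, 5, 7]     # elements this kernel invocation works on

local = gather(memory, indices)            # explicit transfer in
results = compute(lambda x: x * x, local)  # computation touches only local data
scatter(memory, indices, results)          # explicit transfer out

print(memory)
```

Because the compute phase never touches `memory` directly, all data movement is visible to the software, which is the property the abstract credits for removing the need for reactive hardware such as caches.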
An FPGA-Based General-Purpose Data Acquisition Controller

NASA Astrophysics Data System (ADS)

Robson, C. C. W.; Bousselham, A.; Bohm

2006-08-01

System development in advanced FPGAs allows considerable flexibility, both during development and in production use. A mixed firmware/software solution allows the developer to choose what shall be done in firmware or software, and to make that decision late in the process. However, this flexibility comes at the cost of increased complexity. We have designed a modular development framework to help overcome these issues of increased complexity. This framework comprises a generic controller that can be adapted for different systems by simply changing the software or firmware parts. The controller can use both soft and hard processors, with or without an RTOS, based on the demands of the system to be developed. The resulting system uses the Internet for both control and data acquisition. In our studies we developed the embedded system in a Xilinx Virtex-II Pro FPGA, where we used both PowerPC and MicroBlaze cores, HTTP, Java, and LabView for control and communication, together with the MicroC/OS-II and OSE operating systems.
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

NASA Astrophysics Data System (ADS)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

NASA Astrophysics Data System (ADS)

Leggett, C.; Binet, S.; Jackson, K.; Levinthal, D.; Tatarkhanov, M.; Yao, Y.

2011-12-01

Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance to producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with zero-overhead context switching, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the Atlas event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores, by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8- and 16-core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level, and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.
Usability of a soft-electron (low-energy electron) machine for disinfestation of grains contaminated with insect pests

NASA Astrophysics Data System (ADS)

Imamura, Taro; Miyanoshita, Akihiro; Todoriki, Setsuko; Hayashi, Toru

2004-09-01

Efficacy of soft-electron treatment for disinfestation of grains was investigated by treating pre-infested brown rice and adzuki bean with a commercial-scale soft-electron machine (soft-electron processor). Soft-electrons at 150 kV efficiently disinfested brown rice grains pre-infested with maize weevil (Sitophilus zeamais Motschulsky) and Indian meal moth (Plodia interpunctella (Hübner)) and adzuki beans with adzuki bean weevil (Callosobruchus chinensis (Linne)), although small numbers of the internal feeders such as C. chinensis in adzuki bean and S. zeamais in brown rice survived. The results indicate that the commercial-scale soft-electron machine can disinfest grains and beans, especially those contaminated with external feeders.
Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sitaraman, Hariswaran; Grout, Ray W

This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via a large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~3 speed-up using Intel 2013 compiler and ~1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.
Performance of the Cell processor for biomolecular simulations

NASA Astrophysics Data System (ADS)

De Fabritiis, G.

2007-06-01

The new Cell processor represents a turning point for computing intensive applications. Here, I show that for molecular dynamics it is possible to reach an impressive sustained performance in excess of 30 Gflops with a peak of 45 Gflops for the non-bonded force calculations, over one order of magnitude faster than a single core standard processor.
Motion and Emotional Behavior Design for Pet Robot Dog

NASA Astrophysics Data System (ADS)

Cheng, Chi-Tai; Yang, Yu-Ting; Miao, Shih-Heng; Wong, Ching-Chang

A pet robot dog with two ears, one mouth, one facial expression plane, and one vision system is designed and implemented so that it can exhibit emotional behaviors. Three processors (an Intel® Pentium® M 1.0 GHz, an 8-bit 8051 processor, and an embedded NIOS soft-core processor) are used to control the robot. One camera, one power detector, four touch sensors, and one temperature detector are used to obtain information about the environment. The designed robot, with 20 DOF (degrees of freedom), is able to walk. A behavior system is built on the implemented pet robot so that it is able to choose a suitable behavior for different environmental situations. Practical tests show that the implemented pet robot dog can interact emotionally with humans.
Present Status and Extensions of the Monte Carlo Performance Benchmark

NASA Astrophysics Data System (ADS)

Hoogenboom, J. Eduard; Petrovic, Bojan; Martin, William R.

2014-06-01

The NEA Monte Carlo Performance benchmark started in 2011 aiming to monitor over the years the abilities to perform a full-size Monte Carlo reactor core calculation with a detailed power production for each fuel pin with axial distribution. This paper gives an overview of the contributed results thus far. It shows that reaching a statistical accuracy of 1 % for most of the small fuel zones requires about 100 billion neutron histories. The efficiency of parallel execution of Monte Carlo codes on a large number of processor cores shows clear limitations for computer clusters with common type computer nodes. However, using true supercomputers the speedup of parallel calculations is increasing up to large numbers of processor cores. More experience is needed from calculations on true supercomputers using large numbers of processors in order to predict if the requested calculations can be done in a short time. As the specifications of the reactor geometry for this benchmark test are well suited for further investigations of full-core Monte Carlo calculations and a need is felt for testing other issues than its computational performance, proposals are presented for extending the benchmark to a suite of benchmark problems for evaluating fission source convergence for a system with a high dominance ratio, for coupling with thermal-hydraulics calculations to evaluate the use of different temperatures and coolant densities and to study the correctness and effectiveness of burnup calculations. Moreover, other contemporary proposals for a full-core calculation with realistic geometry and material composition will be discussed.
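The history count quoted in this abstract can be put in context with the usual Monte Carlo rule of thumb that the relative statistical error falls as one over the square root of the number of histories. The Python sketch below applies that scaling to the abstract's reference point (about 100 billion histories for 1 %); the function name is hypothetical, and the pure 1/sqrt(N) behavior is an assumption that ignores effects such as source convergence and correlation between cycles.

```python
def histories_for_target(ref_histories, ref_rel_error, target_rel_error):
    """Scale a history count assuming relative error is proportional to 1/sqrt(N)."""
    return ref_histories * (ref_rel_error / target_rel_error) ** 2


REF_HISTORIES = 100e9  # about 100 billion histories (from the abstract)
REF_REL_ERROR = 0.01   # for a statistical accuracy of 1 % (from the abstract)

for target in (0.02, 0.01, 0.005):
    needed = histories_for_target(REF_HISTORIES, REF_REL_ERROR, target)
    print(f"target {target:.1%}: about {needed:.2e} histories")
```

Under this assumption, halving the target error quadruples the required histories, which is why parallel efficiency on large core counts matters so much for this benchmark.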
Early Student Support for Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

DTIC Science & Technology

2016-05-07

REPORT DOCUMENTATION PAGE ... OMB No. 0704-0188 The public reporting burden for this collection of...Student Support for Application of Advanced Multi-Core Processor N00014-12-1-0298 Technologies to Oceanographic Research 5b. GRANT NUMBER 5c...communications protocols (i.e. UART, I2C, and SPI), through the handing off of the data to the server APIs. By providing a common set of tools
Infrared small target tracking based on SOPC

NASA Astrophysics Data System (ADS)

Hu, Taotao; Fan, Xiang; Zhang, Yu-Jin; Cheng, Zheng-dong; Zhu, Bin

2011-01-01

The paper presents a low-cost FPGA-based solution for a real-time infrared small target tracking system. A specialized architecture is presented based on a soft RISC processor capable of running a kernel-based mean shift tracking algorithm. The mean shift tracking algorithm is implemented on a NIOS II soft core with SOPC (System on a Programmable Chip) technology. Although the mean shift algorithm is widely used for target tracking, the original algorithm cannot be directly used for infrared small target tracking, because an infrared small target carries only intensity information. An improved mean shift algorithm is therefore presented in this paper. How the target is described determines whether it can be tracked by the mean shift algorithm. Because color targets are tracked well by the mean shift algorithm, spatial and temporal components are introduced to describe the target in imitation of color image representation, forming a pseudo-color image. To improve the processing speed, parallel and pipeline techniques are used. Two RAMs store images alternately using a ping-pong scheme, and a FLASH memory stores large amounts of temporary data. The experimental results show that the infrared small target is tracked stably against a complicated background.
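The core iteration of mean shift tracking can be illustrated with a deliberately simplified Python sketch. This is not the improved pseudo-color algorithm of the paper: it is a plain intensity-weighted centroid search with a flat square window, and the function name, window radius, and test image are assumptions made here for illustration.

```python
def mean_shift(image, start, radius=2, max_iter=20):
    """Move a window center to the intensity-weighted centroid until it stops moving."""
    rows, cols = len(image), len(image[0])
    r, c = start
    for _ in range(max_iter):
        total = sum_r = sum_c = 0.0
        # Accumulate intensity-weighted coordinates inside the window.
        for i in range(max(0, r - radius), min(rows, r + radius + 1)):
            for j in range(max(0, c - radius), min(cols, c + radius + 1)):
                weight = image[i][j]
                total += weight
                sum_r += weight * i
                sum_c += weight * j
        if total == 0:
            break  # nothing bright in the window, so there is no direction to move
        new_r, new_c = round(sum_r / total), round(sum_c / total)
        if (new_r, new_c) == (r, c):
            break  # converged
        r, c = new_r, new_c
    return r, c


# Synthetic 9x9 frame with a small bright target centered at row 6, column 5.
frame = [[0] * 9 for _ in range(9)]
frame[6][5] = 10
for i, j in ((5, 5), (7, 5), (6, 4), (6, 6)):
    frame[i][j] = 5

print(mean_shift(frame, start=(4, 4)))
```

In a tracker, the converged position from one frame would seed the search in the next frame, so only a small window has to be scanned per frame.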
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

PubMed

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors

PubMed Central

Cheung, Kit; Schultz, Simon R.; Luk, Wayne

2016-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation. PMID:26834542
MILC Code Performance on High End CPU and GPU Supercomputer Clusters

NASA Astrophysics Data System (ADS)

DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

2018-03-01

With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
Energy Efficient Real-Time Scheduling Using DPM on Mobile Sensors with a Uniform Multi-Cores

PubMed Central

Kim, Youngmin; Lee, Chan-Gun

2017-01-01

In wireless sensor networks (WSNs), sensor nodes are deployed to collect and analyze data. These nodes use limited-energy batteries for easy deployment and low cost, so battery capacity is closely tied to the lifetime of the sensor nodes. Efficient energy management is therefore important for extending node lifetime. Most efforts to improve power efficiency in tiny sensor nodes have focused on reducing the power consumed during data transmission. However, the recent emergence of sensor nodes equipped with multi-cores demands attention to the problem of reducing power consumption in the multi-cores themselves. In this paper, we propose an energy-efficient scheduling method for sensor nodes supporting uniform multi-cores. We extend T-Ler plane based scheduling, proposed for globally optimal scheduling of uniform multi-cores and multi-processors, to enable power management using dynamic power management (DPM). In the proposed approach, a processor selection scheme for scheduling and a mapping method between tasks and processors are proposed to utilize dynamic power management efficiently. Experiments show the effectiveness of the proposed approach compared to other existing methods. PMID:29240695
Processor-in-memory-and-storage architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeBenedictis, Erik

A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code word is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.
Emerging Radio and Manet Technology Study: Research Support for a Survey of State-of-the-art Commercial and Military Hardware/Software for Mobile Ad Hoc Networks

DTIC Science & Technology

2014-10-01

44 Table 19: Raspberry Pi Information...boards – These are single board devices targeted to education and embedding, the best known being the Raspberry Pi ; and 3. Development boards – These...popular, as it has high performance processor (perhaps 4 times the power of a Raspberry Pi ) with dual core processors running at 1.6 GHz and the cost is
Optimization of image processing algorithms on mobile platforms

NASA Astrophysics Data System (ADS)

Poudel, Pramod; Shirvaikar, Mukul

2011-03-01

This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, netbooks and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND flash and 256 MB of SDRAM. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application in template matching tasks such as face recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core, which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images, and performance results are presented measuring the speedup obtained from the dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks while the DSP handles the performance-hungry algorithms.

Method to implement the CCD timing generator based on FPGA

NASA Astrophysics Data System (ADS)

Li, Binhua; Song, Qian; He, Chun; Jin, Jianhui; He, Lin

2010-07-01

With the advance of FPGA technology, the design methodology of digital systems is changing. In recent years we have developed a method to implement the CCD timing generator based on FPGA and VHDL. This paper presents the principles and implementation techniques of the method. Taking a developed camera as an example, we introduce the structure and the input and output clocks/signals of a timing generator implemented in the camera. The generator is composed of a top module and a bottom module. The bottom one is made up of 4 sub-modules which correspond to 4 different operation modes. The modules are implemented by 5 VHDL programs. Block diagrams of the architecture of these programs are shown in the paper. We also describe the implementation steps of the timing generator in Quartus II, and the interconnections between the generator and a Nios soft-core processor, which is the controller of this generator. Some test results are presented at the end.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

NASA Astrophysics Data System (ADS)

Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

2007-04-01

In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
Sentinel-2 Level 2A Prototype Processor: Architecture, Algorithms And First Results

NASA Astrophysics Data System (ADS)

Muller-Wilm, Uwe; Louis, Jerome; Richter, Rudolf; Gascon, Ferran; Niezette, Marc

2013-12-01

Sen2Core is a prototype processor for Sentinel-2 Level 2A product processing and formatting. The processor is developed for and with ESA and performs the tasks of Atmospheric Correction and Scene Classification of Level 1C input data. Level 2A outputs are: Bottom-Of-Atmosphere (BOA) corrected reflectance images, Aerosol Optical Thickness, Water Vapour and Scene Classification maps, and Quality indicators, including cloud and snow probabilities. The Level 2A Product Formatting performed by the processor follows the specification of the Level 1C User Product.
Performance of VPIC on Sequoia

NASA Astrophysics Data System (ADS)

Nystrom, William

2014-10-01

Sequoia is a major DOE computing resource which is characteristic of future resources in that it has many threads per compute node, 64, and the individual processor cores are simpler and less powerful than cores on previous processors like Intel's Sandy Bridge or AMD's Opteron. An effort is in progress to port VPIC to the Blue Gene Q architecture of Sequoia and evaluate its performance. Results of this work will be presented on single node performance of VPIC as well as multi-node scaling.
Active non-volatile memory post-processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kannan, Sudarsun; Milojicic, Dejan S.; Talwar, Vanish

A computing node includes an active Non-Volatile Random Access Memory (NVRAM) component which includes memory and a sub-processor component. The memory is to store data chunks received from a processor core, the data chunks comprising metadata indicating a type of post-processing to be performed on data within the data chunks. The sub-processor component is to perform post-processing of said data chunks based on said metadata.
Underwater Threat Source Localization: Processing Sensor Network TDOAs with a Terascale Optical Core Device

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barhen, Jacob; Imam, Neena

2007-01-01

Revolutionary computing technologies are defined in terms of technological breakthroughs, which leapfrog over near-term projected advances in conventional hardware and software to produce paradigm shifts in computational science. For underwater threat source localization using information provided by a dynamical sensor network, one of the most promising computational advances builds upon the emergence of digital optical-core devices. In this article, we present initial results of sensor network calculations that focus on the concept of signal wavefront time-difference-of-arrival (TDOA). The corresponding algorithms are implemented on the EnLight processing platform recently introduced by Lenslet Laboratories. This tera-scale digital optical core processor is optimized for array operations, which it performs in a fixed-point-arithmetic architecture. Our results (i) illustrate the ability to reach the required accuracy in the TDOA computation, and (ii) demonstrate that a considerable speed-up can be achieved when using the EnLight 64a prototype processor as compared to a dual Intel Xeon™ processor.
Fault Mitigation Schemes for Future Spaceflight Multicore Processors

NASA Technical Reports Server (NTRS)

Alexander, James W.; Clement, Bradley J.; Gostelow, Kim P.; Lai, John Y.

2012-01-01

Future planetary exploration missions demand significant advances in on-board computing capabilities over current avionics architectures based on a single-core processing element. The state-of-the-art multi-core processor provides much promise in meeting such challenges while introducing new fault tolerance problems when applied to space missions. Software-based schemes are being presented in this paper that can achieve system-level fault mitigation beyond that provided by radiation-hard-by-design (RHBD). For mission and time critical applications such as the Terrain Relative Navigation (TRN) for planetary or small body navigation, and landing, a range of fault tolerance methods can be adapted by the application. The software methods being investigated include Error Correction Code (ECC) for data packet routing between cores, virtual network routing, Triple Modular Redundancy (TMR), and Algorithm-Based Fault Tolerance (ABFT). A robust fault tolerance framework that provides fail-operational behavior under hard real-time constraints and graceful degradation will be demonstrated using TRN executing on a commercial Tilera(R) processor with simulated fault injections.
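As an illustration of the Triple Modular Redundancy (TMR) idea named in the abstract above, the following is a minimal sketch of a bitwise majority voter. The abstract does not describe the voter used in the paper; the function names `tmr_vote_bits` and `flip_bit` and the sample data word are hypothetical and chosen only for this example.

```python
def tmr_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three redundant copies."""
    return (a & b) | (b & c) | (a & c)


def flip_bit(word: int, bit: int) -> int:
    """Simulate a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


# Hypothetical 8-bit result computed three times; one copy is corrupted.
golden = 0b10110010
copies = [golden, flip_bit(golden, 3), golden]

# The voter masks the single faulty copy and recovers the original word.
voted = tmr_vote_bits(*copies)
```

A voter of this kind masks any fault confined to one copy, but it cannot correct the same bit being upset in two copies, which is one reason schemes such as ECC and ABFT are usually considered alongside TMR.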
Development of small scale cluster computer for numerical analysis

NASA Astrophysics Data System (ADS)

Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

2017-09-01

In this study, two personal computers were successfully networked together to form a small-scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processors in total. The cluster runs the Ubuntu 14.04 Linux environment with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test, which used a simple MPI Hello program written in C, was done to make sure that the computers can pass the required information without any problem. In addition, a performance test was done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run four times, using a single node, 2 processors, 4 processors, and 8 processors. The results show that the time required to solve the problem decreases as processors are added: the calculation time is halved when the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer from common hardware that is capable of higher computing power than a single-CPU computer, which can be beneficial for research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
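The scaling claim in the abstract above (run time halves when the processor count doubles) corresponds to linear speedup and a parallel efficiency of 1. The sketch below shows how those two figures are computed from measured run times. The timings are invented for illustration and are not results from the paper; `speedup_and_efficiency` is a hypothetical helper name.

```python
def speedup_and_efficiency(t_serial: float, t_parallel: float, n_procs: int):
    """Return (speedup, efficiency) for a run on n_procs processors.

    speedup    = serial time / parallel time
    efficiency = speedup / number of processors (1.0 means ideal scaling)
    """
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
# They are chosen so that doubling the processors halves the run time.
timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}

results = {
    procs: speedup_and_efficiency(timings[1], seconds, procs)
    for procs, seconds in timings.items()
}
```

With real measurements the efficiency usually falls below 1 as processors are added, because communication between the two machines takes a growing share of the run time.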
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G.; Salapura, Valentina

2012-07-24

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor units, each processor unit generating signals representing occurrences of events in that processor unit, and a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters, each for counting signals representing occurrences of events from one or more of the plurality of processor units in the multiprocessor system; and a plurality of input devices for receiving the event signals from one or more of the plurality of processor units, the plurality of input devices being programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Application of Prognostic Health Management in Digital Electronic Systems

DTIC Science & Technology

2007-01-01

variable external supply applied the necessary core power to the processor while the motherboard continued to source power from the ATX supply. By...isolating the processor power from the motherboard power , control over the aging profile of the processor was achieved. Once nominal operating...Physics-of-failure RISC – Reduced Instruction Set Computer RUL – Remaining Useful Life 1 1-4244-0525-4/07/$20.00 ©2007 IEEE. Paper 1326
High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

NASA Astrophysics Data System (ADS)

Goto, Hiroyuki

This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
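For readers unfamiliar with the operation, the Kleene star in max-plus algebra can be sketched in a few lines. In this algebra "addition" is max, "multiplication" is ordinary +, the zero element is minus infinity and the unit element is 0. For a directed acyclic graph with n nodes the weighted adjacency matrix A is nilpotent, so A* = e ⊕ A ⊕ A² ⊕ ... ⊕ A^(n-1). The code below is a plain serial reference, not the parallel or SIMD implementation from the paper; the convention that `a[i][j]` holds the weight of the arc from node j to node i, and the three-node example graph, are assumptions made here.

```python
NEG_INF = float("-inf")  # the max-plus "zero" element (no path)


def maxplus_identity(n):
    """Unit matrix: 0 on the diagonal, -inf elsewhere."""
    return [[0.0 if i == j else NEG_INF for j in range(n)] for i in range(n)]


def maxplus_add(a, b):
    """Elementwise max-plus addition (ordinary max)."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def maxplus_matmul(a, b):
    """Max-plus matrix product: (a (x) b)[i][j] = max_k (a[i][k] + b[k][j])."""
    inner, cols = len(b), len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def kleene_star(a):
    """A* = e (+) A (+) A^2 (+) ... (+) A^(n-1); valid when the graph is acyclic."""
    n = len(a)
    result = maxplus_identity(n)
    power = maxplus_identity(n)
    for _ in range(n - 1):
        power = maxplus_matmul(power, a)
        result = maxplus_add(result, power)
    return result


# Hypothetical DAG: 0 -> 1 (weight 2), 1 -> 2 (weight 3), 0 -> 2 (weight 4).
# a[i][j] is the weight of the arc from node j to node i.
adjacency = [
    [NEG_INF, NEG_INF, NEG_INF],
    [2.0, NEG_INF, NEG_INF],
    [4.0, 3.0, NEG_INF],
]
star = kleene_star(adjacency)
```

In this example `star[2][0]` is 5.0: the two-arc path 0 -> 1 -> 2 (2 + 3) is longer than the direct arc of weight 4, so the star matrix records the longer path. The inner `max(...)` over `k` is the part that lends itself to the multi-core and SIMD treatment described in the abstract, since every output element can be computed independently.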
Performance Evaluation of NWChem Ab-Initio Molecular Dynamics (AIMD) Simulations on the Intel® Xeon Phi™ Processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J.; Jacquelin, Mathias; De Jong, Wibe A.

2017-10-20

Ab-initio Molecular Dynamics (AIMD) methods are an important class of algorithms, as they enable scientists to understand the chemistry and dynamics of molecular and condensed phase systems while retaining a first-principles-based description of their interactions. Many-core architectures such as the Intel® Xeon Phi™ processor are an interesting and promising target for these algorithms, as they can provide the computational power that is needed to solve interesting problems in chemistry. In this paper, we describe the efforts of refactoring the existing AIMD plane-wave method of NWChem from an MPI-only implementation to a scalable, hybrid code that employs MPI and OpenMP to exploit the capabilities of current and future many-core architectures. We describe the optimizations required to get close to optimal performance for the multiplication of the tall-and-skinny matrices that form the core of the computational algorithm. We present strong scaling results on the complete AIMD simulation for a test case that simulates 256 water molecules and that strong-scales well on a cluster of 1024 nodes of Intel Xeon Phi processors. We compare the performance obtained with a cluster of dual-socket Intel® Xeon® E5-2698v3 processors.
Multi-level Hierarchical Poly Tree computer architectures

NASA Technical Reports Server (NTRS)

Padovan, Joe; Gute, Doug

1990-01-01

Based on the concept of hierarchical substructuring, this paper develops an optimal multi-level Hierarchical Poly Tree (HPT) parallel computer architecture scheme which is applicable to the solution of finite element and difference simulations. Emphasis is given to minimizing computational effort, in-core/out-of-core memory requirements, and the data transfer between processors. In addition, a simplified communications network that reduces the number of I/O channels between processors is presented. HPT configurations that yield optimal superlinearities are also demonstrated. Moreover, to generalize the scope of applicability, special attention is given to developing: (1) multi-level reduction trees which provide an orderly/optimal procedure by which model densification/simplification can be achieved, as well as (2) methodologies enabling processor grading that yields architectures with varying types of multi-level granularity.
Adaptive packet switch with an optical core (demonstrator)

NASA Astrophysics Data System (ADS)

Abdo, Ahmad; Bishtein, Vadim; Clark, Stewart A.; Dicorato, Pino; Lu, David T.; Paredes, Sofia A.; Taebi, Sareh; Hall, Trevor J.

2004-11-01

A three-stage opto-electronic packet switch architecture is described consisting of a reconfigurable optical centre stage surrounded by two electronic buffering stages partitioned into sectors to ease memory contention. A Flexible Bandwidth Provision (FBP) algorithm, implemented on a soft-core processor, is used to change the configuration of the input sectors and optical centre stage to set up internal paths that will provide variable bandwidth to serve the traffic. The switch is modeled by a bipartite graph built from a service matrix, which is a function of the arriving traffic. The bipartite graph is decomposed by solving an edge-colouring problem and the resulting permutations are used to configure the switch. Simulation results show that this architecture exhibits a dramatic reduction of complexity and increased potential for scalability, at the price of only a modest spatial speed-up k, 1
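The decomposition step mentioned above, edge-colouring a bipartite graph built from the service matrix so that each colour class becomes one switch configuration, can be illustrated with a small sketch. The abstract does not give the authors' algorithm. The version below makes two loud assumptions: the service matrix has non-negative integer entries, and all of its row and column sums are equal, so that the bipartite multigraph is regular and every colour class is a full permutation. It colours the graph by repeatedly extracting a perfect matching with a standard augmenting-path search. All function names are hypothetical.

```python
def find_perfect_matching(m):
    """Return perm with m[i][perm[i]] > 0 for every input i, or None.

    Uses augmenting paths over the bipartite graph whose edges are the
    positive entries of m.
    """
    n = len(m)
    match_col = [-1] * n  # match_col[j] = input currently assigned to output j

    def try_assign(i, seen):
        for j in range(n):
            if m[i][j] > 0 and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_assign(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_assign(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def decompose_into_permutations(service):
    """Split a matrix with equal row/column sums into permutations.

    Each returned list maps input port i to output port perm[i] and
    corresponds to one colour of the edge colouring.
    """
    m = [row[:] for row in service]  # work on a copy
    perms = []
    while any(any(row) for row in m):
        perm = find_perfect_matching(m)
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        for i, j in enumerate(perm):
            m[i][j] -= 1  # remove the edges used by this colour
        perms.append(perm)
    return perms


# Hypothetical 3x3 service matrix: every row and column sums to 3.
service = [
    [2, 1, 0],
    [0, 2, 1],
    [1, 0, 2],
]
perms = decompose_into_permutations(service)
```

Summing the permutation matrices in `perms` reproduces `service`, so cycling the switch through these configurations serves each input-output pair as often as the matrix demands. A production scheduler would also have to handle matrices that are not regular, which this sketch rejects.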
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

NASA Astrophysics Data System (ADS)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
Electronic Structure Calculations and Adaptation Scheme in Multi-core Computing Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seshagiri, Lakshminarasimhan; Sosonkina, Masha; Zhang, Zhao

2009-05-20

Multi-core processing environments have become the norm in general-purpose computing and are being considered for adding an extra dimension to the execution of any application. The T2 Niagara processor is a distinctive environment: it consists of eight cores, each capable of running eight threads simultaneously. Applications like the General Atomic and Molecular Electronic Structure System (GAMESS), used for ab-initio molecular quantum chemistry calculations, can be good indicators of the performance of such machines and can serve as a guideline for both hardware designers and application programmers. In this paper we benchmark the GAMESS performance on a T2 Niagara processor for a couple of molecules. We also show the suitability of using a middleware-based adaptation algorithm with GAMESS in such a multi-core environment.
Adaptive linear predictor FIR filter based on the Cyclone V FPGA with HPS to reduce narrow band RFI in AERA radio detection of cosmic rays

DOE Office of Scientific and Technical Information (OSTI.GOV)

Szadkowski, Zbigniew

We present a new approach to filtering radio frequency interference (RFI) in the Auger Engineering Radio Array (AERA), which studies the electromagnetic part of extensive air showers. The radio stations can observe radio signals caused by coherent emissions due to geomagnetic radiation and charge-excess processes. AERA observes the frequency band from 30 to 80 MHz. This range is highly contaminated by human-made RFI. To improve the signal-to-noise ratio, RFI filters are used in AERA to suppress this contamination. The first kind of filter used by AERA was the median filter, based on the Fast Fourier Transform (FFT) technique. The second, which is currently in use, is the infinite impulse response (IIR) notch filter. The proposed new filter is a finite impulse response (FIR) filter based on linear prediction (LP). A periodic contamination hidden in a registered signal (digitized in the ADC) can be extracted and then subtracted to make the signal cleaner. The FIR filter requires the calculation of n = 32, 64 or even 128 coefficients (depending on the required speed or accuracy) by solving n linear equations with coefficients built from the covariance Toeplitz matrix. This system can be solved by the Levinson recursion, which is much faster than Gaussian elimination. The filter has already been tested in real AERA radio stations on the Argentinean pampas with very successful results. The linear equations were solved either in the virtual soft-core NIOS® processor (implemented in the FPGA chip as a net of logic elements) or in the external Voipac PXA270M ARM processor. The NIOS processor is relatively slow (50 MHz internal clock), while calculations performed in an external processor consume a significant amount of time for data exchange between the FPGA and the processor. Tests showed very good efficiency of the RFI suppression for stationary (long-term) contaminations.
However, we observed short-time contaminations that could not be suppressed either by the IIR notch filter or by the FIR filter based on linear prediction. For the LP FIR filter, the refresh time of the filter coefficients was too long and the filter did not keep up with changes in the contamination structure, mainly because of the long calculation time on slow processors. We propose to use the Cyclone V SE chip with an embedded micro-controller operating with a 925 MHz internal clock to significantly reduce the refresh time of the FIR coefficients. The lab results are promising. (authors)
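The abstract above relies on the Levinson recursion to obtain the linear-prediction coefficients from a Toeplitz system. The sketch below shows the textbook Levinson-Durbin form for a symmetric Toeplitz matrix built from autocorrelation values r[0..p]. It is not the authors' NIOS or ARM implementation; the function names and the sample autocorrelation sequences are assumptions made for illustration. It solves sum_k a[k] * r[|i-k|] = r[i] for i = 1..p in O(p²) operations, compared with O(p³) for Gaussian elimination.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for LP coefficients.

    r      -- autocorrelation values r[0], ..., r[order]
    return -- (coeffs, err): coeffs[k-1] multiplies x[n-k] in the
              prediction of x[n]; err is the final prediction error power.
    """
    a = [0.0] * (order + 1)  # a[1..order] hold the predictor coefficients
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for extending the order from m-1 to m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m
    return a[1:], err


def toeplitz_residual(r, coeffs):
    """Largest absolute error when coeffs are put back into the equations."""
    p = len(coeffs)
    return max(
        abs(sum(coeffs[k - 1] * r[abs(i - k)] for k in range(1, p + 1)) - r[i])
        for i in range(1, p + 1)
    )


# Hypothetical autocorrelation of a first-order process, r[k] = 0.5**k:
# the best predictor uses only the previous sample with weight 0.5.
r_ar1 = [1.0, 0.5, 0.25]
coeffs_ar1, err_ar1 = levinson_durbin(r_ar1, 2)

# A second made-up sequence, used only to check the solver numerically.
r_demo = [4.0, 2.0, 1.5, 0.3]
coeffs_demo, _ = levinson_durbin(r_demo, 3)
```

The quadratic cost is what makes frequent coefficient refreshes plausible for n = 32 to 128, although, as the abstract notes, the achievable refresh rate still depends heavily on the speed of the processor running the recursion.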
FPGA Acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods.

PubMed

Zierke, Stephanie; Bakos, Jason D

2010-04-12

Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10x speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).
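To make the structure of the PLF kernel concrete, the sketch below computes the conditional likelihood vectors of a parent node from those of its two children, one alignment site at a time. Each site is processed independently and without branching, which is the property the abstract identifies as the source of parallelism. The Jukes-Cantor substitution model, the branch length and the function names are assumptions chosen for brevity; the abstract does not state which model or data layout the co-processor uses.

```python
import math

STATES = "ACGT"


def jukes_cantor(t):
    """4x4 transition probability matrix of the Jukes-Cantor model
    for a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def tip_vector(base):
    """Conditional likelihoods at a leaf: 1 for the observed base, else 0."""
    return [1.0 if s == base else 0.0 for s in STATES]


def plf(left, right, p_left, p_right):
    """Parent conditional likelihoods for every site.

    left, right -- per-site likelihood vectors of the two children
    p_left/right -- transition matrices of the branches to each child
    """
    parent = []
    for l_site, r_site in zip(left, right):  # sites are independent
        parent.append([
            sum(p_left[i][j] * l_site[j] for j in range(4))
            * sum(p_right[i][j] * r_site[j] for j in range(4))
            for i in range(4)
        ])
    return parent


# Hypothetical two-site alignment for two leaves joined at one parent.
p = jukes_cantor(0.1)
left = [tip_vector("A"), tip_vector("C")]
right = [tip_vector("A"), tip_vector("G")]
parent = plf(left, right, p, p)
```

Because the body of the site loop is a fixed sequence of multiplications and additions, it maps naturally onto a deep hardware pipeline with one site entering per clock cycle, which is the kind of design the abstract describes.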
Analyzing Reliability and Performance Trade-Offs of HLS-Based Designs in SRAM-Based FPGAs Under Soft Errors

NASA Astrophysics Data System (ADS)

Tambara, Lucas Antunes; Tonfat, Jorge; Santos, André; Kastensmidt, Fernanda Lima; Medina, Nilberto H.; Added, Nemitala; Aguiar, Vitor A. P.; Aguirre, Fernando; Silveira, Marcilei A. G.

2017-02-01

The increasing system complexity of FPGA-based hardware designs and shortening of time-to-market have motivated the adoption of new designing methodologies focused on addressing the current need for high-performance circuits. High-Level Synthesis (HLS) tools can generate Register Transfer Level (RTL) designs from high-level software programming languages. These tools have evolved significantly in recent years, providing optimized RTL designs, which can serve the needs of safety-critical applications that require both high performance and high reliability levels. However, a reliability evaluation of HLS-based designs under soft errors has not yet been presented. In this work, the trade-offs of different HLS-based designs in terms of reliability, resource utilization, and performance are investigated by analyzing their behavior under soft errors and comparing them to a standard processor-based implementation in an SRAM-based FPGA. Results obtained from fault injection campaigns and radiation experiments show that it is possible to increase the performance of a processor-based system up to 5,000 times by changing its architecture with a small impact in the cross section (increasing up to 8 times), and still increasing the Mean Workload Between Failures (MWBF) of the system.
Moats and Drawbridges: An Isolation Primitive for Reconfigurable Hardware Based Systems

DTIC Science & Technology

2007-05-01

these systems, and after being run through an optimizing CAD tool the resulting circuit is a single entangled mess of gates and wires. To prevent the...translates MATLAB [48] algorithms into HDL, logic synthesis translates this HDL into a netlist, a synthesis tool uses a place-and-route algorithm to... [Figure labels: design flow from MATLAB algorithms and C code (gcc, executable) through HDL, logic synthesis, netlist, and place and route to bitstream; EDK; hard and soft µP cores]

Fast 2D FWI on a multi and many-cores workstation.

NASA Astrophysics Data System (ADS)

Thierry, Philippe; Donno, Daniela; Noble, Mark

2014-05-01

Following the introduction of x86 co-processors (Xeon Phi) and the performance increase of standard 2-socket workstations using the latest 12-core E5-v2 x86-64 CPUs, we present an MPI + OpenMP implementation of an acoustic 2D FWI (full waveform inversion) code that runs simultaneously on the CPUs and on the co-processors installed in a workstation. The main advantage of running a 2D FWI on a workstation is the ability to quickly evaluate new features such as more complicated wave equations, new cost functions, finite-difference stencils or boundary conditions. Since the co-processor is made of 61 in-order x86 cores, each supporting up to 4 threads, this many-core device can be seen as a shared-memory SMP (symmetric multiprocessing) machine with its own IP address. Depending on the vendor, a single workstation can host several co-processors, turning the workstation into a personal cluster under the desk. The original Fortran 90 CPU version of the 2D FWI code is simply recompiled to get a Xeon Phi x86 binary. This multi- and many-core configuration uses standard compilers and the associated MPI and math libraries under Linux; therefore, the cost of code development remains constant while computation time improves. We chose to implement the code in the so-called symmetric mode to fully use the capacity of the workstation, but we also evaluate the scalability of the code in native mode (i.e., running only on the co-processor) thanks to the Linux ssh and NFS capabilities. The usual care in optimization and SIMD vectorization is taken to ensure optimal performance and to analyze the application's performance and bottlenecks on both platforms. The 2D FWI implementation uses finite-difference time-domain forward modeling and a quasi-Newton (L-BFGS) optimization scheme for the model parameter update. Parallelization is achieved through standard MPI distribution of shot gathers and OpenMP domain decomposition within the co-processor.
Taking advantage of the 16 GB of memory available on the co-processor, we are able to keep wavefields in memory and compute the gradient by cross-correlation of the forward and back-propagated wavefields, as needed by our time-domain FWI scheme, without heavy traffic on the I/O subsystem and PCIe bus. In this presentation we will also review some simple methodologies for comparing expected performance with measured performance, in order to estimate the optimization effort before starting any major modification or rewrite of research codes. The key message is the ease of use and development of this hybrid configuration, which aims not at the absolute peak performance but at the optimal one that ensures the best balance between geophysical and computer developments.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sancho Pitarch, Jose Carlos; Kerbyson, Darren; Lang, Mike

Increasing the core count of current and future processors poses critical challenges for the memory subsystem, which must efficiently handle concurrent memory requests. The current trend to cope with this challenge is to increase the number of memory channels available to the processor's memory controller. In this paper we investigate the effectiveness of this approach on the performance of parallel scientific applications. Specifically, we explore the trade-off between employing multiple memory channels per memory controller and the use of multiple memory controllers. Experiments conducted on two current state-of-the-art multicore processors, a 6-core AMD Istanbul and a 4-core Intel Nehalem-EP, for a wide range of production applications show that there is a diminishing return when increasing the number of memory channels per memory controller. In addition, we show that this performance degradation can be efficiently addressed by increasing the ratio of memory controllers to channels while keeping the number of memory channels constant. Significant performance improvements, up to 28%, can be achieved with this scheme when using two memory controllers, each with one channel, compared with one controller with two memory channels.
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures.

PubMed

Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

2016-04-01

Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the latest generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable for MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked against the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Comparisons with GATE/GEANT4 for various geometries show deviations within 2%-1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10^7 primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
Development of massive multilevel molecular dynamics simulation program, Platypus (PLATform for dYnamic Protein Unified Simulation), for the elucidation of protein functions.

PubMed

Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki

2016-05-05

A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in the QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPs with 705,024 cores. Platypus showed increasing speedup up to 20,000 processor cores in the HF/cc-pVDZ and B3LYP/cc-pVDZ calculations, and up to 10,000 processor cores in the CASCI(16,16)/6-31G** calculations. We also performed excited QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water, using over 4000 processor cores. In addition, it succeeded in 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Optimization of the coherence function estimation for multi-core central processing unit

NASA Astrophysics Data System (ADS)

Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

2017-02-01

The paper considers the use of parallel processing on a multi-core central processing unit to optimize the evaluation of the coherence function arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented by digital samples. The algorithm is analyzed with respect to its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. The speed-up of parallel execution with respect to sequential execution was studied, and results are presented for Intel Core i7-4720HQ and AMD FX-9590 processors. The results show comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization have been significantly improved, showing a high degree of parallelism in the constructed calculating functions. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location with the acoustic correlation method.
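The abstract above does not give the algorithm itself, so the following is only a minimal sketch of one common estimator, the segment-averaged magnitude-squared coherence, written in plain Python. The function names `dft` and `coherence`, the non-overlapping rectangular segments, and the naive O(n^2) transform are all assumptions made for illustration and are not taken from the paper. Each segment's spectra are computed independently, which is the kind of work that could be distributed across cores.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (O(n^2)); adequate for a short sketch."""
    n = len(samples)
    return [
        sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
        for k in range(n)
    ]


def coherence(x, y, segment_len):
    """Magnitude-squared coherence per frequency bin, averaged over segments.

    Assumption: non-overlapping segments with no window function.
    """
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    n_segments = len(x) // segment_len
    if segment_len < 2 or n_segments < 1:
        raise ValueError("need at least one segment of two or more samples")
    half = segment_len // 2 + 1  # bins up to the Nyquist frequency
    sxx = [0.0] * half           # accumulated auto-spectrum of x
    syy = [0.0] * half           # accumulated auto-spectrum of y
    sxy = [0j] * half            # accumulated cross-spectrum
    for seg in range(n_segments):
        lo = seg * segment_len
        fx = dft(x[lo:lo + segment_len])
        fy = dft(y[lo:lo + segment_len])
        for k in range(half):
            sxx[k] += abs(fx[k]) ** 2
            syy[k] += abs(fy[k]) ** 2
            sxy[k] += fx[k] * fy[k].conjugate()
    return [
        abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) if sxx[k] > 0 and syy[k] > 0 else 0.0
        for k in range(half)
    ]
```

With several segments, a signal compared with itself gives values of 1 in every bin that carries power, while unrelated signals give values between 0 and 1. With a single segment the estimate is 1 everywhere, which is why averaging over segments matters.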
Multiple Embedded Processors for Fault-Tolerant Computing

NASA Technical Reports Server (NTRS)

Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

2005-01-01

A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
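The detection scheme described above can be sketched in software as follows. This is a hypothetical model, not the FPGA design: `duplex_step`, `pair_and_spare_step`, `healthy` and `upset` are illustrative names, and the way the four-processor version recovers (falling back to a second lockstep pair when the first pair disagrees) is an assumption, since the abstract does not say how the planned version would correct errors.

```python
from typing import Callable, Optional, Sequence, Tuple

Processor = Callable[[int], int]


def duplex_step(cpu_a: Processor, cpu_b: Processor, x: int) -> Optional[int]:
    """Run two processors on the same input; the comparator accepts only agreement.

    Returns the agreed result, or None when a mismatch (a detected error) occurs.
    """
    result_a, result_b = cpu_a(x), cpu_b(x)
    return result_a if result_a == result_b else None


def pair_and_spare_step(
    pairs: Sequence[Tuple[Processor, Processor]], x: int
) -> Optional[int]:
    """Assumed recovery policy: use the first pair whose comparator reports agreement."""
    for cpu_a, cpu_b in pairs:
        result = duplex_step(cpu_a, cpu_b, x)
        if result is not None:
            return result
    return None  # every pair disagreed: detected but not correctable


def healthy(x: int) -> int:
    """Stand-in for a correctly executing processor."""
    return x * x + 1


def upset(x: int) -> int:
    """Stand-in for a processor whose result has one bit flipped."""
    return healthy(x) ^ 0x4


if __name__ == "__main__":
    print(duplex_step(healthy, healthy, 3))   # both agree
    print(duplex_step(healthy, upset, 3))     # mismatch is flagged
    print(pair_and_spare_step([(healthy, upset), (healthy, healthy)], 3))
```

A single pair can only flag a disagreement, because the comparator cannot tell which processor is wrong; some form of additional redundancy is what makes correction possible.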
Accelerating Climate Simulations Through Hybrid Computing

NASA Technical Reports Server (NTRS)

Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

2009-01-01

Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) an identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing compute-intensive functions to be offloaded seamlessly to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics, as more particle tracks can be simulated with low response time.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.

PubMed

Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

2015-07-01

Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
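As a rough illustration of the sliding-window idea mentioned above, the sketch below computes a Pearson correlation between two time courses for each window position. It is not the authors' OpenMP or CUDA implementation: the names `pearson` and `sliding_window_correlation`, the `window`, `step` and `workers` parameters, and the use of a thread pool are assumptions chosen to show that every window is an independent unit of work. In CPython the thread pool illustrates the structure only and should not be expected to give a real speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def sliding_window_correlation(x, y, window, step=1, workers=4):
    """Correlation of x and y inside each sliding window, in window order."""
    if len(x) != len(y):
        raise ValueError("time courses must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    starts = range(0, len(x) - window + 1, step)

    def one_window(start):
        # Each window reads its own slice and shares no mutable state.
        return pearson(x[start:start + window], y[start:start + window])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_window, starts))
```

Because the windows do not depend on each other, the same decomposition maps naturally onto one thread per window, or onto blocks of threads that share the work inside a window.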
Using all of your CPU's in HIPE

NASA Astrophysics Data System (ADS)

Jacobson, J. D.; Fadda, D.

2012-09-01

Modern computer architectures increasingly feature multi-core CPUs. For example, the MacBook Pro features Intel quad-core i7 processors. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of multiple-processor architectures. Up to now, the software for Herschel data reduction (HIPE), written in Jython and Java, has been single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the unchopped line scan mode has been threaded. This computation-intensive task uses either a one-parameter or a three-parameter exponential function to characterize the transient. The task uses a Java implementation of Minpack, translated from the C (Moshier) and IDL (Markwardt) versions by the authors, to optimize the correction parameters. We also explain how to determine whether a task can benefit from threading (Amdahl's Law) and whether it is safe to thread. The design and implementation, using the Java concurrency package's completion service, are described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
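The reference to Amdahl's Law can be made concrete with a short calculation. The sketch below is a generic statement of the law, not code from HIPE; the function name `amdahl_speedup` and the example parallel fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a task can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Eight hardware threads, as on the hyper-threaded quad-core i7 mentioned above.
    for fraction in (0.50, 0.90, 0.99):
        bound = amdahl_speedup(fraction, 8)
        print(f"parallel fraction {fraction:.2f}: at most {bound:.2f}x on 8 threads")
```

The bound falls quickly as the serial share grows, which is why it is worth estimating the parallel fraction of a task before investing in threading it.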
Reconfigurable tree architectures using subtree oriented fault tolerance

NASA Technical Reports Server (NTRS)

Lowrie, Matthew B.

1987-01-01

An approach to the design of reconfigurable tree architectures is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single-chip and multichip tree implementations, while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees and is directly extensible to N-ary trees and to fault tolerance through performance degradation.
Computational multicore on two-layer 1D shallow water equations for erodible dambreak

NASA Astrophysics Data System (ADS)

Simanjuntak, C. A.; Bagustara, B. A. R. H.; Gunawan, P. H.

2018-03-01

The simulation of an erodible dambreak using the two-layer shallow water equations (SWE) and the SCHR scheme is elaborated in this paper. The results show that the two-layer SWE model is in good agreement with the experimental data obtained at the Université Catholique de Louvain, Louvain-la-Neuve. Moreover, results of the parallel algorithm on multicore architectures are given. They show that Computer I, with an Intel(R) Core(TM) i5-2500 quad-core CPU, achieves the best performance in reducing the computational time, while Computer III, with an AMD A6-5200 quad-core APU, is observed to have higher speedup and efficiency. The speedup and efficiency of Computer III with a number of grids of 3200 are 3.716050530 times and 92.9%, respectively.
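The two reported figures are consistent with the usual definition of parallel efficiency as speedup divided by the number of cores. The short check below assumes that the quad-core machine was run on all four cores, which the abstract implies but does not state; the function name is illustrative.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Parallel efficiency: the achieved fraction of ideal linear speedup."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return speedup / cores


reported_speedup = 3.716050530   # Computer III, from the abstract
assumed_cores = 4                # assumption: all four cores of the quad-core APU

efficiency = parallel_efficiency(reported_speedup, assumed_cores)
print(f"efficiency = {efficiency:.1%}")  # prints "efficiency = 92.9%"
```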
An efficient implementation of semi-numerical computation of the Hartree-Fock exchange on the Intel Phi processor

NASA Astrophysics Data System (ADS)

Liu, Fenglai; Kong, Jing

2018-07-01

Unique technical challenges and their solutions for implementing semi-numerical Hartree-Fock exchange on the Phi processor are discussed, especially concerning the single-instruction-multiple-data type of processing and small cache size. Benchmark calculations on a series of buckyball molecules with various Gaussian basis sets on a Phi processor and a six-core CPU show that the Phi processor provides as much as a 12-fold speedup with large basis sets compared with the conventional four-center electron repulsion integration approach performed on the CPU. The accuracy of the semi-numerical scheme is also evaluated and found to be comparable to that of the resolution-of-identity approach.
High frequency, high temperature specific core loss and dynamic B-H hysteresis loop characteristics of soft magnetic alloys

NASA Technical Reports Server (NTRS)

Wieserman, W. R.; Schwarze, G. E.; Niedra, J. M.

1990-01-01

Limited experimental data exists for the specific core loss and dynamic B-H loops for soft magnetic materials for the combined conditions of high frequency and high temperature. This experimental study investigates the specific core loss and dynamic B-H loop characteristics of Supermalloy and Metglas 2605SC over the frequency range of 1 to 50 kHz and temperature range of 23 to 300 C under sinusoidal voltage excitation. The experimental setup used to conduct the investigation is described. The effects of the maximum magnetic flux density, frequency, and temperature on the specific core loss and on the size and shape of the B-H loops are examined.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava

2017-01-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

NASA Astrophysics Data System (ADS)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2017-08-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Direct access inter-process shared memory

DOEpatents

Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

2013-10-22

A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
CMS Readiness for Multi-Core Workload Scheduling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.

In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
CMS readiness for multi-core workload scheduling

NASA Astrophysics Data System (ADS)

Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.

2017-10-01

In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
VENTURE/PC manual: A multidimensional multigroup neutron diffusion code system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shapiro, A.; Huria, H.C.; Cho, K.W.

1991-12-01

VENTURE/PC is a recompilation of part of the Oak Ridge BOLD VENTURE code system, which will operate on an IBM PC or compatible computer. Neutron diffusion theory solutions are obtained for multidimensional, multigroup problems. This manual contains information associated with operating the code system. The purpose of the various modules used in the code system, and the input for these modules are discussed. The PC code structure is also given. Version 2 included several enhancements not given in the original version of the code. In particular, flux iterations can be done in core rather than by reading and writing to disk, for problems which allow sufficient memory for such in-core iterations. This speeds up the iteration process. Version 3 does not include any of the special processors used in the previous versions. These special processors utilized formatted input for various elements of the code system. All such input data is now entered through the Input Processor, which produces standard interface files for the various modules in the code system. In addition, a Standard Interface File Handbook is included in the documentation which is distributed with the code, to assist in developing the input for the Input Processor.

A new parameter-free soft-core potential for silica and its application to simulation of silica anomalies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Izvekov, Sergei; Rice, Betsy M.

2015-12-28

A core-softening of the effective interaction between oxygen atoms in water and silica systems and its role in developing anomalous thermodynamic, transport, and structural properties have been extensively debated. For silica, the progress with addressing these issues has been hampered by a lack of effective interaction models with explicit core-softening. In this work, we present an extension of a two-body soft-core interatomic force field for silica recently reported by us [S. Izvekov and B. M. Rice, J. Chem. Phys. 136(13), 134508 (2012)] to include three-body forces. Similar to two-body interaction terms, the three-body terms are derived using parameter-free force-matching of the interactions from ab initio MD simulations of liquid silica. The derived shape of the O–Si–O three-body potential term affirms the existence of repulsion softening between oxygen atoms at short separations. The new model shows a good performance in simulating liquid, amorphous, and crystalline silica. By comparing the soft-core model and a similar model with the soft-core suppressed, we demonstrate that the topology reorganization within the local tetrahedral network and the O–O core-softening are two competitive mechanisms responsible for anomalous thermodynamic and kinetic behaviors observed in liquid and amorphous silica. The studied anomalies include the temperature of density maximum locus and anomalous diffusivity in liquid silica, and irreversible densification of amorphous silica. We show that the O–O core-softened interaction enhances the observed anomalies primarily through two mechanisms: facilitating the defect driven structural rearrangements of the silica tetrahedral network and modifying the tetrahedral ordering induced interactions toward multiple characteristic scales, the feature which underlies the thermodynamic anomalies.
DFT algorithms for bit-serial GaAs array processor architectures

NASA Technical Reports Server (NTRS)

Mcmillan, Gary B.

1988-01-01

Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.
Adolescents' non-core food intake: a description of what, where and with whom adolescents consume non-core foods.

PubMed

Toumpakari, Zoi; Haase, Anne M; Johnson, Laura

2016-06-01

Little is known about adolescents' non-core food intake in the UK and the eating context in which they consume non-core foods. The present study aimed to describe types of non-core foods consumed by British adolescents in total and across different eating contexts. A descriptive analysis, using cross-sectional data from food diaries. Non-core foods were classified based on cut-off points of fat and sugar from the Australian Guide to Healthy Eating. Eating context was defined as 'where' and 'with whom' adolescents consumed each food. Percentages of non-core energy were calculated for each food group in total and across eating contexts. A combined ranking was then created to account for each food's contribution to non-core energy intake and its popularity of consumption (percentage of consumers). The UK National Diet and Nutrition Survey 2008-2011. Adolescents across the UK aged 11-18 years (n 666). Non-core food comprised 39·5 % of total energy intake and was mostly 'Regular soft drinks', 'Crisps & savoury snacks', 'Chips & potato products', 'Chocolate' and 'Biscuits'. Adolescents ate 57·0 % and 51·3 % of non-core food at 'Eateries' or with 'Friends', compared with 33·2 % and 32·1 % at 'Home' or with 'Parents'. Persistent foods consumed across eating contexts were 'Regular soft drinks' and 'Chips & potato products'. Regular soft drinks contribute the most energy and are the most popular non-core food consumed by adolescents regardless of context, and represent a good target for interventions to reduce non-core food consumption.
Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-core Processors

DTIC Science & Technology

2009-09-01

TFLOPS of Playstation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes...
Production Level CFD Code Acceleration for Hybrid Many-Core Architectures

NASA Technical Reports Server (NTRS)

Duffy, Austen C.; Hammond, Dana P.; Nielsen, Eric J.

2012-01-01

In this work, a novel graphics processing unit (GPU) distributed sharing model for hybrid many-core architectures is introduced and employed in the acceleration of a production-level computational fluid dynamics (CFD) code. The latest generation graphics hardware allows multiple processor cores to simultaneously share a single GPU through concurrent kernel execution. This feature has allowed the NASA FUN3D code to be accelerated in parallel with up to four processor cores sharing a single GPU. For codes to scale and fully use resources on these and the next generation machines, codes will need to employ some type of GPU sharing model, as presented in this work. Findings include the effects of GPU sharing on overall performance. A discussion of the inherent challenges that parallel unstructured CFD codes face in accelerator-based computing environments is included, with considerations for future generation architectures. This work was completed by the author in August 2010, and reflects the analysis and results of the time.
New Dimensions in Microarchitecture Harnessing 3D Integration Technologies (BRIEFING CHARTS)

DTIC Science & Technology

2007-03-06

Briefing-chart excerpt: "Quad Core Bandwidth and Latency Boundaries" (General Purpose Processor Loads; latency-limited and bandwidth-limited regions). Processor load trade-off between I...delay. No = number of ckts at 1V; do = ckt delay at 1V. From "3D Integration" Special Topic Session, W. Haensch, ISSCC '07, 2/07. DARPA MTS, March 6, 2007.
Clover: Compiler directed lightweight soft error resilience

DOE PAGES

Liu, Qingrui; Lee, Dongyoon; Jung, Changhee; ...

2015-05-01

This paper presents Clover, a compiler-directed soft error detection and recovery scheme for lightweight soft error resilience. The compiler carefully generates soft-error-tolerant code based on idempotent processing without explicit checkpoints. During program execution, Clover relies on a small number of acoustic wave detectors deployed in the processor to identify soft errors by sensing the wave made by a particle strike. To cope with DUE (detected unrecoverable errors) caused by the sensing latency of error detection, Clover leverages a novel selective instruction duplication technique called tail-DMR (dual modular redundancy). Once a soft error is detected by either the sensor or the tail-DMR, Clover takes care of the error as in the case of exception handling. To recover from the error, Clover simply redirects program control to the beginning of the code region where the error is detected. Lastly, the experiment results demonstrate that the average runtime overhead is only 26%, which is a 75% reduction compared to that of the state-of-the-art soft error resilience technique.
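The recovery step in this abstract, redirecting control to the start of the code region in which the error was detected, works because an idempotent region can be re-executed without changing its result. The Python sketch below illustrates only that idea; it is not Clover's compiler transformation. `SoftErrorDetected`, `run_region`, and `make_faulty_region` are hypothetical names, and a raised exception stands in for the acoustic-sensor or tail-DMR signal.

```python
class SoftErrorDetected(Exception):
    """Hypothetical stand-in for the detector signal (sensor or tail-DMR)."""


def run_region(region, inputs, max_retries=3):
    """Run an idempotent region; on a detected error, restart it from its beginning.

    No checkpoint is needed: the region never overwrites its inputs,
    so re-running it from the top yields the same result.
    """
    for _ in range(max_retries + 1):
        try:
            return region(inputs)
        except SoftErrorDetected:
            continue  # redirect control to the start of the region
    raise RuntimeError("region failed on every attempt")


def make_faulty_region(fail_times):
    """Build a sum-of-squares region with a fault injector that fires `fail_times` times."""
    injector = {"remaining": fail_times}  # test harness state, not program state

    def region(inputs):
        total = 0  # region-local output; inputs are only read
        for i, x in enumerate(inputs):
            if i == 1 and injector["remaining"] > 0:
                injector["remaining"] -= 1
                raise SoftErrorDetected()
            total += x * x
        return total

    return region


result = run_region(make_faulty_region(fail_times=1), (1, 2, 3))
print(result)
```

If the region wrote back into `inputs` before the fault, a restart could compute from corrupted data, which is why the scheme depends on idempotent regions.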
Reconfigurable Very Long Instruction Word (VLIW) Processor

NASA Technical Reports Server (NTRS)

Velev, Miroslav N.

2015-01-01

Future NASA missions will depend on radiation-hardened, power-efficient processing systems-on-a-chip (SOCs) that consist of a range of processor cores custom tailored for space applications. Aries Design Automation, LLC, has developed a processing SOC that is optimized for software-defined radio (SDR) uses. The innovation implements the Institute of Electrical and Electronics Engineers (IEEE) RazorII voltage management technique, a microarchitectural mechanism that allows processor cores to self-monitor, self-analyze, and self-heal after timing errors, regardless of their cause (e.g., radiation; chip aging; variations in the voltage, frequency, temperature, or manufacturing process). This highly automated SOC can also execute legacy binary code for the PowerPC 750 instruction set architecture (ISA), which is used in the flight-control computers of many previous NASA space missions. In developing this innovation, Aries Design Automation has made significant contributions to the fields of formal verification of complex pipelined microprocessors and Boolean satisfiability (SAT) and has developed highly efficient electronic design automation tools that hold promise for future developments.
Importance of core electrostatic properties on the electrophoresis of a soft particle

NASA Astrophysics Data System (ADS)

De, Simanta; Bhattacharyya, Somnath; Gopmandal, Partha P.

2016-08-01

The impact of the volumetric charge density of the dielectric rigid core on the electrophoresis of a soft particle is analyzed numerically. A volume charge density in the inner core of a soft particle can arise for a dendrimer structure or bacteriophage MS2. We consider an electrokinetic model based on conservation principles; thus no conditions on the Debye length or applied electric field are imposed. The fluid flow equations are coupled with the ion transport equations and the equation for the electric field. The occurrence of the induced nonuniform surface charge density on the outer surface of the inner core leads to a situation different from the existing analyses of soft particle electrophoresis. The impact of this induced surface charge density together with the double-layer polarization and relaxation due to ion convection and electromigration is analyzed. The dielectric permittivity and the charge density of the core have a significant impact on the particle electrophoresis when the Debye length is on the order of the particle size. We find that by varying the ionic concentration of the electrolyte, the particle can exhibit reversal in its electrophoretic velocity. The role of the polymer layer softness parameter is addressed in the present analysis.
Frequency Dependence of Single-event Upset in Advanced Commercial PowerPC Microprocessors

NASA Technical Reports Server (NTRS)

Irom, Frokh; Farmanesh, Farhad F.; Swift, Gary M.; Johnston, Allen H.

2004-01-01

This paper examines single-event upsets in advanced commercial SOI microprocessors in a dynamic mode, studying SEU sensitivity of General Purpose Registers (GPRs) with clock frequency. Results are presented for SOI processors with feature sizes of 0.18 microns and two different core voltages. Single-event upset from heavy ions is measured for advanced commercial microprocessors in a dynamic mode with clock frequency up to 1GHz. Frequency and core voltage dependence of single-event upsets in registers is discussed.
Low-energy d-d excitations in MnO studied by resonant x-ray fluorescence spectroscopy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Butorin, S.M.; Guo, J.; Magnuson, M.

1997-04-01

Resonant soft X-ray emission spectroscopy has been demonstrated to possess interesting abilities for studies of electronic structure in various systems, such as symmetry probing, alignment and polarization dependence, sensitivity to channel interference, etc. In the present abstract the authors focus on the feasibility of resonant soft X-ray emission to probe low energy excitations by means of resonant electronic X-ray Raman scattering. Resonant X-ray emission can be regarded as an inelastic scattering process where a system in the ground state is transferred to a low excited state via a virtual core excitation. The energy closeness to a core excitation of the exciting radiation enhances the (generally) low probability for inelastic scattering at these wavelengths. Therefore soft X-ray emission spectroscopy (in resonant electronic Raman mode) can be used to study low energy d-d excitations in transition metal systems. The involvement of the intermediate core state allows one to use the selection rules of X-ray emission, and the appearance of the elastically scattered line in the spectra provides the reference to the ground state.
Noise transmission by viscoelastic sandwich panels

NASA Technical Reports Server (NTRS)

Vaicaitis, R.

1977-01-01

An analytical study on low frequency noise transmission into rectangular enclosures by viscoelastic sandwich panels is presented. Soft compressible cores with dilatational modes and hard incompressible cores with dilatational modes neglected are considered as limiting cases of core stiffness. It is reported that these panels can effect significant noise reduction.
Theorem Proving in Intel Hardware Design

NASA Technical Reports Server (NTRS)

O'Leary, John

2009-01-01

For the past decade, a framework combining model checking (symbolic trajectory evaluation) and higher-order logic theorem proving has been in production use at Intel. Our tools and methodology have been used to formally verify execution cluster functionality (including floating-point operations) for a number of Intel products, including the Pentium® 4 and Core™ i7 processors. Hardware verification in 2009 is much more challenging than it was in 1999: today's CPU chip designs contain many processor cores and significant firmware content. This talk will attempt to distill the lessons learned over the past ten years, discuss how they apply to today's problems, and outline some future directions.
Beyond core count: a look at new mainstream computing platforms for HEP workloads

NASA Astrophysics Data System (ADS)

Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

2014-06-01

As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
GPU: the biggest key processor for AI and parallel processing

NASA Astrophysics Data System (ADS)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layer neural networks. With CPUs alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of key GPU-demanding applications like the ones described above will also be given.
In-flight performance of pulse-processing system of the ASTRO-H/Hitomi soft x-ray spectrometer

NASA Astrophysics Data System (ADS)

Ishisaki, Yoshitaka; Yamada, Shinya; Seta, Hiromi; Tashiro, Makoto S.; Takeda, Sawako; Terada, Yukikatsu; Kato, Yuka; Tsujimoto, Masahiro; Koyama, Shu; Mitsuda, Kazuhisa; Sawada, Makoto; Boyce, Kevin R.; Chiao, Meng P.; Watanabe, Tomomi; Leutenegger, Maurice A.; Eckart, Megan E.; Porter, Frederick Scott; Kilbourne, Caroline Anne

2018-01-01

We summarize results of the initial in-orbit performance of the pulse shape processor (PSP) of the soft x-ray spectrometer instrument onboard ASTRO-H (Hitomi). Event formats, kinds of telemetry, and the pulse-processing parameters are described, and the parameter settings in orbit are listed. The PSP was powered on 2 days after launch, and the event threshold was lowered in orbit. The PSP worked fine in orbit, and there was neither a memory error nor a SpaceWire communication error until the break-up of the spacecraft. Time assignment, electrical crosstalk, and the event screening criteria are studied. It is confirmed that the event processing rate at 100% central processing unit load is ~200 c/s/array, compliant with the requirement on the PSP.
Single Event Effects (SEE) Testing of Embedded DSP Cores within Microsemi RTAX4000D Field Programmable Gate Array (FPGA) Devices

NASA Technical Reports Server (NTRS)

Perez, Christopher E.; Berg, Melanie D.; Friendlich, Mark R.

2011-01-01

Motivation for this work is: (1) Accurately characterize digital signal processor (DSP) core single-event effect (SEE) behavior (2) Test DSP cores across a large frequency range and across various input conditions (3) Isolate SEE analysis to DSP cores alone (4) Interpret SEE analysis in terms of single-event upsets (SEUs) and single-event transients (SETs) (5) Provide flight missions with accurate estimate of DSP core error rates and error signatures.
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. 
When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks < number of processors) or tasks are divided out among the available processors (number of tasks > number of processors). Nested parallel statements may further subdivide the processor set owned by a given task. Tasks or processors are distributed evenly by default, but uneven distributions are possible under programmer control. It is also possible to explicitly enable child tasks to migrate within the processor set owned by their parent task, reducing load unbalancing at the potential cost of increased inter-processor message traffic. PM incorporates some programming structures from the earlier MIST language presented at a previous EGU General Assembly, while adopting a significantly different underlying parallelisation model and type system. PM code is available at www.pm-lang.org under an unrestrictive MIT license. Reference Ruymán Reyes, Antonio J. Dorta, Francisco Almeida, Francisco de Sande, 2009. Automatic Hybrid MPI+OpenMP Code Generation with llc, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science Volume 5759, 185-195
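The distribution rule in this abstract (processors divided among tasks when tasks are fewer, tasks divided among processors when tasks are more numerous, evenly by default) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PM's implementation: `split_evenly` and `assign` are hypothetical names, the case of equal task and processor counts (not specified in the abstract) is treated as one processor per task, and at least one task and one processor are assumed.

```python
def split_evenly(items, parts):
    """Split a list into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def assign(num_tasks, processors):
    """Map each new task to the processor set it owns for one parallel statement."""
    if num_tasks <= len(processors):
        # Fewer tasks than processors: divide the processors among the tasks.
        groups = split_evenly(processors, num_tasks)
        return {task: groups[task] for task in range(num_tasks)}
    # More tasks than processors: divide the tasks among the processors.
    task_groups = split_evenly(list(range(num_tasks)), len(processors))
    return {task: [processors[p]]
            for p, group in enumerate(task_groups)
            for task in group}


outer = assign(2, [0, 1, 2, 3, 4])   # two tasks share five processors
inner = assign(3, outer[1])          # nested statement inside task 1
print(outer)
print(inner)
```

A nested parallel statement simply calls `assign` again on the processor set owned by the enclosing task, which mirrors the subdivision the abstract describes.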
Hydrocarbon Degradation in Caspian Sea Sediment Cores Subjected to Simulated Petroleum Seepage in a Newly Designed Sediment-Oil-Flow-Through System.

PubMed

Mishra, Sonakshi; Wefers, Peggy; Schmidt, Mark; Knittel, Katrin; Krüger, Martin; Stagars, Marion H; Treude, Tina

2017-01-01

The microbial community response to petroleum seepage was investigated in a whole round sediment core (16 cm length) collected near natural hydrocarbon seepage structures in the Caspian Sea, using a newly developed Sediment-Oil-Flow-Through (SOFT) system. Distinct redox zones established and migrated vertically in the core during the 190-day-long simulated petroleum seepage. Methanogenic petroleum degradation was indicated by an increase in methane concentration from 8 μM in an untreated core to 2300 μM in the lower sulfate-free zone of the SOFT core at the end of the experiment, accompanied by a respective decrease in the δ¹³C signal of methane from -33.7 to -49.5‰. The involvement of methanogens in petroleum degradation was further confirmed by methane production in enrichment cultures from SOFT sediment after the addition of hexadecane, methylnaphthalene, toluene, and ethylbenzene. Petroleum degradation coupled to sulfate reduction was indicated by the increase of integrated sulfate reduction rates from 2.8 mmol SO₄²⁻ m⁻² day⁻¹ in untreated cores to 5.7 mmol SO₄²⁻ m⁻² day⁻¹ in the SOFT core at the end of the experiment, accompanied by a respective accumulation of sulfide from 30 to 447 μM. Volatile hydrocarbons (C2-C6 n-alkanes) passed through the methanogenic zone mostly unchanged and were depleted within the sulfate-reducing zone. The amount of heavier n-alkanes (C10-C38) decreased step-wise toward the top of the sediment core and a preferential degradation of shorter (C30) was seen during the seepage. This study illustrates, to the best of our knowledge, for the first time the development of methanogenic petroleum degradation and the succession of benthic microbial processes during petroleum passage in a whole round sediment core.
Hydrocarbon Degradation in Caspian Sea Sediment Cores Subjected to Simulated Petroleum Seepage in a Newly Designed Sediment-Oil-Flow-Through System

PubMed Central

Mishra, Sonakshi; Wefers, Peggy; Schmidt, Mark; Knittel, Katrin; Krüger, Martin; Stagars, Marion H.; Treude, Tina

2017-01-01

The microbial community response to petroleum seepage was investigated in a whole round sediment core (16 cm length) collected near natural hydrocarbon seepage structures in the Caspian Sea, using a newly developed Sediment-Oil-Flow-Through (SOFT) system. Distinct redox zones established and migrated vertically in the core during the 190-day-long simulated petroleum seepage. Methanogenic petroleum degradation was indicated by an increase in methane concentration from 8 μM in an untreated core to 2300 μM in the lower sulfate-free zone of the SOFT core at the end of the experiment, accompanied by a respective decrease in the δ¹³C signal of methane from -33.7 to -49.5‰. The involvement of methanogens in petroleum degradation was further confirmed by methane production in enrichment cultures from SOFT sediment after the addition of hexadecane, methylnaphthalene, toluene, and ethylbenzene. Petroleum degradation coupled to sulfate reduction was indicated by the increase of integrated sulfate reduction rates from 2.8 mmol SO₄²⁻ m⁻² day⁻¹ in untreated cores to 5.7 mmol SO₄²⁻ m⁻² day⁻¹ in the SOFT core at the end of the experiment, accompanied by a respective accumulation of sulfide from 30 to 447 μM. Volatile hydrocarbons (C2–C6 n-alkanes) passed through the methanogenic zone mostly unchanged and were depleted within the sulfate-reducing zone. The amount of heavier n-alkanes (C10–C38) decreased step-wise toward the top of the sediment core and a preferential degradation of shorter (C30) was seen during the seepage. This study illustrates, to the best of our knowledge, for the first time the development of methanogenic petroleum degradation and the succession of benthic microbial processes during petroleum passage in a whole round sediment core. PMID:28503172

Case Study of Using High Performance Commercial Processors in Space

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.; Olivas, Zulema

2009-01-01

The purpose of the Space Shuttle Cockpit Avionics Upgrade project (1999-2004) was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human-rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The CPU selected was from the PowerPC family, a reduced instruction set computer (RISC) line known for its high performance. However, the requirement for radiation tolerance resulted in the re-evaluation of the selected family member of the PowerPC line. Radiation testing revealed that the originally selected processor (PowerPC 7400) was too soft to meet mission objectives, and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but had some ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.
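The abstract does not say how the dual software paths were compared, so the following Python sketch shows only one plausible reading of the strategy: the same command is generated on two paths that each hold their own copy of the state, and the command is released to the legacy avionics only when both paths agree. `generate_command`, `dual_path_command`, the `"page"` field, and the command format are all hypothetical.

```python
def generate_command(state):
    """Hypothetical command generator: a pure function of one path's state copy."""
    return ("SET_PAGE", state["page"] % 16)


def dual_path_command(state_a, state_b):
    """Generate the command on two independent paths; release it only if they agree.

    Returns the command, or None when the paths disagree, which is treated
    here as a detected radiation upset in one of the copies.
    """
    command_a = generate_command(state_a)
    command_b = generate_command(state_b)
    if command_a != command_b:
        return None  # withhold the command rather than forward a possibly corrupt one
    return command_a


healthy = dual_path_command({"page": 5}, {"page": 5})
upset = dual_path_command({"page": 5}, {"page": 5 ^ 0b100})  # one bit flipped in copy B
print(healthy, upset)
```

A comparison like this detects a single corrupted copy but cannot tell which path is wrong; deciding what to do after a mismatch (retry, reload, or fail over) is outside what the abstract describes.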
Case Study of Using High Performance Commercial Processors in a Space Environment

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.; Olivas, Zulema

2009-01-01

The purpose of the Space Shuttle Cockpit Avionics Upgrade project was to reduce crew workload and improve situational awareness. The upgrade was to augment the Shuttle avionics system with new hardware and software. A major success of this project was the validation of the hardware architecture and software design. This was significant because the project incorporated new technology and approaches for the development of human-rated space software. An early version of this system was tested at the Johnson Space Center for one month by teams of astronauts. The results were positive, but NASA eventually cancelled the project towards the end of the development cycle. The goal to reduce crew workload and improve situational awareness resulted in the need for high performance Central Processing Units (CPUs). The CPU selected was from the PowerPC family, a reduced instruction set computer (RISC) line known for its high performance. However, the requirement for radiation tolerance resulted in the reevaluation of the selected family member of the PowerPC line. Radiation testing revealed that the originally selected processor (PowerPC 7400) was too soft to meet mission objectives, and an effort was established to perform trade studies and performance testing to determine a feasible candidate. At that time, the PowerPC RAD750s were radiation tolerant, but did not meet the required performance needs of the project. Thus, the final solution was to select the PowerPC 7455. This processor did not have a radiation tolerant version, but fared better than the 7400 in the ability to detect failures. However, its cache tags did not provide parity and thus the project incorporated a software strategy to detect radiation failures. The strategy was to incorporate dual paths for software generating commands to the legacy Space Shuttle avionics to prevent failures due to the softness of the upgraded avionics.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

NASA Astrophysics Data System (ADS)

Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

2014-03-01

List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple-data threads per core, with each thread having a 512-bit single-instruction, multiple-data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction.
A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

PubMed Central

Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

2011-01-01

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Using Microcomputer Word Processors for Foreign Languages.

ERIC Educational Resources Information Center

Smith, Kim L.

1984-01-01

Describes the programs and modifications needed to do word processing using foreign language characters. One such program, Screenwriter, uses soft character sets -- character sets which can be designed by the program user. This program has a word processing power combined with a foreign language capability that would allow any person to work with…
Parallel Agent-Based Simulations on Clusters of GPUs and Multi-Core Processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aaby, Brandon G; Perumalla, Kalyan S; Seal, Sudip K

2010-01-01

An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering more than 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.
Single-Event Upset and Scaling Trends in New Generation of the Commercial SOI PowerPC Microprocessors

NASA Technical Reports Server (NTRS)

Irom, Farokh; Farmanesh, Farhad; Kouba, Coy K.

2006-01-01

SEU from heavy ions is measured for SOI PowerPC microprocessors. SEU results for the 0.13 micron PowerPC with 1.1 V core voltage show an increase over the 1.3 V versions. This suggests that improvement in SEU for scaled devices may be reversed. In recent years there has been interest in the possible use of unhardened commercial microprocessors in space because of their superior performance compared to hardened processors. However, unhardened devices are susceptible to upset from space radiation. More information is needed on how they respond to radiation before they can be used in space. Only a limited number of advanced microprocessors have been subjected to radiation tests, and these were designed with lower clock frequencies and higher internal core voltages than recent devices [1-6]. However, the trend for commercial silicon-on-insulator (SOI) microprocessors is to reduce feature size and internal core voltage and increase the clock frequency. Commercial microprocessors with the PowerPC architecture are now available that use partially depleted SOI processes with a feature size of 90 nm, internal core voltage as low as 1.0 V, and clock frequency in the GHz range. Previously, we reported SEU measurements for SOI commercial PowerPCs with feature sizes of 0.18 and 0.13 μm [7, 8]. The results showed an order of magnitude reduction in saturated cross section compared to CMOS bulk counterparts. This paper examines SEUs in advanced commercial SOI microprocessors, focusing on the sensitivity of D-Cache SEUs and hangs to feature size and internal core voltage. Results are presented for the Motorola SOI processor with a feature size of 0.13 microns and internal core voltages of 1.3 and 1.1 V. These results are compared with results for the Motorola SOI processors with a feature size of 0.18 microns and internal core voltages of 1.6 and 1.3 V.
VENTURE/PC manual: A multidimensional multigroup neutron diffusion code system. Version 3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shapiro, A.; Huria, H.C.; Cho, K.W.

1991-12-01

VENTURE/PC is a recompilation of part of the Oak Ridge BOLD VENTURE code system, which will operate on an IBM PC or compatible computer. Neutron diffusion theory solutions are obtained for multidimensional, multigroup problems. This manual contains information associated with operating the code system. The purpose of the various modules used in the code system and the input for these modules are discussed. The PC code structure is also given. Version 2 included several enhancements not given in the original version of the code. In particular, flux iterations can be done in core rather than by reading and writing to disk, for problems which allow sufficient memory for such in-core iterations. This speeds up the iteration process. Version 3 does not include any of the special processors used in the previous versions. These special processors utilized formatted input for various elements of the code system. All such input data is now entered through the Input Processor, which produces standard interface files for the various modules in the code system. In addition, a Standard Interface File Handbook is included in the documentation which is distributed with the code, to assist in developing the input for the Input Processor.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.

PubMed

Zhang, Zhen; Ma, Cheng; Zhu, Rong

2017-08-23

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

PubMed Central

Zhang, Zhen; Zhu, Rong

2017-01-01

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas. PMID:28832522
Multitasking 3-D forward modeling using high-order finite difference methods on the Cray X-MP/416

DOE Office of Scientific and Technical Information (OSTI.GOV)

Terki-Hassaine, O.; Leiss, E.L.

1988-01-01

The CRAY X-MP/416 was used to multitask 3-D forward modeling by the high-order finite difference method. Flowtrace analysis reveals that the most expensive operation in the unitasked program is a matrix vector multiplication. The in-core and out-of-core versions of a reentrant subroutine can perform any fraction of the matrix vector multiplication independently, a pattern compatible with multitasking. The matrix vector multiplication routine can be distributed over two to four processors. The rest of the program utilizes the microtasking feature that lets the system treat independent iterations of DO-loops as subtasks to be performed by any available processor. The availability of the Solid-State Storage Device (SSD) meant the I/O wait time was virtually zero. A performance study determined a theoretical speedup, taking into account the multitasking overhead. Multitasking programs utilizing both macrotasking and microtasking features obtained actual speedups that were approximately 80% of the ideal speedup.
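The decomposition described in this abstract, where each task computes an independent fraction of the matrix vector product, can be illustrated with a minimal Python sketch. The function names, the row-block partitioning, and the use of a thread pool are assumptions made for illustration only; the original work used Cray macrotasking in Fortran, and Python threads show the structure of the split rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec_rows(matrix, vector, start, stop):
    """Multiply rows start..stop-1 of matrix by vector (one independent subtask)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix[start:stop]]


def parallel_matvec(matrix, vector, workers=4):
    """Split the rows into contiguous blocks, one per worker, and join the results."""
    n = len(matrix)
    workers = max(1, min(workers, n))
    # Block k covers rows bounds[k] .. bounds[k + 1] - 1.
    bounds = [k * n // workers for k in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(matvec_rows, matrix, vector, bounds[k], bounds[k + 1])
            for k in range(workers)
        ]
        result = []
        for future in futures:  # collect in submission order to keep row order
            result.extend(future.result())
    return result
```

Because no block reads another block's output, the blocks can be handed to two, three, or four workers without changing the result, which is the property the abstract calls compatible with multitasking.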
Phantom-GRAPE: Numerical software library to accelerate collisionless N-body simulation with SIMD instruction set on x86 architecture

NASA Astrophysics Data System (ADS)

Tanikawa, Ataru; Yoshikawa, Kohji; Nitadori, Keigo; Okamoto, Takashi

2013-02-01

We have developed a numerical software library for collisionless N-body simulations named "Phantom-GRAPE" which highly accelerates force calculations among particles by use of a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). In our library, not only Newton's forces, but also central forces with an arbitrary shape f(r), which has a finite cutoff radius r_cut (i.e. f(r) = 0 at r > r_cut), can be quickly computed. In computing such central forces with an arbitrary force shape f(r), we refer to a pre-calculated look-up table. We also present a new scheme to create the look-up table whose binning is optimal to keep good accuracy in computing forces and whose size is small enough to avoid cache misses. Using an Intel Core i7-2600 processor, we measure the performance of our library for both Newton's forces and the arbitrarily shaped central forces. In the case of Newton's forces, we achieve 2×10⁹ interactions per second with one processor core (or 75 GFLOPS if we count 38 operations per interaction), which is 20 times higher than the performance of an implementation without any explicit use of SIMD instructions, and 2 times higher than that with the SSE instructions. With four processor cores, we obtain a performance of 8×10⁹ interactions per second (or 300 GFLOPS). In the case of the arbitrarily shaped central forces, we can calculate 1×10⁹ and 4×10⁹ interactions per second with one and four processor cores, respectively. The performance with one processor core is 6 times and 2 times higher than those of the implementations without any use of SIMD instructions and with the SSE instructions, respectively. These performances depend only weakly on the number of particles, irrespective of the force shape. This is in good contrast with the fact that the performance of force calculations accelerated by graphics processing units (GPUs) depends strongly on the number of particles. 
Such a weak dependence of the performance on the number of particles suits collisionless N-body simulations, since these simulations are usually performed with sophisticated N-body solvers such as Tree- and TreePM-methods combined with an individual timestep scheme. We conclude that collisionless N-body simulations accelerated with our library have a significant advantage over those accelerated by GPUs, especially on massively parallel environments.
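The GFLOPS figures quoted in this abstract follow from the interaction rate multiplied by the stated 38 operations per interaction. The short Python check below is only an arithmetic illustration; the function name is hypothetical, and the interaction rates are read as 2×10⁹ and 8×10⁹ per second.

```python
OPS_PER_INTERACTION = 38  # operation count per interaction quoted in the abstract


def gflops(interactions_per_second, ops_per_interaction=OPS_PER_INTERACTION):
    """Convert an interaction rate into giga floating-point operations per second."""
    return interactions_per_second * ops_per_interaction / 1e9


one_core = gflops(2e9)    # one processor core
four_cores = gflops(8e9)  # four processor cores
print(one_core, four_cores)
```

The products come to 76 and 304 GFLOPS, so the abstract's 75 and 300 GFLOPS appear to be rounded values, presumably because the quoted interaction rates are themselves rounded.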
The Transition to a Many-core World

NASA Astrophysics Data System (ADS)

Mattson, T. G.

2012-12-01

The need to increase performance within a fixed energy budget has pushed the computer industry to many-core processors. This is grounded in the physics of computing and is not a trend that will just go away. It is hard to overestimate the profound impact of many-core processors on software developers. Virtually every facet of the software development process will need to change to adapt to these new processors. In this talk, we will look at many-core hardware and consider its evolution from a perspective grounded in the CPU. We will show that the number of cores will inevitably increase, but in addition, a quest to maximize performance per watt will push these cores to be heterogeneous. We will show that the inevitable result of these changes is a computing landscape where the distinction between the CPU and the GPU is blurred. We will then consider the much more pressing problem of software in a many-core world. Writing software for heterogeneous many-core processors is well beyond the ability of current programmers. One solution is to support a software development process where programmer teams are split into two distinct groups: a large group of domain-expert productivity programmers and a much smaller team of computer-scientist efficiency programmers. The productivity programmers work in terms of high level frameworks to express the concurrency in their problems while avoiding any details for how that concurrency is exploited. The second group, the efficiency programmers, map applications expressed in terms of these frameworks onto the target many-core system. In other words, we can solve the many-core software problem by creating a software infrastructure that only requires a small subset of programmers to become master parallel programmers. This is different from the discredited dream of automatic parallelism. 
Note that productivity programmers still need to define the architecture of their software in a way that exposes the concurrency inherent in their problem. We submit that domain-expert programmers understand "what is concurrent". The parallel programming problem emerges from the complexity of "how that concurrency is utilized" on real hardware. The research described in this talk was carried out in collaboration with the ParLab at UC Berkeley. We use a design pattern language to define the high level frameworks exposed to domain-expert, productivity programmers. We then use tools from the SEJITS project (Selective Embedded Just-In-Time Specializers) to build the software transformation tool chains that turn these framework-oriented designs into highly efficient code. The final ingredient is a software platform to serve as a target for these tools. One such platform is the OpenCL industry standard for programming heterogeneous systems. We will briefly describe OpenCL and show how it provides a vendor-neutral software target for current and future many-core systems, whether CPU-based, GPU-based, or heterogeneous combinations of the two.
Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

NASA Astrophysics Data System (ADS)

Olson, Richard F.

2013-05-01

Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity, emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
N-body simulation for self-gravitating collisional systems with a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions

NASA Astrophysics Data System (ADS)

Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo

2012-02-01

We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of an Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on the Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with an individual timestep scheme (Makino and Aarseth, 1992), and achieved a performance of ~20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions (Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using the so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ~ 10⁵ on massively parallel systems with at most 800 cores with the Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
GR712RC- Dual-Core Processor- Product Status

NASA Astrophysics Data System (ADS)

Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri

2012-08-01

The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.
Modeling Large Scale Circuits Using Massively Parallel Discrete-Event Simulation

DTIC Science & Technology

2013-06-01

exascale levels of performance, the smallest elements of a single processor can greatly affect the entire computer system (e.g. its power consumption...grow to exascale levels of performance, the smallest elements of a single processor can greatly affect the entire computer system (e.g. its power...Warp Speed 10.0. 2.0 INTRODUCTION As supercomputer systems approach exascale , the core count will exceed 1024 and number of transistors used in
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee

The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. The complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detailed description of the Sandia Secure Processor and its unique features.
Design and realization of the real-time spectrograph controller for LAMOST based on FPGA

NASA Astrophysics Data System (ADS)

Wang, Jianing; Wu, Liyan; Zeng, Yizhong; Dai, Songxin; Hu, Zhongwen; Zhu, Yongtian; Wang, Lei; Wu, Zhen; Chen, Yi

2008-08-01

A large Schmidt reflector telescope, the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), is being built in China; it has an effective aperture of 4 meters and can observe the spectra of as many as 4000 objects simultaneously. To handle such a large number of observed objects, the dispersion part is composed of a set of 16 multipurpose fiber-fed double-beam Schmidt spectrographs, each of which has about ten movable components that are positioned and manipulated in real time by a controller. An industrial Ethernet network connects the 16 spectrograph controllers. The light from stars is fed to the entrance slits of the spectrographs through optical fibers. In this paper, we mainly introduce the design and realization of our real-time controller for the spectrograph. The design uses the System On Programmable Chip (SOPC) technique based on a Field Programmable Gate Array (FPGA) and controls the spectrographs through a Nios II soft-core embedded processor. We package the stepper motor controller as an intellectual property (IP) core and reuse it, which greatly simplifies the design process and shortens the development time. Under the embedded operating system μC/OS-II, a multitask control program has been written to realize the real-time control of the movable parts of the spectrographs. At present, a number of such controllers have been applied in the spectrographs of LAMOST.
Soft ferrite cores characterization for integrated micro-inductors

NASA Astrophysics Data System (ADS)

Nguyen, Yen Mai; Lopez, Thomas; Laur, Jean-Pierre; Bourrier, David; Charlot, Samuel; Valdez-Nava, Zarel; Bley, Vincent; Combettes, Céline; Brunet, Magali

2013-12-01

Ferrite-based micro-inductors are proposed for hybrid integration on silicon for low-power medium frequency DC-DC converters. Due to their small coercive field and their high resistivity, soft ferrites are good candidates for a magnetic core working at moderate frequencies in the range of 5-10 MHz. We have studied several soft ferrites including commercial ferrite film and U70 and U200 homemade ferrites. The inductors are fabricated at wafer level using micromachining and assembling techniques. The proposed process is based on a sintered ferrite core placed in between thick electroplated copper windings. The low profile ferrite cores of 1.2 × 2.6 × 0.2 mm³ are produced by two methods from green tape-casted films and ferrite powder. This paper presents the magnetic characterization of the sintered ferrite films cut and printed in rectangular shape and sintered at different temperatures. The comparison is made in order to find the best material for the core, one that can reach the required inductance (470 nH at 6 MHz) under a 0.6 A DC bias current and that generates the smallest losses. An inductance density of 285 nH/mm² up to 6 MHz was obtained for ESL 40011 cores, which is much higher than that of previously reported devices. The small size of our devices is also a prominent point.

Geolocating thermal binoculars based on a software defined camera core incorporating HOT MCT grown by MOVPE

NASA Astrophysics Data System (ADS)

Pillans, Luke; Harmer, Jack; Edwards, Tim; Richardson, Lee

2016-05-01

Geolocation is the process of calculating a target position based on bearing and range relative to the known location of the observer. A high performance thermal imager with integrated geolocation functions is a powerful long range targeting device. Firefly is a software defined camera core incorporating a system-on-a-chip processor running the Android™ operating system. The processor has a range of industry standard serial interfaces which were used to interface to peripheral devices including a laser rangefinder and a digital magnetic compass. The core has a built-in Global Positioning System (GPS) receiver which provides the third variable required for geolocation. The graphical capability of Firefly allowed flexibility in the design of the man-machine interface (MMI), so the finished system can give access to extensive functionality without appearing cumbersome or over-complicated to the user. This paper covers both the hardware and software design of the system, including how the camera core influenced the selection of peripheral hardware, and the MMI design process which incorporated user feedback at various stages.
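The three inputs named in this abstract (observer position, bearing, and range) are enough to sketch the basic geolocation calculation. The Python example below is an assumption-laden illustration, not the Firefly implementation, which the abstract does not describe: it uses the standard great-circle destination formula on a spherical Earth, treats the bearing as a true bearing in degrees clockwise from north, and ignores target elevation. The function name and the radius constant are chosen here for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical-Earth assumption)


def geolocate(lat_deg, lon_deg, bearing_deg, range_m, radius_m=EARTH_RADIUS_M):
    """Return (lat, lon) in degrees of a target at the given bearing and range."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)  # bearing, clockwise from north
    delta = range_m / radius_m         # angular distance travelled

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    lon2 = (lon2 + math.pi) % (2 * math.pi) - math.pi  # wrap to [-180, 180)
    return math.degrees(lat2), math.degrees(lon2)
```

In a device of this kind the observer position would come from GPS, the bearing from the magnetic compass (after correcting for magnetic declination), and the range from the laser rangefinder.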
Fast multipurpose Monte Carlo simulation for proton therapy using multi- and many-core CPU architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Souris, Kevin; Lee, John Aldo; Sterpin, Edmond

2016-04-15

Purpose: Accuracy in proton therapy treatment planning can be improved using Monte Carlo (MC) simulations. However, the long computation time of such methods hinders their use in clinical routine. This work aims to develop a fast multipurpose Monte Carlo simulation tool for proton therapy using massively parallel central processing unit (CPU) architectures. Methods: A new Monte Carlo code, called MCsquare (many-core Monte Carlo), has been designed and optimized for the latest generation of Intel Xeon processors and Intel Xeon Phi coprocessors. These massively parallel architectures offer the flexibility and the computational power suitable to MC methods. The class-II condensed history algorithm of MCsquare provides a fast and yet accurate method of simulating heavy charged particles such as protons, deuterons, and alphas inside voxelized geometries. Hard ionizations, with energy losses above a user-specified threshold, are simulated individually while soft events are regrouped in a multiple scattering theory. Elastic and inelastic nuclear interactions are sampled from ICRU 63 differential cross sections, thereby allowing for the computation of prompt gamma emission profiles. MCsquare has been benchmarked with the GATE/GEANT4 Monte Carlo application for homogeneous and heterogeneous geometries. Results: Comparisons with GATE/GEANT4 for various geometries show deviations within 2%–1 mm. In spite of the limited memory bandwidth of the coprocessor, simulation time is below 25 s for 10⁷ primary 200 MeV protons in average soft tissues using all Xeon Phi and CPU resources embedded in a single desktop unit. Conclusions: MCsquare exploits the flexibility of CPU architectures to provide a multipurpose MC simulation tool. Optimized code enables the use of accurate MC calculation within a reasonable computation time, adequate for clinical practice. MCsquare also simulates prompt gamma emission and can thus also be used for in vivo range verification.
Rubus: A compiler for seamless and extensible parallelism.

PubMed

Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of the code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. 
For a matrix multiplication benchmark, an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program.
Rubus: A compiler for seamless and extensible parallelism

PubMed Central

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of the code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. 
For a matrix multiplication benchmark, an average execution speedup of 84 times has been achieved by Rubus on the same GPU. Moreover, Rubus achieves this performance without drastically increasing the memory footprint of a program. PMID:29211758
Shell-corona microgels from double interpenetrating networks.

PubMed

Rudyak, Vladimir Yu; Gavrilov, Alexey A; Kozhunova, Elena Yu; Chertovich, Alexander V

2018-04-18

Polymer microgels with a dense outer shell offer outstanding features as universal carriers for different guest molecules. In this paper, microgels formed by an interpenetrating network comprised of collapsed and swollen subnetworks are investigated using dissipative particle dynamics (DPD) computer simulations, and it is found that such systems can form classical core-corona structures, shell-corona structures, and core-shell-corona structures, depending on the subchain length and molecular mass of the system. The core-corona structures consisting of a dense core and soft corona are formed at small microgel sizes when the subnetworks are able to effectively separate in space. The most interesting shell-corona structures consist of a soft cavity in a dense shell surrounded with a loose corona, and are found at intermediate gel sizes; the area of their existence depends on the subchain length and the corresponding mesh size. At larger molecular masses the collapsing network forms additional cores inside the soft cavity, leading to the core-shell-corona structure.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

PubMed

Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

2010-01-11

Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reduction and loop unrolling techniques, with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computationally intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient alternative to conventional hardware for solving computational problems in image processing and bioinformatics.
A Versatile Multichannel Digital Signal Processing Module for Microcalorimeter Arrays

NASA Astrophysics Data System (ADS)

Tan, H.; Collins, J. W.; Walby, M.; Hennig, W.; Warburton, W. K.; Grudberg, P.

2012-06-01

Different techniques have been developed for reading out microcalorimeter sensor arrays: individual outputs for small arrays, and time-division or frequency-division or code-division multiplexing for large arrays. Typically, raw waveform data are first read out from the arrays using one of these techniques and then stored on computer hard drives for offline optimum filtering, leading not only to requirements for large storage space but also limitations on achievable count rate. Thus, a read-out module that is capable of processing microcalorimeter signals in real time will be highly desirable. We have developed multichannel digital signal processing electronics that are capable of on-board, real time processing of microcalorimeter sensor signals from multiplexed or individual pixel arrays. It is a 3U PXI module consisting of a standardized core processor board and a set of daughter boards. Each daughter board is designed to interface a specific type of microcalorimeter array to the core processor. The combination of the standardized core plus this set of easily designed and modified daughter boards results in a versatile data acquisition module that not only can easily expand to future detector systems, but is also low cost. In this paper, we first present the core processor/daughter board architecture, and then report the performance of an 8-channel daughter board, which digitizes individual pixel outputs at 1 MSPS with 16-bit precision. We will also introduce a time-division multiplexing type daughter board, which takes in time-division multiplexing signals through fiber-optic cables and then processes the digital signals to generate energy spectra in real time.
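The storage concern raised in this abstract can be made concrete with a rough calculation. The sketch below is not from the paper: it computes the raw waveform data rate implied by the reported 8-channel daughter board (1 MSPS per channel, 16-bit samples), assuming continuous, uncompressed streaming with no framing overhead. The function name is illustrative.

```python
def raw_data_rate_bytes_per_s(channels, samples_per_s, bits_per_sample):
    """Raw waveform data rate, assuming uncompressed continuous streaming."""
    return channels * samples_per_s * bits_per_sample // 8

# Figures reported for the 8-channel daughter board.
rate = raw_data_rate_bytes_per_s(channels=8, samples_per_s=1_000_000, bits_per_sample=16)
gigabytes_per_hour = rate * 3600 / 1e9

print(rate)                 # bytes per second
print(gigabytes_per_hour)   # gigabytes per hour of continuous acquisition
```

Under these assumptions the board would produce 16 MB of raw samples per second, or about 57.6 GB per hour, which suggests why on-board real-time processing is attractive compared with storing waveforms for offline filtering.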
Power losses of soft magnetic composite materials under two-dimensional excitation

NASA Astrophysics Data System (ADS)

Zhu, J. G.; Zhong, J. J.; Ramsden, V. S.; Guo, Y. G.

1999-04-01

Soft magnetic composite materials produced by powder metallurgy techniques can be very useful for construction of low cost small motors. However, the rotational core losses and the corresponding B-H relationships of soft magnetic composite materials with two-dimensional rotating fluxes have neither been supplied by the manufacturers nor reported in the literature. This article reports the core loss measurement of a soft magnetic composite material, SOMALOY™ 500, Höganäs AB, Sweden, under two-dimensional excitations. The principle of measurement, testing system, and power loss calculation are presented. The results are analyzed and discussed.
Vascular system modeling in parallel environment - distributed and shared memory approaches

PubMed Central

Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

2011-01-01

The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
High Speed White Dwarf Asteroseismology with the Herty Hall Cluster

NASA Astrophysics Data System (ADS)

Gray, Aaron; Kim, A.

2012-01-01

Asteroseismology is the process of using observed oscillations of stars to infer their interior structure. In high speed asteroseismology, we complete that by quickly computing hundreds of thousands of models to match the observed period spectra. Each model on a single processor takes five to ten seconds to run. Therefore, we use a cluster of sixteen Dell Workstations with dual-core processors. The computers use the Ubuntu operating system and Apache Hadoop software to manage workloads.
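The motivation for the cluster can be illustrated with a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not the authors' workflow: it takes 100,000 models as a stand-in for "hundreds of thousands", treats the models as fully independent, and assumes ideal scaling with no Hadoop scheduling overhead.

```python
def wall_clock_hours(n_models, seconds_per_model, n_cores):
    """Ideal wall-clock time if models are independent and spread evenly over the cores."""
    return n_models * seconds_per_model / n_cores / 3600

n_cores = 16 * 2       # sixteen dual-core workstations
n_models = 100_000     # assumed; the abstract says "hundreds of thousands"

for seconds in (5, 10):  # per-model run time range given in the abstract
    serial = wall_clock_hours(n_models, seconds, 1)
    parallel = wall_clock_hours(n_models, seconds, n_cores)
    print(f"{seconds} s/model: {serial:.1f} h on one core, {parallel:.1f} h on {n_cores} cores")
```

With these assumed numbers, a run that would occupy a single core for roughly six to twelve days finishes in well under half a day on the 32 cores; real throughput would be somewhat lower once scheduling and I/O are included.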
Thermal Hotspots in CPU Die and Its Future Architecture

NASA Astrophysics Data System (ADS)

Wang, Jian; Hu, Fu-Yuan

Owing to increasing core frequency and chip integration and the limited die dimensions, power densities in CPU chips have been increasing rapidly. The high on-chip temperatures caused by these power densities threaten the processor's performance and the chip's reliability. This paper analyzes the thermal hotspots in the die and their properties. A new architecture of function units in the die, a hot-units-distributed architecture, is suggested to cope with the problems of high power density in future processor chips.
FAST: framework for heterogeneous medical image computing and visualization.

PubMed

Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank

2015-11-01

Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
A pipeline VLSI design of fast singular value decomposition processor for real-time EEG system based on on-line recursive independent component analysis.

PubMed

Huang, Kuan-Ju; Shih, Wei-Yeh; Chang, Jui Chung; Feng, Chih Wei; Fang, Wai-Chi

2013-01-01

This paper presents a pipeline VLSI design of a fast singular value decomposition (SVD) processor for a real-time electroencephalography (EEG) system based on on-line recursive independent component analysis (ORICA). Since SVD is used frequently in computations of the real-time EEG system, a low-latency and high-accuracy SVD processor is essential. During the EEG system process, the proposed SVD processor aims to solve the diagonal, inverse and inverse square root matrices of the target matrices in real time. Generally, SVD requires a huge amount of computation in hardware implementation. Therefore, this work proposes a novel design concept for data flow updating to assist the pipeline VLSI implementation. The SVD processor can greatly improve the feasibility of real-time EEG system applications such as brain computer interfaces (BCIs). The proposed architecture is implemented using TSMC 90 nm CMOS technology. The EEG raw data are sampled at 128 Hz. The core size of the SVD processor is 580×580 μm², and the operating frequency is 20 MHz. It consumes 0.774 mW of power in the 8-channel EEG system per execution time.
Energy Efficient Image/Video Data Transmission on Commercial Multi-Core Processors

PubMed Central

Lee, Sungju; Kim, Heegon; Chung, Yongwha; Park, Daihee

2012-01-01

In transmitting image/video data over Video Sensor Networks (VSNs), energy consumption must be minimized while maintaining high image/video quality. Although image/video compression is well known for its efficiency and usefulness in VSNs, the excessive costs associated with encoding computation and complexity still hinder its adoption for practical use. However, it is anticipated that high-performance handheld multi-core devices will be used as VSN processing nodes in the near future. In this paper, we propose a way to improve the energy efficiency of image and video compression with multi-core processors while maintaining the image/video quality. We improve the compression efficiency at the algorithmic level or derive the optimal parameters for the combination of a machine and compression based on the tradeoff between the energy consumption and the image/video quality. Based on experimental results, we confirm that the proposed approach can improve the energy efficiency of the straightforward approach by a factor of 2∼5 without compromising image/video quality. PMID:23202181
High performance in silico virtual drug screening on many-core processors.

PubMed

McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-05-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
High performance in silico virtual drug screening on many-core processors

PubMed Central

Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-01-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727
A Down-to-Earth Educational Operating System for Up-in-the-Cloud Many-Core Architectures

ERIC Educational Resources Information Center

Ziwisky, Michael; Persohn, Kyle; Brylow, Dennis

2013-01-01

We present "Xipx," the first port of a major educational operating system to a processor in the emerging class of many-core architectures. Through extensions to the proven Embedded Xinu operating system, Xipx gives students hands-on experience with system programming in a distributed message-passing environment. We expose the software primitives…
Cognitive Medical Wireless Testbed System (COMWITS)

DTIC Science & Technology

2016-11-01

This testbed merges two ARO grants...bit 64 bit CPU Intel Xeon Processor E5-1650v3 (6C, 3.5 GHz, Turbo, HT, 15M, 140W) Intel Core i7-3770 (3.4 GHz Quad Core, 77W) Dual Intel Xeon
One-pot synthesis of biocompatible Te@phenol formaldehyde resin core-shell nanowires with uniform size and unique fluorescent properties by a synergized soft-hard template process.

PubMed

Qian, Haisheng; Zhu, Enbo; Zheng, Shunji; Li, Zhengquan; Hu, Yong; Guo, Changfa; Yang, Xingyun; Li, Liangchao; Tong, Guoxiu; Guo, Huichen

2010-12-10

A one-pot hydrothermal process has been developed to synthesize uniform Te@phenol formaldehyde resin core-shell nanowires with unique fluorescent properties. A synergistic soft-hard template mechanism has been proposed to explain the formation of the core-shell nanowires. The Te@phenol formaldehyde resin core-shell nanowires display unique fluorescent properties, giving strong luminescent emission in the blue-violet and green regions with excitation wavelengths of 270 nm and 402 nm, respectively.
Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.

PubMed

Hines, Michael; Kumar, Sameer; Schürmann, Felix

2011-01-01

For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods (the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast, and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores) had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals.
Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect neural net architecture but be randomly distributed so that sets of cells which are burst firing together should be on different processors with their targets on as large a set of processors as possible.
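The load-balance argument in this abstract can be illustrated with a toy experiment. The sketch below is not the authors' simulator: the network size, processor count and "burst" are hypothetical, and it only counts how many simultaneously firing cells land on the busiest processor under a structure-following placement versus a random one.

```python
import random

def max_spikes_per_processor(placement, firing_cells, n_procs):
    """Largest number of firing cells that any one processor must handle."""
    counts = [0] * n_procs
    for cell in firing_cells:
        counts[placement[cell]] += 1
    return max(counts)

n_cells, n_procs = 1024, 16          # hypothetical toy sizes
cells_per_proc = n_cells // n_procs

# Placement that mirrors network structure: consecutive cells share a processor.
block_placement = [cell // cells_per_proc for cell in range(n_cells)]

# Random placement: shuffle the cells, then deal them out evenly.
rng = random.Random(0)               # fixed seed so the sketch is reproducible
shuffled = list(range(n_cells))
rng.shuffle(shuffled)
random_placement = [0] * n_cells
for position, cell in enumerate(shuffled):
    random_placement[cell] = position % n_procs

# Hypothetical burst: one group of neighbouring cells fires together.
burst = range(cells_per_proc)

block_peak = max_spikes_per_processor(block_placement, burst, n_procs)
random_peak = max_spikes_per_processor(random_placement, burst, n_procs)
print(block_peak, random_peak)
```

In the structure-following placement the whole burst falls on a single processor, whereas the random placement spreads the same burst across many processors, so the busiest processor handles far fewer spikes per synchronization interval.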

Orthorectification by Using GPGPU Method

NASA Astrophysics Data System (ADS)

Sahin, H.; Kulur, S.

2012-07-01

Owing to the nature of graphics processing, newly released products offer highly parallel processing units with high memory bandwidth and computational power of more than a teraflop. Modern GPUs are not only powerful graphics engines but also highly parallel programmable processors with very fast computing capabilities and high memory bandwidth compared to central processing units (CPUs). Data-parallel computation can be briefly described as mapping data elements to parallel processing threads. The rapid development of GPU programmability and capabilities has attracted the attention of researchers dealing with complex problems that need intensive calculations. This interest has given rise to the concepts of "General Purpose Computation on Graphics Processing Units (GPGPU)" and "stream processing". Graphics processors are powerful hardware that is cheap and affordable, so they have become an alternative to computer processors. Graphics chips, which were once standard application hardware, have been transformed into modern, powerful and programmable processors to meet general needs. Especially in recent years, the use of graphics processing units in general purpose computation has led researchers and developers to this point. The biggest problem is that graphics processing units use programming models that differ from current programming methods. Therefore, efficient GPU programming requires re-coding the current program algorithm with the limitations and structure of the graphics hardware in mind. Currently, multi-core processors cannot be programmed using traditional programming methods. The event procedure programming method cannot be used for programming multi-core processors. GPUs are especially effective at repeating the same computing steps for many data elements when high accuracy is needed.
Thus, the GPU carries out the computation more quickly and accurately. Compared to GPUs, CPUs, which perform just one computation at a time according to the flow control, are slower. This structure can be evaluated for various applications of computer technology. This study covers how general purpose parallel programming and the computational power of GPUs can be used in photogrammetric applications, especially direct georeferencing. The direct georeferencing algorithm is coded using the GPGPU method and the CUDA (Compute Unified Device Architecture) programming language. Results provided by this method were compared with traditional CPU programming. In the other application, projective rectification is coded using the GPGPU method and the CUDA programming language. The results of the program were evaluated on sample images of various sizes. The GPGPU method can be used especially for repeating the same computations on highly dense data, thus finding the solution quickly.
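The idea of mapping data elements to parallel threads can be sketched for projective rectification, where every pixel coordinate is transformed independently of the others. The code below is a serial Python stand-in, not the authors' CUDA implementation: the function names and the example matrix are hypothetical, and on a GPU each call to `apply_homography` would correspond to the work of one thread.

```python
def apply_homography(h, x, y):
    """Map one pixel coordinate through a 3x3 projective transform h (nested lists, row-major)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v

def rectify_coordinates(h, width, height):
    """Transform every pixel independently; no iteration depends on another."""
    return [[apply_homography(h, x, y) for x in range(width)] for y in range(height)]

# Hypothetical transform: a pure shift by (2, 3) pixels.
shift = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 3.0],
         [0.0, 0.0, 1.0]]

grid = rectify_coordinates(shift, width=4, height=2)
print(grid[1][3])   # transformed position of the pixel at x=3, y=1
```

Because no pixel depends on any other, the two nested loops can be replaced by a grid of threads without changing the per-pixel arithmetic, which is the property that makes this kind of workload a good fit for GPGPU.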
LDPC decoder with a limited-precision FPGA-based floating-point multiplication coprocessor

NASA Astrophysics Data System (ADS)

Moberly, Raymond; O'Sullivan, Michael; Waheed, Khurram

2007-09-01

Implementing the sum-product algorithm, in an FPGA with an embedded processor, invites us to consider a tradeoff between computational precision and computational speed. The algorithm, known outside of the signal processing community as Pearl's belief propagation, is used for iterative soft-decision decoding of LDPC codes. We determined the feasibility of a coprocessor that will perform product computations. Our FPGA-based coprocessor (design) performs computer algebra with significantly less precision than the standard (e.g. integer, floating-point) operations of general purpose processors. Using synthesis, targeting a 3,168 LUT Xilinx FPGA, we show that key components of a decoder are feasible and that the full single-precision decoder could be constructed using a larger part. Soft-decision decoding by the iterative belief propagation algorithm is impacted both positively and negatively by a reduction in the precision of the computation. Reducing precision reduces the coding gain, but the limited-precision computation can operate faster. A proposed solution offers custom logic to perform computations with less precision, yet uses the floating-point format to interface with the software. Simulation results show the achievable coding gain. Synthesis results help theorize the full capacity and performance of an FPGA-based coprocessor.
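The precision side of this tradeoff can be emulated in software. The sketch below is an illustration only, not the coprocessor design from the paper: it keeps the floating-point format at the interface but rounds the mantissa to a small number of bits before and after each multiplication. The function names and the choice of 8 mantissa bits are assumptions.

```python
import math

def reduce_precision(x, mantissa_bits):
    """Round x to the given number of mantissa bits, leaving sign and exponent unchanged."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 1 << mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

def low_precision_product(a, b, mantissa_bits=8):
    """Multiply two values as limited-precision hardware might (assumed behaviour)."""
    a_q = reduce_precision(a, mantissa_bits)
    b_q = reduce_precision(b, mantissa_bits)
    return reduce_precision(a_q * b_q, mantissa_bits)

a, b = 0.7312, 0.2189                    # arbitrary probability-like values
exact = a * b
approx = low_precision_product(a, b)
relative_error = abs(approx - exact) / exact
print(exact, approx, relative_error)
```

Each rounding step contributes a relative error of at most 2^-8 here, so the product stays within roughly 1.2% of the double-precision result; a decoder simulation would be needed to see how such errors translate into lost coding gain.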
Soft-Matter Printed Circuit Board with UV Laser Micropatterning.

PubMed

Lu, Tong; Markvicka, Eric J; Jin, Yichu; Majidi, Carmel

2017-07-05

When encapsulated in elastomer, micropatterned traces of Ga-based liquid metal (LM) can function as elastically deformable circuit wiring that provides mechanically robust electrical connectivity between solid-state elements (e.g., transistors, processors, and sensor nodes). However, LM-microelectronics integration is currently limited by challenges in rapid fabrication of LM circuits and the creation of vias between circuit terminals and the I/O pins of packaged electronics. In this study, we address both with a unique layup for soft-matter electronics in which traces of liquid-phase Ga-In eutectic (EGaIn) are patterned with UV laser micromachining (UVLM). The terminals of the elastomer-sealed LM circuit connect to the surface mounted chips through vertically aligned columns of EGaIn-coated Ag-Fe2O3 microparticles that are embedded within an interfacial elastomer layer. The processing technique is compatible with conventional UVLM printed circuit board (PCB) prototyping and exploits the photophysical ablation of EGaIn on an elastomer substrate. Potential applications to wearable computing and biosensing are demonstrated with functional implementations in which soft-matter PCBs are populated with surface-mounted microelectronics.
Rare-Earth-Free Permanent Magnets for Electrical Vehicle Motors and Wind Turbine Generators: Hexagonal Symmetry Based Materials Systems Mn-Bi and M-type Hexaferrite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, Yang-Ki; Haskew, Timothy; Myryasov, Oleg

2014-06-05

The research we conducted focuses on rare-earth (RE)-free permanent magnets, through modeling, simulating, and synthesizing exchange-coupled two-phase (hard/soft) RE-free core-shell nanostructured magnets. The RE-free magnets are made of magnetically hard core materials (high anisotropy materials including Mn-Bi-X and M-type hexaferrite) coated by soft shell materials (high magnetization materials including Fe-Co or Co). Our research therefore helps to understand the exchange coupling conditions of the core/shell magnets, the interface exchange behavior between core and shell materials, the formation mechanism of core/shell structures, the stability conditions of core and shell materials, etc.
Self-powered information measuring wireless networks using the distribution of tasks within multicore processors

NASA Astrophysics Data System (ADS)

Zhuravska, Iryna M.; Koretska, Oleksandra O.; Musiyenko, Maksym P.; Surtel, Wojciech; Assembay, Azat; Kovalev, Vladimir; Tleshova, Akmaral

2017-08-01

The article presents basic approaches to developing self-powered information measuring wireless networks (SPIM-WN) for critical applications, using the distribution of tasks within multicore processors and based on the interaction of movable components, both for data transmission and for wireless transfer of energy coming from polymetric sensors. The base mathematical model of task scheduling in multiprocessor systems was modified to schedule and allocate tasks between the cores of a single-chip computer (SoC) in order to increase the energy efficiency of SPIM-WN objects.
Upset Characterization of the PowerPC405 Hard-core Processor Embedded in Virtex-II Pro Field Programmable Gate Arrays

NASA Technical Reports Server (NTRS)

Swift, Gary M.; Allen, Gregory S.; Farmanesh, Farhad; George, Jeffrey; Petrick, David J.; Chayab, Fayez

2006-01-01

Shown in this presentation are recent results for the upset susceptibility of the various types of memory elements in the embedded PowerPC405 in the Xilinx V2P40 FPGA. For critical flight designs where configuration upsets are mitigated effectively through appropriate design triplication and configuration scrubbing, these upsets of processor elements can dominate the system error rate. Data from irradiations with both protons and heavy ions are given and compared using available models.
Importance of pH-regulated charge density on the electrophoresis of soft particles

NASA Astrophysics Data System (ADS)

Gopmandal, Partha P.; Ohshima, H.

2017-02-01

The present study deals with the electrophoresis of spherical soft particles consisting of an ion- and liquid-penetrable but liquid-flow-impenetrable inner core surrounded by an ion- and fluid-penetrable polyelectrolyte layer. The inner core is considered to be dielectric and to bear a basic functional group, and it is coated with a polyelectrolyte layer containing an acidic functional group. An approximate expression for the electrophoretic mobility of such a particle is obtained in the low-potential limit. The electrophoretic behaviour of the particle is investigated for a wide range of bulk pH values and electrolyte concentrations. Our study also indicates some remarkable features of the electrophoresis, e.g., the occurrence of zero mobility and mobility reversal.
Hard sphere perturbation theory for fluids with soft-repulsive-core potentials

NASA Astrophysics Data System (ADS)

Ben-Amotz, Dor; Stell, George

2004-03-01

The thermodynamic properties of fluids with very soft repulsive-core potentials, resembling those of some liquid metals, are predicted with unprecedented accuracy using a new first-order thermodynamic perturbation theory. This theory is an extension of Mansoori-Canfield/Rasaiah-Stell (MCRS) perturbation theory, obtained by including a configuration integral correction recently identified by Mon, who evaluated it by computer simulation. In this work we derive an analytic expression for Mon's correction in terms of the radial distribution function of the soft-core fluid, g0(r), approximated using Lado's self-consistent extension of Weeks-Chandler-Andersen (WCA) theory. Comparisons with WCA and MCRS predictions show that our new extended-MCRS theory outperforms other first-order theories when applied to fluids with very soft inverse-power potentials (n⩽6), and predicts free energies that are within 0.3kT of simulation results up to the fluid freezing point.
Multicore Considerations for Legacy Flight Software Migration

NASA Technical Reports Server (NTRS)

Vines, Kenneth; Day, Len

2013-01-01

In this paper we will discuss potential benefits and pitfalls when considering a migration from an existing single core code base to a multicore processor implementation. The results of this study present options that should be considered before migrating fault managers, device handlers and tasks with time-constrained requirements to a multicore flight software environment. Possible future multicore test bed demonstrations are also discussed.
Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

NASA Astrophysics Data System (ADS)

Boyko, Oleksiy; Zheleznyak, Mark

2015-04-01

The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI (Todini et al., 1996-2014) was developed and implemented in Ukraine. A parallel version of the code has recently been developed for use on multiprocessor systems: multicore/multiprocessor PCs and clusters. The algorithm is based on binary-tree decomposition of the watershed to balance the amount of computation across all processors/cores. The Message Passing Interface (MPI) protocol is used as the parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated in case studies of flood prediction for mountain watersheds of the Ukrainian Carpathian region. The modeling results are compared with predictions based on lumped-parameter models.
Soft Skills: The New Curriculum for Hard-Core Technical Professionals

ERIC Educational Resources Information Center

Bancino, Randy; Zevalkink, Claire

2007-01-01

In this article, the authors talk about the importance of soft skills for hard-core technical professionals. In many technical professions, the complete focus of education and training is on technical topics either directly or indirectly related to a career or discipline. Students are generally required to master various mathematics skills,…
The Understanding of Curriculum Philosophy among Trainee Teachers in Regards to Soft Skills Embedment

ERIC Educational Resources Information Center

Hassan, Aminuddin; Maharoff, Marina

2014-01-01

Curriculum philosophy may assist in learning practices that coincide with the philosophy of educational institution and community. This study was aimed to understand how the teacher trainees who pursued Bachelor of Teaching (PISMP) understand the embedment of soft skills into learning activities for core courses in Malaysian Institutes of Teacher…
Compact pulse generators with soft ferromagnetic cores driven by gunpowder and explosive.

PubMed

Ben, Chi; He, Yong; Pan, Xuchao; Chen, Hong; He, Yuan

2015-12-01

Compact pulse generators which utilized soft ferromagnets as an initial energy carrier inside multi-turn coil and hard ferromagnets to provide the initial magnetic field outside the coil have been studied. Two methods of reducing the magnetic flux in the generators have been studied: (1) by igniting gunpowder to launch the core out of the generator, and (2) by detonating explosives that demagnetize the core. Several types of compact generators were explored to verify the feasibility. The generators with an 80-turn coil that utilize gunpowder were capable of producing pulses with amplitude 78.6 V and the full width at half maximum was 0.41 ms. The generators with a 37-turn coil that utilize explosive were capable of producing pulses with amplitude 1.41 kV and the full width at half maximum was 11.68 μs. These two methods were both successful, but produce voltage waveforms with significantly different characteristics.
Exchange coupling and microwave absorption in core/shell-structured hard/soft ferrite-based CoFe2O4/NiFe2O4 nanocapsules

NASA Astrophysics Data System (ADS)

Feng, Chao; Liu, Xianguo; Or, Siu Wing; Ho, S. L.

2017-05-01

Core/shell-structured, hard/soft spinel-ferrite-based CoFe2O4/NiFe2O4 (CFO/NFO) nanocapsules with an average diameter of 17 nm are synthesized by a facile two-step hydrothermal process using CFO cores of ~15 nm diameter as the hard magnetic phase and NFO shells of ~1 nm thickness as the soft magnetic phase. The single-phase-like hysteresis loop with a high remnant-to-saturation magnetization ratio of 0.7, together with a small grain size of ~16 nm, confirms the existence of exchange-coupling interaction between the CFO cores and the NFO shells. The effect of hard/soft exchange coupling on the microwave absorption properties is studied. Compared to CFO and NFO nanoparticles, the finite-size NFO shells and the core/shell structure enable a significant reduction in electric resistivity and an enhancement in dipole and interfacial polarizations in the CFO/NFO nanocapsules, resulting in an obvious increase in dielectric permittivity and loss in the whole S-Ku bands of microwaves of 2-18 GHz, respectively. The exchange-coupling interaction empowers a more favorable response of magnetic moment to microwaves, leading to enhanced exchange resonances in magnetic permeability and loss above 10 GHz. As a result, strong absorption, as characterized by a large reflection loss (RL) of -20.1 dB at 9.7 GHz for an absorber thickness of 4.5 mm as well as a broad effective absorption bandwidth (for RL<-10 dB) of 8.4 GHz (7.8-16.2 GHz) at an absorber thickness range of 3.0-4.5 mm, is obtained.
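The reflection-loss figures quoted here are easier to interpret as power fractions. The sketch below uses the standard decibel relation for power ratios (reflected fraction = 10^(RL/10)); it is a unit conversion added for illustration, not a calculation taken from the paper, and the function name is hypothetical.

```python
def reflected_power_fraction(rl_db):
    """Fraction of incident power reflected for a reflection loss given in dB (RL <= 0)."""
    return 10 ** (rl_db / 10)

for rl_db in (-10.0, -20.1):
    reflected = reflected_power_fraction(rl_db)
    print(f"RL = {rl_db} dB: {reflected:.2%} reflected, {1 - reflected:.2%} not reflected")

# Effective absorption bandwidth quoted in the abstract (band where RL < -10 dB).
bandwidth_ghz = 16.2 - 7.8
print(round(bandwidth_ghz, 1))
```

The RL < -10 dB criterion therefore marks the band in which less than 10% of the incident power is reflected, and the reported peak of -20.1 dB corresponds to just under 1% reflected.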
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

PubMed Central

2010-01-01

Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics. PMID:20064262
Magnetic and Electrical Characteristics of Permalloy Thin Tape Bobbin Cores

NASA Technical Reports Server (NTRS)

Schwarze, Gene E.; Wieserman, William R.; Niedra, Janis M.

2005-01-01

The core loss, that is, the power loss, of a soft ferromagnetic material is a function of the flux density, frequency, temperature, excitation type (voltage or current), excitation waveform (sine, square, etc.) and lamination or tape thickness. In previously published papers we have reported on the specific core loss and dynamic B-H loop results for several polycrystalline, nanocrystalline, and amorphous soft magnetic materials. In this previous research we investigated the effect of flux density, frequency, temperature, and excitation waveform for voltage excitation on the specific core loss and dynamic B-H loop. In this paper, we will report on an experimental study to investigate the effect of tape thicknesses of 1, 1/2, 1/4, and 1/8-mil Permalloy type magnetic materials on the specific core loss. The test cores were fabricated by winding the thin tapes on ceramic bobbin cores. The specific core loss tests were conducted at room temperature and over the frequency range of 10 kHz to 750 kHz using sine wave voltage excitation. The results of this experimental investigation will be presented primarily in graphical form to show the effect of tape thickness, frequency, and magnetic flux density on the specific core loss. Also, the experimental results when applied to power transformer design will be briefly discussed.
Histomorphometrical analysis following augmentation of infected extraction sites exhibiting severe bone loss and primarily closed by intrasocket reactive soft tissue.

PubMed

Mardinger, Ofer; Vered, Marilena; Chaushu, Gavriel; Nissan, Joseph

2012-06-01

Intrasocket reactive soft tissue can be used for primary closure during augmentation of infected extraction sites exhibiting severe bone loss prior to implant placement. The present study evaluated the histological characteristics of the initially used intrasocket reactive soft tissue, the overlying soft tissue, and the histomorphometry of the newly formed bone during implant placement. Thirty-six consecutive patients (43 sites) were included in the study. Extraction sites demonstrating extensive bone loss on preoperative periapical and panoramic radiographs served as inclusion criteria. Forty-three implants were inserted after a healing period of 6 months. Porous bovine xenograft bone mineral was used as a single bone substitute. The intrasocket reactive soft tissue was sutured over the grafting material to seal the coronal portion of the socket. Biopsies of the intrasocket reactive soft tissue at augmentation, healed mucosa, and bone cores at implant placement were retrieved and evaluated. The intrasocket reactive soft tissue demonstrated features compatible with granulation tissue and long junctional epithelium. The mucosal samples at implant placement demonstrated histopathological characteristics of keratinized mucosa with no residual elements of granulation tissue. Histomorphometrically, the mean composition of the bone cores was - vital bone 40 ± 19% (13.7-74.8%); bone substitute 25.7 ± 13% (0.6-51%); connective tissue 34.3 ± 15% (13.8-71.9%). Intrasocket reactive soft tissue used for primary closure following ridge augmentation is composed of granulation tissue and long junctional epithelium. At implant placement, clinical and histological results demonstrate its replacement by keratinized gingiva. The histomorphometrical results reveal considerable bone formation. Fresh extraction sites of hopeless teeth demonstrating chronic infection and severe bone loss may be grafted simultaneously with their removal. © 2010 Wiley Periodicals, Inc.
Density Affects the Nature of the Hexatic-Liquid Transition in Two-Dimensional Melting of Soft-Core Systems

NASA Astrophysics Data System (ADS)

Zu, Mengjie; Liu, Jun; Tong, Hua; Xu, Ning

2016-08-01

We find that both continuous and discontinuous hexatic-liquid transitions can happen in the melting of two-dimensional solids of soft-core disks. For three typical model systems, Hertzian, harmonic, and Gaussian-core models, we observe the same scenarios. These systems exhibit reentrant crystallization (melting) with a maximum melting temperature Tm happening at a crossover density ρm. The hexatic-liquid transition at a density smaller than ρm is discontinuous. Liquid and hexatic phases coexist in a density interval, which becomes narrower with increasing temperature and tends to vanish approximately at Tm. Above ρm, the transition is continuous, in agreement with the Kosterlitz-Thouless-Halperin-Nelson-Young theory. For these soft-core systems, the nature of the hexatic-liquid transition depends on density (pressure), with the melting at ρm being a plausible transition point from discontinuous to continuous hexatic-liquid transition.
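The three model systems named in the abstract are usually written as bounded pair potentials, which is what makes them "soft-core": the energy stays finite even at zero separation. The sketch below uses the forms commonly found in the literature (a power law with exponent 2 for harmonic and 5/2 for Hertzian, and a Gaussian for the Gaussian-core model). The function name, the energy scale `epsilon`, the length scale `sigma`, and the `1/alpha` prefactor are assumptions for illustration; the abstract does not state the exact parameterization the authors used.

```python
import math

def soft_core_potential(r, model, epsilon=1.0, sigma=1.0):
    """Pair energy at separation r for three common soft-core forms.

    Assumed textbook forms, not taken from the abstract:
      harmonic / Hertzian: (epsilon / alpha) * (1 - r/sigma)**alpha for r < sigma
      Gaussian-core:       epsilon * exp(-(r/sigma)**2)
    """
    if model == "gaussian":
        return epsilon * math.exp(-(r / sigma) ** 2)
    alpha = {"harmonic": 2.0, "hertzian": 2.5}[model]
    if r >= sigma:
        return 0.0  # finite-range models vanish beyond contact
    return (epsilon / alpha) * (1.0 - r / sigma) ** alpha

if __name__ == "__main__":
    for name in ("harmonic", "hertzian", "gaussian"):
        # Full overlap (r = 0) costs a finite energy in all three models.
        print(name, soft_core_potential(0.0, name))
```

Because full overlap has a finite cost, compressing such a system eventually makes overlaps cheap, which is consistent with the reentrant melting at high density that the abstract reports.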
Tuning the bridging attraction between large hard particles by the softness of small microgels.

PubMed

Luo, Junhua; Yuan, Guangcui; Han, Charles C

2016-09-20

In this study, the attraction between large hard polystyrene (PS) spheres is studied by using three types of small microgels as bridging agents. One is a purely soft poly(N-isopropylacrylamide) (PNIPAM) microgel, the other two have a non-deformable PS hard core surrounded by a soft PNIPAM shell but are different in the core-shell ratio. The affinity for bridging the large PS spheres is provided and thus affected by the PNIPAM constituent in the microgels. The bridging effects caused by the microgels can be indirectly incorporated into their influence on the effective attraction interaction between the large hard spheres, since the size of the microgels is very small in comparison to the size of the PS hard spheres. At a given volume fraction of large PS spheres, they behave essentially as hard spheres in the absence of small microgels. By gradually adding the microgels, the large spheres are connected to each other through the bridging of small particles until the attraction strength reaches a maximum value, after which adding more small particles slowly decreases the effective attraction strength and eventually the large particles disperse individually when saturated adsorption is achieved. The aggregation and gelation behaviors triggered by these three types of small microgels are compared and discussed. A way to tune the strength and range of the short-range attractive potential via changing the softness of bridging microgels (which can be achieved either by using core-shell microgels or by changing the temperature) is proposed.
Hiding the Disk and Network Latency of Out-of-Core Visualization

NASA Technical Reports Server (NTRS)

Ellsworth, David

2001-01-01

This paper describes an algorithm that improves the performance of application-controlled demand paging for out-of-core visualization by hiding the latency of reading data from either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The paper includes measurements that show that the new multithreaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by two thirds. Visualization runs using data from remote disk actually ran faster than ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
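The two ideas in this abstract, overlapping computation with page reads and issuing several reads in parallel, can be illustrated with a small thread-pool sketch. This is not the paper's algorithm: `process_pages`, `read_page`, `compute`, `lookahead`, and `workers` are hypothetical names, and the fixed look-ahead window is an assumed prefetch policy.

```python
from concurrent.futures import ThreadPoolExecutor

def process_pages(page_ids, read_page, compute, lookahead=4, workers=4):
    """Compute over pages in order while later pages are read in the background.

    read_page(page_id) -> data   (slow: disk or network)
    compute(data)      -> result (runs on the calling thread)
    """
    results = []
    pending = {}   # position in page_ids -> Future for that page's data
    issued = 0     # number of reads submitted so far
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(len(page_ids)):
            # Keep up to `lookahead` reads in flight beyond the current page.
            while issued < len(page_ids) and issued <= i + lookahead:
                pending[issued] = pool.submit(read_page, page_ids[issued])
                issued += 1
            data = pending.pop(i).result()  # blocks only if the read is not done yet
            results.append(compute(data))
    return results

if __name__ == "__main__":
    import time

    def slow_read(page_id):
        time.sleep(0.05)  # stand-in for disk or network latency
        return page_id * 10

    print(process_pages(list(range(8)), slow_read, lambda d: d + 1))
```

With enough look-ahead, the main thread rarely waits on `result()`, so read latency is hidden behind computation on earlier pages.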

Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

NASA Astrophysics Data System (ADS)

Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

2015-05-01

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
A pluggable framework for parallel pairwise sequence search.

PubMed

Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

2007-01-01

The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Sourouri, Mohammed; Birger Raknes, Espen

2017-04-01

In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also requires explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor, based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but requires the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrast to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required.
However, like most accelerators, KNL sports a memory subsystem consisting of low-level caches and 16GB of high-bandwidth MCDRAM memory. For capacity computing, up to 400GB of conventional DDR4 memory is provided. Such a strict hierarchical memory layout means that data locality is imperative if the true potential of this product is to be harnessed. In this work, we study a series of optimizations specifically targeting KNL for our EWE based application to reduce the time-to-solution for the following 3D model sizes in grid points: 128^3, 256^3 and 512^3. We compare the results with an optimized version for multi-core CPUs running on a dual-socket Xeon E5 2680v3 system using OpenMP. Our initial naive implementation on the KNL is roughly 20% faster than the multi-core version, but by using only one thread per core and careful memory placement using the memkind library, we could achieve higher speedups. Additionally, by using the MCDRAM as cache for problem sizes that are smaller than 16 GB further performance improvements were unlocked. Depending on the problem size, our overall results indicate that the KNL based system is approximately 2.2x faster than the 24-core Xeon E5 2680v3 system, with only modest changes to the code.
Soft particles at fluid interfaces: wetting, structure, and rheology

NASA Astrophysics Data System (ADS)

Isa, Lucio

Most of our current knowledge concerning the behavior of colloidal particles at fluid interfaces is limited to model spherical, hard and uniform objects. Introducing additional complexity, in terms of shape, composition or surface chemistry or by introducing particle softness, opens up a vast range of possibilities to address new fundamental and applied questions in soft matter systems at fluid interfaces. In this talk I will focus on the role of particle softness, taking the case of core-shell microgels as a paradigmatic example. Microgels are highly swollen and cross-linked hydrogel particles that, in parallel with their practical applications, e.g. for emulsion stabilization and surface patterning, are increasingly used as model systems to capture fundamental properties of bulk materials. Most microgel particles develop a core-shell morphology during synthesis, with a more cross-linked core surrounded by a corona of loosely linked and dangling polymer chains. I will first discuss the difference between the wetting of a hard spherical colloid and a core-shell microgel at an oil-water interface, pinpointing the interplay between adsorption at the interface and particle deformation. I will then move on to discuss the interplay between particle morphology and the microstructure and rheological properties of the interface. In particular, I will demonstrate that synchronizing the compression of a core-shell microgel-laden fluid interface with the deposition of the interfacial monolayer makes it possible to transfer the 2D phase diagram of the particles onto a solid substrate, where different positions correspond to different values of the surface pressure and the specific area. Using atomic force microscopy, we analyzed the microstructure of the monolayer and discovered a phase transition between two crystalline phases with the same hexagonal symmetry, but with two different lattice constants. 
The two phases correspond to shell-shell or core-core inter-particle contacts, respectively, where with increasing surface pressure the former mechanically fail enabling the particle cores to come into contact. In the phase-transition region, clusters of particles in core-core contacts nucleate, melting the surrounding shell-shell crystal, until the whole monolayer moves into the second phase. We furthermore extended our analysis to measure the interfacial rheology of the monolayers as a function of the surface pressure using an interfacial microdisk rheometer; the interfaces always show a strong elastic response, with a dip in the elastic modulus in correspondence of the melting of the shell-shell phase, followed by a steep increase upon formation of a percolating network of the core-core contacts. The presented results highlight the complex interplay between the wetting and deformation of individual soft particles at fluid interfaces and the overall interface microstructure and mechanics. They show strong connections to fundamental studies on phase transitions in two-dimensional systems and pave the way for novel nanoscale surface patterning routes. The author acknowledges financial support from the Swiss National Science Foundation Grant PP00P2-144646/1.
Fault-Tolerant, Radiation-Hard DSP

NASA Technical Reports Server (NTRS)

Czajkowski, David

2011-01-01

Commercial digital signal processors (DSPs) for use in high-speed satellite computers are challenged by the damaging effects of space radiation, mainly single event upsets (SEUs) and single event functional interrupts (SEFIs). Innovations have been developed for mitigating the effects of SEUs and SEFIs, enabling the use of very-high-speed commercial DSPs with improved SEU tolerances. Time-triple modular redundancy (TTMR) is a method of applying traditional triple modular redundancy on a single processor, exploiting the VLIW (very long instruction word) class of parallel processors. TTMR improves SEU rates substantially. SEFIs are solved by a SEFI-hardened core circuit, external to the microprocessor. It monitors the health of the processor, and if a SEFI occurs, forces the processor to return to performance through a series of escalating events. TTMR and hardened-core solutions were developed for both DSPs and reconfigurable field-programmable gate arrays (FPGAs). This includes advancement of TTMR algorithms for DSPs and reconfigurable FPGAs, plus a rad-hard, hardened-core integrated circuit that services both the DSP and FPGA. Additionally, a combined DSP and FPGA board architecture was fully developed into a rad-hard engineering product. This technology enables use of commercial off-the-shelf (COTS) DSPs in computers for satellite and other space applications, allowing rapid deployment at a much lower cost. Traditional rad-hard space computers are very expensive and typically have long lead times. These computers are either based on traditional rad-hard processors, which have extremely low computational performance, or triple modular redundant (TMR) FPGA arrays, which suffer from power and complexity issues. Even more frustrating is that the TMR arrays of FPGAs require a fixed, external rad-hard voting element, thereby causing them to lose much of their reconfiguration capability and in some cases significant speed reduction.
The benefits of COTS high-performance signal processing include significant increase in onboard science data processing, enabling orders of magnitude reduction in required communication bandwidth for science data return, orders of magnitude improvement in onboard mission planning and critical decision making, and the ability to rapidly respond to changing mission environments, thus enabling opportunistic science and orders of magnitude reduction in the cost of mission operations through reduction of required staff. Additional benefits of COTS-based, high-performance signal processing include the ability to leverage considerable commercial and academic investments in advanced computing tools, techniques, and infrastructure, and the familiarity of the science and IT community with these computing environments.
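The voting step at the heart of triple modular redundancy, which TTMR applies on a single processor, can be sketched in a few lines. This is only an illustration of the general principle: the abstract does not describe how TTMR schedules the redundant copies on a VLIW machine, and `majority_vote` and `run_with_time_redundancy` are hypothetical names. Results are assumed to be hashable values.

```python
from collections import Counter

def majority_vote(a, b, c):
    """Return the value that at least two of the three results agree on."""
    winner, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        # All three disagree: the fault cannot be masked, so report it.
        raise RuntimeError("no two results agree")
    return winner

def run_with_time_redundancy(task, *args):
    """Run the same task three times on one processor and vote on the results."""
    results = [task(*args) for _ in range(3)]
    return majority_vote(*results)

if __name__ == "__main__":
    print(majority_vote(5, 5, 7))  # one corrupted result is outvoted
```

A single upset corrupts at most one of the three results, so the vote masks it. Two simultaneous disagreeing faults are detected but not corrected.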
SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T

2013-01-01

Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O) that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
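The two-phase read described above can be mimicked without MPI to show where the savings come from. The sketch below is a plain-Python simulation, not SCORPIO's API: ranks are grouped by a colour value in the way a communicator split would group them, only each group's root touches "disk", and the root then hands each member its slice. `split_ranks`, `two_phase_read`, `read_block`, and `group_size` are hypothetical names.

```python
def split_ranks(world_size, group_size):
    """Group ranks by colour = rank // group_size, as a communicator split would."""
    groups = {}
    for rank in range(world_size):
        groups.setdefault(rank // group_size, []).append(rank)
    return groups

def two_phase_read(world_size, group_size, read_block):
    """Simulate a two-phase read.

    read_block(first_rank, count) returns a list with one item per rank in the
    group. Only the group root calls it, so the number of disk requests equals
    the number of groups rather than the number of ranks.
    """
    received = {}
    disk_reads = 0
    for ranks in split_ranks(world_size, group_size).values():
        block = read_block(ranks[0], len(ranks))  # disk I/O phase (root only)
        disk_reads += 1
        for offset, rank in enumerate(ranks):     # communication phase
            received[rank] = block[offset]
    return received, disk_reads

if __name__ == "__main__":
    fake_disk = lambda start, n: [f"data{r}" for r in range(start, start + n)]
    data, reads = two_phase_read(10, 4, fake_disk)
    print(reads, data[9])
```

In this toy run, ten ranks in groups of four issue three reads instead of ten, which is the kind of reduction in file-system contention the abstract attributes to the approach.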
Microcomputer control soft tube measuring-testing instrument

NASA Astrophysics Data System (ADS)

Zhou, Yanzhou; Jiang, Xiu-Zhen; Wang, Wen-Yi

1993-09-01

Soft tubes are key, easily damaged parts used in large numbers by vehicles in railway transportation. For a long time, the tubes were measured and tested by hand. In cooperation with the Harbin Railway Bureau, we have recently developed a new kind of automatic measuring and testing instrument. In this paper, the instrument's structure, properties, and measuring principle are presented in detail. The centre of the system is an Intel 80C31 single-chip processor. It collects and processes data and displays the results on an LED display. It also controls the electromagnetic valves and motors. Five soft tubes are measured and tested at the same time, and the whole process is completed automatically. Countermeasures against electromagnetic interference are applied effectively in both hardware and software, so the performance of the instrument is improved significantly. Over long-term operation, the instrument is reliable and practical. It solves a quite difficult problem in railway transportation.
Ultra-Reliable Digital Avionics (URDA) processor

NASA Astrophysics Data System (ADS)

Branstetter, Reagan; Ruszczyk, William; Miville, Frank

1994-10-01

Texas Instruments Incorporated (TI) developed the URDA processor design under contract with the U.S. Air Force Wright Laboratory and the U.S. Army Night Vision and Electro-Sensors Directorate. TI's approach couples advanced packaging solutions with advanced integrated circuit (IC) technology to provide a high-performance (200 MIPS/800 MFLOPS) modular avionics processor module for a wide range of avionics applications. TI's processor design integrates two Ada-programmable, URDA basic processor modules (BPM's) with a JIAWG-compatible PiBus and TMBus on a single F-22 common integrated processor-compatible form-factor SEM-E avionics card. A separate, high-speed (25-MWord/second 32-bit word) input/output bus is provided for sensor data. Each BPM provides a peak throughput of 100 MIPS scalar concurrent with 400-MFLOPS vector processing in a removable multichip module (MCM) mounted to a liquid-flowthrough (LFT) core and interfacing to a processor interface module printed wiring board (PWB). Commercial RISC technology coupled with TI's advanced bipolar complementary metal oxide semiconductor (BiCMOS) application specific integrated circuit (ASIC) and silicon-on-silicon packaging technologies are used to achieve the high performance in a miniaturized package. A Mips R4000-family reduced instruction set computer (RISC) processor and a TI 100-MHz BiCMOS vector coprocessor (VCP) ASIC provide, respectively, the 100 MIPS of scalar processor throughput and 400 MFLOPS of vector processing throughput for each BPM. The TI Aladdin ASIC chipset was developed on the TI Aladdin Program under contract with the U.S. Army Communications and Electronics Command and was sponsored by the Advanced Research Projects Agency with technical direction from the U.S. Army Night Vision and Electro-Sensors Directorate.
Multi-Core Processors: An Enabling Technology for Embedded Distributed Model-Based Control (Postprint)

DTIC Science & Technology

2008-07-01

generation of process partitioning, a thread pipelining becomes possible. In this paper we briefly summarize the requirements and trends for FADEC based... FADEC environment, presenting a hypothetical realization of an example application. Finally we discuss the application of Time-Triggered...based control applications of the future. 15. SUBJECT TERMS Gas turbine, FADEC, Multi-core processing technology, disturbed based control
The effect of instantaneous input dynamic range setting on the speech perception of children with the nucleus 24 implant.

PubMed

Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan

2009-06-01

The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational level speech in relation to aided sound-field thresholds was examined. Thirty children (age, 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant Nucleus Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL with the 40 IIDR were significantly higher (p < 0.001) than with the 30 IIDR. Group mean CNC scores at 60 dB SPL, loudness ratings, and the signal to noise ratios-50 for Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising comfort of higher levels of speech sounds or sentence recognition in noise.
Embedded Palmprint Recognition System Using OMAP 3530

PubMed Central

Shen, Linlin; Wu, Shipei; Zheng, Songhao; Ji, Zhen

2012-01-01

We have proposed in this paper an embedded palmprint recognition system using the dual-core OMAP 3530 platform. An improved algorithm based on palm code was proposed first. In this method, a Gabor wavelet is first convolved with the palmprint image to produce a response image, where local binary patterns are then applied to code the relation among the magnitude of wavelet response at the central pixel with that of its neighbors. The method is fully tested using the public PolyU palmprint database. While palm code achieves only about 89% accuracy, over 96% accuracy is achieved by the proposed G-LBP approach. The proposed algorithm was then deployed to the DSP processor of OMAP 3530 and works together with the ARM processor for feature extraction. When complicated algorithms run on the DSP processor, the ARM processor can focus on image capture, user interface and peripheral control. Integrated with an image sensing module and central processing board, the designed device can achieve accurate and real time performance. PMID:22438721
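The local binary pattern step described above compares each pixel's value with its eight neighbours and packs the comparisons into one byte. The sketch below shows that coding on a small 2D list. In the G-LBP setting the input would be the magnitude of the Gabor response rather than raw intensities, a step not shown here. The function name, the clockwise bit order, and the `>=` convention are assumptions, since the abstract does not specify them.

```python
def lbp_code(image, row, col):
    """8-bit local binary pattern for the interior pixel at (row, col).

    `image` is a 2D list of numbers (e.g. Gabor response magnitudes).
    A neighbour contributes a 1 bit when its value is >= the centre value.
    """
    center = image[row][col]
    # Neighbours visited clockwise, starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

if __name__ == "__main__":
    patch = [[9, 1, 9],
             [1, 5, 1],
             [9, 1, 9]]
    print(lbp_code(patch, 1, 1))  # corners are brighter than the centre
```

Because the code depends only on the ordering of values around a pixel, not their absolute size, it is insensitive to uniform changes in response magnitude.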
Embedded palmprint recognition system using OMAP 3530.

PubMed

Shen, Linlin; Wu, Shipei; Zheng, Songhao; Ji, Zhen

2012-01-01

We have proposed in this paper an embedded palmprint recognition system using the dual-core OMAP 3530 platform. An improved algorithm based on palm code was proposed first. In this method, a Gabor wavelet is first convolved with the palmprint image to produce a response image, where local binary patterns are then applied to code the relation among the magnitude of wavelet response at the central pixel with that of its neighbors. The method is fully tested using the public PolyU palmprint database. While palm code achieves only about 89% accuracy, over 96% accuracy is achieved by the proposed G-LBP approach. The proposed algorithm was then deployed to the DSP processor of OMAP 3530 and works together with the ARM processor for feature extraction. When complicated algorithms run on the DSP processor, the ARM processor can focus on image capture, user interface and peripheral control. Integrated with an image sensing module and central processing board, the designed device can achieve accurate and real time performance.
High-lying intermediate excitations in the nuclear effective interaction with a super-soft-core potential

DOE Office of Scientific and Technical Information (OSTI.GOV)

Goode, P.R.; Barrett, B.R.; Portilho, O.

1979-02-01

The earlier calculations of Goode and Barrett are repeated using the super-soft-core potential of Gogny, Pires, and de Tourreil. The particular third-order folded diagram which they calculated now converges in its intermediate-state energy summation, because of the suppression of the strong short-range repulsive effects present in earlier calculations.
CLEAR: Cross-Layer Exploration for Architecting Resilience

DTIC Science & Technology

2017-03-01

benchmark analysis, also provides cost-effective solutions (~1% additional energy cost for the same 50× improvement). This paper addresses the...core (OoO-core) [Wang 04], across 18 benchmarks. Such extensive exploration enables us to conclusively answer the above cross-layer resilience...analysis of the effects of soft errors on application benchmarks, provides a highly effective soft error resilience approach. 3. The above
A Wearable Healthcare System With a 13.7 μA Noise Tolerant ECG Processor.

PubMed

Izumi, Shintaro; Yamashita, Ken; Nakano, Masanao; Kawaguchi, Hiroshi; Kimura, Hiromitsu; Marumoto, Kyoji; Fuchikami, Takaaki; Fujimori, Yoshikazu; Nakajima, Hiroshi; Shiga, Toshikazu; Yoshimoto, Masahiko

2015-10-01

To prevent lifestyle diseases, wearable bio-signal monitoring systems for daily life monitoring have attracted attention. Wearable systems have strict size and weight constraints, which impose significant limitations of the battery capacity and the signal-to-noise ratio of bio-signals. This report describes an electrocardiograph (ECG) processor for use with a wearable healthcare system. It comprises an analog front end, a 12-bit ADC, a robust Instantaneous Heart Rate (IHR) monitor, a 32-bit Cortex-M0 core, and 64 Kbyte Ferroelectric Random Access Memory (FeRAM). The IHR monitor uses a short-term autocorrelation (STAC) algorithm to improve the heart-rate detection accuracy despite its use in noisy conditions. The ECG processor chip consumes 13.7 μA for heart rate logging application.
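The idea behind autocorrelation-based heart-rate detection can be sketched as follows: a short window of the signal is compared against later, shifted windows, and the shift with the highest correlation is taken as the beat-to-beat period. This is a simplified stand-in, not the chip's STAC implementation; the function name `estimate_period`, the sampling rate, the lag range and the synthetic pulse train are all assumptions.

```python
def estimate_period(signal, window, min_lag, max_lag):
    """Return the lag (in samples) at which a short window best matches the later signal."""
    template = signal[:window]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        segment = signal[lag:lag + window]
        if len(segment) < window:
            break  # ran out of samples
        # Correlation of the template with the window shifted by `lag` samples.
        score = sum(a * b for a, b in zip(template, segment))
        if score > best_score:  # strict '>' keeps the shortest best lag
            best_lag, best_score = lag, score
    return best_lag


FS = 100  # assumed sampling rate in Hz
# Synthetic pulse train: one spike every 80 samples (0.8 s between beats).
signal = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
lag = estimate_period(signal, window=100, min_lag=30, max_lag=200)
bpm = 60.0 * FS / lag
```

Restricting the lag range to physiologically plausible periods is one common way to keep such an estimate stable in noise; the abstract does not say how the actual design handles this.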
Cognitive and neural foundations of discrete sequence skill: a TMS study.

PubMed

Ruitenberg, Marit F L; Verwey, Willem B; Schutter, Dennis J L G; Abrahamse, Elger L

2014-04-01

Executing discrete movement sequences typically involves a shift with practice from a relatively slow, stimulus-based mode to a fast mode in which performance is based on retrieving and executing entire motor chunks. The dual processor model explains the performance of (skilled) discrete key-press sequences in terms of an interplay between a cognitive processor and a motor system. In the present study, we tested and confirmed the core assumptions of this model at the behavioral level. In addition, we explored the involvement of the pre-supplementary motor area (pre-SMA) in discrete sequence skill by applying inhibitory 20 min 1-Hz off-line repetitive transcranial magnetic stimulation (rTMS). Based on previous work, we predicted pre-SMA involvement in the selection/initiation of motor chunks, and this was confirmed by our results. The pre-SMA was further observed to be more involved in more complex than in simpler sequences, while no evidence was found for pre-SMA involvement in direct stimulus-response translations or associative learning processes. In conclusion, support is provided for the dual processor model, and for pre-SMA involvement in the initiation of motor chunks. Copyright © 2014 Elsevier Ltd. All rights reserved.
Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

NASA Astrophysics Data System (ADS)

Schultz, A.

2010-12-01

3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. 
We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
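The strategy of solving each subdomain directly and iterating until neighbouring boundary values agree can be illustrated with the classical alternating Schwarz method on a toy problem. The sketch below is not the authors' 3D EM code: it solves a 1D Poisson problem (-u'' = 1 with zero boundary values) on two overlapping subdomains in plain Python, and all names, grid sizes and split indices are assumptions.

```python
def solve_tridiagonal(rhs, left_bc, right_bc):
    """Directly solve -u[i-1] + 2*u[i] - u[i+1] = rhs[i] with fixed end values (Thomas algorithm)."""
    n = len(rhs)
    d = list(rhs)
    d[0] += left_bc    # known boundary values move to the right-hand side
    d[-1] += right_bc
    diag = [2.0] * n
    for i in range(1, n):              # forward elimination (off-diagonals are -1)
        w = -1.0 / diag[i - 1]
        diag[i] -= w * -1.0
        d[i] -= w * d[i - 1]
    u = [0.0] * n
    u[-1] = d[-1] / diag[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        u[i] = (d[i] + u[i + 1]) / diag[i]
    return u


def schwarz(rhs, split_a, split_b, sweeps):
    """Alternating Schwarz with subdomains [0, split_b) and [split_a, n), where split_a < split_b."""
    n = len(rhs)
    u = [0.0] * n
    for _ in range(sweeps):
        # Left subdomain: right boundary value comes from the current global iterate.
        u[:split_b] = solve_tridiagonal(rhs[:split_b], 0.0, u[split_b])
        # Right subdomain: left boundary value comes from the freshly updated left solve.
        u[split_a:] = solve_tridiagonal(rhs[split_a:], u[split_a - 1], 0.0)
    return u


n = 19                      # interior grid points on (0, 1)
h = 1.0 / (n + 1)
rhs = [h * h] * n           # f = 1, scaled by h^2
u = schwarz(rhs, split_a=8, split_b=12, sweeps=40)
exact = [((i + 1) * h) * (1 - (i + 1) * h) / 2 for i in range(n)]
max_err = max(abs(a - b) for a, b in zip(u, exact))
```

In this toy setting the overlap between the two subdomains controls how quickly the boundary values settle; in a parallel variant each subdomain solve would be assigned to its own processor.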
Development of an embedded atmospheric turbulence mitigation engine

NASA Astrophysics Data System (ADS)

Paolini, Aaron; Bonnett, James; Kozacik, Stephen; Kelmelis, Eric

2017-05-01

Methods to reconstruct pictures from imagery degraded by atmospheric turbulence have been under development for decades. The techniques were initially developed for observing astronomical phenomena from the Earth's surface, but have more recently been modified for ground and air surveillance scenarios. Such applications can impose significant constraints on deployment options because they both increase the computational complexity of the algorithms themselves and often dictate a requirement for low size, weight, and power (SWaP) form factors. Consequently, embedded implementations must be developed that can perform the necessary computations on low-SWaP platforms. Fortunately, there is an emerging class of embedded processors driven by the mobile and ubiquitous computing industries. We have leveraged these processors to develop embedded versions of the core atmospheric correction engine found in our ATCOM software. In this paper, we will present our experience adapting our algorithms for embedded systems on a chip (SoCs), namely the NVIDIA Tegra that couples general-purpose ARM cores with their graphics processing unit (GPU) technology and the Xilinx Zynq which pairs similar ARM cores with their field-programmable gate array (FPGA) fabric.
Study of High-Efficiency Motors Using Soft Magnetic Cores

NASA Astrophysics Data System (ADS)

Tokoi, Hirooki; Kawamata, Shoichi; Enomoto, Yuji

We have developed a small and highly efficient axial gap motor whose stator core is made of a soft magnetic core. First, the loss sensitivities to various motor design parameters were evaluated using magnetic field analysis. It was found that the pole number and core dimensions had low sensitivity (≤ 2.2 dB) in terms of the total loss, which is the sum of the copper loss and the iron losses in the stator core and the rotor yoke. From this, we concluded that to improve the motor efficiency, it is essential to reduce the iron loss in the rotor yoke and minimize other losses. With this in mind, a prototype axial gap motor was manufactured and tested. The motor has four poles and six slots. It is 123 mm in diameter and has an axial length of 47 mm. The rotor has parallel magnetized magnets and a rotor yoke with magnetic steel sheets. The maximum measured motor efficiency is 93%. This value roughly agrees with the maximum calculated efficiency of 95%.
Soft magnetic characteristics of laminated magnetic block cores assembled with a high Bs nanocrystalline alloy

NASA Astrophysics Data System (ADS)

Yao, Atsushi; Inoue, Masaki; Tsukada, Kouhei; Fujisaki, Keisuke

2018-05-01

This paper focuses on an evaluation of core losses in laminated magnetic block cores assembled with a high Bs nanocrystalline alloy in high magnetic flux density region. To discuss the soft magnetic properties of the high Bs block cores, the comparison with amorphous (SA1) block cores is also performed. In the high Bs block core, both low core losses and high saturation flux densities Bs are satisfied in the low frequency region. Furthermore, in the laminated block core made of the high Bs alloy, the rate of increase of iron losses as a function of the magnetic flux density remains small up to around 1.6 T, which cannot be realized in conventional laminated block cores based on amorphous alloy. The block core made of the high Bs alloy exhibits comparable core loss with that of amorphous alloy core in the high-frequency region. Thus, it is expected that this laminated high Bs block core can achieve low core losses and high saturation flux densities in the high-frequency region.

Web-based DAQ systems: connecting the user and electronics front-ends

NASA Astrophysics Data System (ADS)

Lenzi, Thomas

2016-12-01

Web technologies are quickly evolving and are gaining in computational power and flexibility, allowing for a paradigm shift in the field of Data Acquisition (DAQ) systems design. Modern web browsers offer the possibility to create intricate user interfaces and are able to process and render complex data. Furthermore, new web standards such as WebSockets allow for fast real-time communication between the server and the user with minimal overhead. Those improvements make it possible to move the control and monitoring operations from the back-end servers directly to the user and to the front-end electronics, thus reducing the complexity of the data acquisition chain. Moreover, web-based DAQ systems offer greater flexibility, accessibility, and maintainability on the user side than traditional applications which often lack portability and ease of use. As proof of concept, we implemented a simplified DAQ system on a mid-range Spartan6 Field Programmable Gate Array (FPGA) development board coupled to a digital front-end readout chip. The system is connected to the Internet and can be accessed from any web browser. It is composed of custom code to control the front-end readout and of a dual soft-core Microblaze processor to communicate with the client.
Implementation Of The Configurable Fault Tolerant System Experiment On NPSAT 1

DTIC Science & Technology

2016-03-01

REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE IMPLEMENTATION OF THE CONFIGURABLE FAULT TOLERANT SYSTEM EXPERIMENT ON NPSAT...open-source microprocessor without interlocked pipeline stages (MIPS) based processor softcore, a cached memory structure capable of accessing double...data rate type three and secure digital card memories, an interface to the main satellite bus, and XILINX’s soft error mitigation softcore. The
Guidance of Autonomous Aerospace Vehicles for Vertical Soft Landing using Nonlinear Control Theory

DTIC Science & Technology

2015-08-11

Measured and Kalman filter Estimate of the Roll Attitude of the Quad...and faster Hartley et al. [2013]. With availability of small, light, high fidelity sensors (Inertial Measurement Units, IMU) and processors on board...is a product of inverse of rotation matrix and inertia matrix for the quad frame. Since both the matrix are invertible at all times except when roll
Heterogeneous high throughput scientific computing with APM X-Gene and Intel Xeon Phi

DOE PAGES

Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; ...

2015-05-22

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. As a result, we report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Low Power Consumption Design and Fabrication of Thin Film Core for Micro Fluxgate.

PubMed

Lv, Hui; Liu, Shibin

2016-03-01

The soft magnetic characteristic of the core is critical to the performance of a micro fluxgate. A porous thin film core can effectively decrease the saturation magnetic field strength (H(s)) and improve soft magnetic behavior, which helps move the micro fluxgate toward low power consumption. In this work, negative photoresist is used to fabricate a porous core by MEMS technology. Through ultraviolet lithography, the porous pattern is transferred from the mask to the microstructure on a silicon substrate. The experimental result agrees with expectations and indicates that this MEMS technique can be applied to improve the characteristics of the thin film core and decrease the power consumption of the fluxgate sensor.
Active Flash: Performance-Energy Tradeoffs for Out-of-Core Processing on Non-Volatile Memory Devices

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boboila, Simona; Kim, Youngjae; Vazhkudai, Sudharshan S

2012-01-01

In this abstract, we study the performance and energy tradeoffs involved in migrating data analysis into the flash device, a process we refer to as Active Flash. The Active Flash paradigm is similar to 'active disks', which has received considerable attention. Active Flash allows us to move processing closer to data, thereby minimizing data movement costs and reducing power consumption. It enables true out-of-core computation. The conventional definition of out-of-core solvers refers to an approach to process data that is too large to fit in the main memory and, consequently, requires access to disk. However, in Active Flash, processing outside the host CPU literally frees the core and achieves real 'out-of-core' analysis. Moving analysis to data has long been desirable, not just at this level, but at all levels of the system hierarchy. However, this requires a detailed study on the tradeoffs involved in achieving analysis turnaround under an acceptable energy envelope. To this end, we first need to evaluate if there is enough computing power on the flash device to warrant such an exploration. Flash processors require decent computing power to run the internal logic pertaining to the Flash Translation Layer (FTL), which is responsible for operations such as address translation, garbage collection (GC) and wear-leveling. Modern SSDs are composed of multiple packages and several flash chips within a package. The packages are connected using multiple I/O channels to offer high I/O bandwidth. SSD computing power is also expected to be high enough to exploit such inherent internal parallelism within the drive to increase the bandwidth and to handle fast I/O requests. More recently, SSD devices are being equipped with powerful processing units and are even embedded with multicore CPUs (e.g. ARM Cortex-A9 embedded processor is advertised to reach 2GHz frequency and deliver 5000 DMIPS; OCZ RevoDrive X2 SSD has 4 SandForce controllers, each with 780MHz max frequency Tensilica core). Efforts that take advantage of the available computing cycles on the processors on SSDs to run auxiliary tasks other than actual I/O requests are beginning to emerge. Kim et al. investigate database scan operations in the context of processing on the SSDs, and propose dedicated hardware logic to speed up scans. Also, cluster architectures have been explored, which consist of low-power embedded CPUs coupled with small local flash to achieve fast, parallel access to data. Processor utilization on SSD is highly dependent on workloads and, therefore, they can be idle during periods with no I/O accesses. We propose to use the available processing capability on the SSD to run tasks that can be offloaded from the host. This paper makes the following contributions: (1) We have investigated Active Flash and its potential to optimize the total energy cost, including power consumption on the host and the flash device; (2) We have developed analytical models to analyze the performance-energy tradeoffs for Active Flash, by treating the SSD as a black box, which is particularly valuable due to the proprietary nature of the SSD internal hardware; and (3) We have enhanced a well-known SSD simulator (from MSR) to implement 'on-the-fly' data compression using Active Flash. Our results provide a window into striking a balance between energy consumption and application performance.
Synthesis, microstructure and magnetic properties of Fe3Si0.7Al0.3@SiO2 core–shell particles and Fe3Si/Al2O3 soft magnetic composite core

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Jian, E-mail: snove418562@163.com; Key Laboratory for Ferrous Metallurgy and Resources Utilization of Ministry of Education, Wuhan University of Science and Technology, Wuhan, Hubei 430081; Fan, Xi’an, E-mail: groupfxa@163.com

2015-11-15

Fe3Si0.7Al0.3@SiO2 core–shell particles and Fe3Si/Al2O3 soft magnetic composite core have been synthesized via a modified Stöber method followed by a high-temperature sintering process. Most of the conductive Fe3Si0.7Al0.3 particles could be uniformly coated by insulating SiO2 using the modified Stöber method. The Fe3Si0.7Al0.3@SiO2 core–shell particles exhibited good soft magnetic properties with low coercivity and high saturation magnetization. The reaction 4Al + 3SiO2 = 2α-Al2O3 + 3Si took place during the sintering process. As a result, the new Fe3Si/Al2O3 composite was formed. The Fe3Si/Al2O3 composite core displayed better soft magnetic properties, better frequency stability at high frequencies, much higher electrical resistivity and lower core loss than the pure Fe3Si0.7Al0.3 core. The method of introducing insulating layers surrounding magnetic particles provides a promising route to develop new and highly compact soft magnetic materials with good magnetic and electric properties. - Graphical abstract: In Fe3Si/Al2O3 composite, Fe3Si phases are separated by Al2O3 layers and the eddy currents are confined in Fe3Si phases, thus increasing resistivity and reducing core loss. - Highlights: • Fe3Si0.7Al0.3@SiO2 core–shell particles and Fe3Si/Al2O3 cores were prepared. • Fe3Si0.7Al0.3 particles could be uniformly coated by nano-sized SiO2 clusters. • Fe3Si0.7Al0.3@SiO2 particles and Fe3Si/Al2O3 cores showed good soft magnetic properties. • Fe3Si/Al2O3 had lower core loss and better frequency stability than Fe3Si0.7Al0.3 cores.
Succession of Hydrocarbon Degradation and Microbial Diversity during a Simulated Petroleum Seepage in Caspian Sea Sediments

NASA Astrophysics Data System (ADS)

Mishra, S.; Stagars, M.; Wefers, P.; Schmidt, M.; Knittel, K.; Krueger, M.; Leifer, I.; Treude, T.

2016-02-01

Microbial degradation of petroleum was investigated in intact sediment cores of the Caspian Sea during a simulated petroleum seepage using a sediment-oil-flow-through (SOFT) system. Over the course of the SOFT experiment (190 days), distinct redox zones established and evolved in the sediment core. Methanogenesis and sulfate reduction were identified to be important processes in the anaerobic degradation of hydrocarbons. C1 to C6 n-alkanes were completely exhausted in the sulfate-reducing zone and some higher alkanes decreased during the upward migration of petroleum. A diversity of sulfate-reducing bacteria was identified by 16S rRNA phylogenetic studies, some of which are associated with marine seeps and petroleum degradation. The δ13C signal of produced methane decreased from -33.7‰ to -49.5‰ indicating crude oil degradation by methanogenesis, which was supported by enrichment culturing of methanogens with petroleum hydrocarbons and presence of methanogenic archaea. The SOFT system is, to the best of our knowledge, the first system that simulates an oil-seep like condition and enables live monitoring of biogeochemical changes within a sediment core during petroleum seepage. During our presentation we will compare the Caspian Sea data with other sediments we studied using the SOFT system from sites such as Santa Barbara (Pacific Ocean), the North Alex Mud Volcano (Mediterranean Sea) and the Eckernfoerde Bay (Baltic Sea). This research was funded by the Deutsche Forschungsgemeinschaft (SPP 1319) and DEA Deutsche Erdoel AG. Further support came from the Helmholtz and Max Planck Gesellschaft.
Security Primitives for Reconfigurable Hardware-Based Systems

DTIC Science & Technology

2010-05-01

work, we propose security primitives using ideas centered around the notion of “moats and drawbridges.” The primitives encompass four design properties...Santa Barbara, CA 93106; email: sherwood@cs.ucsb.edu; R. Kastner, Department of Computer Science and Engineering, University of California, San...fingerprint reader), the other to control the ethernet IP core—and an AES encryption engine used by both of the processor cores. These cores are all implemented
Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825

NASA Astrophysics Data System (ADS)

Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.

2010-11-01

We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent, e.g., EOS, opacity, nuclear data, and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on cartesian and curvilinear block structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block structured AMR. We will report on our progress to date.
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors.

PubMed

Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun

2016-07-08

Energy efficiency is considered as a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-cores, there are emerging needs for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentations of the idle time, which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods on T-L plane abstraction.
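Why fragmented idle time hurts under dynamic power management can be seen with a simple break-even calculation: a core only saves energy by sleeping when the idle interval is long enough to cover the cost of the mode transition. The sketch below is a generic DPM model rather than the paper's algorithm; the function name `idle_energy` and all power, time and energy figures are hypothetical.

```python
def idle_energy(interval, p_idle, p_sleep, t_switch, e_switch):
    """Energy (J) spent over one idle interval (s), sleeping only when that is cheaper."""
    stay_awake = p_idle * interval
    if interval < t_switch:
        return stay_awake  # too short to complete a sleep/wake transition
    sleep = e_switch + p_sleep * (interval - t_switch)
    return min(stay_awake, sleep)


# Hypothetical core parameters (not taken from the paper).
P_IDLE = 0.5      # W while idle but awake
P_SLEEP = 0.05    # W while asleep
T_SWITCH = 0.003  # s needed for a sleep/wake round trip
E_SWITCH = 0.002  # J consumed by that round trip

# The same 10 ms of idle time, either split into five fragments or merged into one.
fragmented = sum(idle_energy(0.002, P_IDLE, P_SLEEP, T_SWITCH, E_SWITCH)
                 for _ in range(5))
merged = idle_energy(0.010, P_IDLE, P_SLEEP, T_SWITCH, E_SWITCH)
```

With these assumed numbers the merged interval is long enough to justify sleeping while each fragment is not, which is the motivation for schedulers that consolidate idle time.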
Towards a cyber-physical era: soft computing framework based multi-sensor array for water quality monitoring

NASA Astrophysics Data System (ADS)

Bhardwaj, Jyotirmoy; Gupta, Karunesh K.; Gupta, Rajiv

2018-02-01

New concepts and techniques are replacing traditional methods of water quality parameter measurement. This paper introduces a cyber-physical system (CPS) approach for water quality assessment in a distribution network. Cyber-physical systems with embedded sensors, processors and actuators can be designed to sense and interact with the water environment. The proposed CPS comprises a sensing framework integrated with five different water quality parameter sensor nodes and a soft computing framework for computational modelling. The soft computing framework uses Python for the user interface and fuzzy sciences for decision making. Introducing multiple sensors in a water distribution network generates a huge number of data matrices, which are sometimes highly complex, difficult to understand and too convoluted for effective decision making. Therefore, the proposed system framework also intends to simplify the obtained sensor data matrices and to support decision making for water engineers through a soft computing framework. The target of this proposed research is to provide a simple and efficient method to identify and detect the presence of contamination in a water distribution network using applications of CPS.
The Holidays Are Coming! Time to Start Planning for Healthy Holiday Meals

MedlinePlus

... 1 medium orange, quartered and seeds removed 1 apple, cored ¾ cup to 1 cup sugar (or substitute non-sugar sweetener) Put berries, orange and apple through food processor, blender or food mill until ...
Using Multi-Core Systems for Rover Autonomy

NASA Technical Reports Server (NTRS)

Clement, Brad; Estlin, Tara; Bornstein, Benjamin; Springer, Paul; Anderson, Robert C.

2010-01-01

Task Objectives are: (1) Develop and demonstrate key capabilities for rover long-range science operations using multi-core computing: (a) Adapt three rover technologies to execute on SOA multi-core processor, (b) Illustrate performance improvements achieved, (c) Demonstrate adapted capabilities with rover hardware; (2) Targeting three high-level autonomy technologies: (a) Two for onboard data analysis, (b) One for onboard command sequencing/planning; (3) Technologies identified as enabling for future missions; (4) Benefits will be measured along several metrics: (a) Execution time / Power requirements, (b) Number of data products processed per unit time, (c) Solution quality.
Design of composite flywheel rotors with soft cores

NASA Astrophysics Data System (ADS)

Kim, Taehan

A flywheel is an inertial energy storage system in which the energy or momentum is stored in a rotating mass. Over the last twenty years, high-performance flywheels have been developed with significant improvements, showing potential as energy storage systems in a wide range of applications. Despite the great advances in fundamental knowledge and technology, the current successful rotors depend mainly on the recent developments of high-stiffness and high-strength carbon composites. These composites are expensive and the cost of flywheels made of them is high. The ultimate goal of the study presented here is the development of a cost-effective composite rotor made of a hybrid material. In this study, two-dimensional and three-dimensional analysis tools were developed and utilized in the design of the composite rim, and extensive spin tests were performed to validate the designed rotors and give a sound basis for large-scale rotor design. Hybrid rims made of several different composite materials can effectively reduce the radial stress in the composite rim, which is critical in the design of composite rims. Since the hybrid composite rims we studied employ low-cost glass fiber for the inside of the rim, and the result is large radial growth of the hybrid rim, conventional metallic hubs cannot be used in this design. A soft core developed in this study was successfully able to accommodate the large radial growth of the rim. High bonding strength at the shaft-to-core interface was achieved by the soft core being molded directly onto the steel shaft, and a tapered geometry was used to avoid stress concentrations at the shaft-to-core interface. Extensive spin tests were utilized for reverse engineering of the design of composite rotors, and there was good correlation between tests and analysis. A large-scale composite rotor for ground transportation is presented with the performance levels predicted for it.
Embedded System Implementation on FPGA System With μCLinux OS

NASA Astrophysics Data System (ADS)

Fairuz Muhd Amin, Ahmad; Aris, Ishak; Syamsul Azmir Raja Abdullah, Raja; Kalos Zakiah Sahbudin, Ratna

2011-02-01

Embedded systems are taking on more complicated tasks as the processors involved become more powerful. Embedded systems are widely used in many areas, such as industry, automotive, medical imaging, communications, speech recognition and computer vision. Today's hardware and software complexity calls for a flexible system that can be enhanced without adding new hardware. Therefore, any change in the system design affects the processor, which then needs to be changed. To overcome this problem, a System On Programmable Chip (SOPC) has been designed and developed using a Field Programmable Gate Array (FPGA). A softcore processor, the 32-bit RISC NIOS II, was used as the microprocessor core in the FPGA system together with the embedded operating system (OS) μClinux. In this paper, an example web server is explained and demonstrated.
Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Howison, Mark; Bethel, E. Wes; Childs, Hank

2012-01-01

With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Comparison of the effect of soft-core potentials and Coulombic potentials on bremsstrahlung during laser matter interaction

NASA Astrophysics Data System (ADS)

Pandit, Rishi R.; Becker, Valerie R.; Barrington, Kasey; Thurston, Jeremy; Ramunno, Lora; Ackad, Edward

2018-04-01

An intense, short laser pulse incident on rare-gas clusters can produce nano-plasmas containing energetic electrons. As these electrons undergo scattering, from both phonons and ions, they emit bremsstrahlung radiation. Here, we compare a theory of bremsstrahlung emission appropriate for the interaction of intense lasers with matter using soft-core potentials and Coulombic potentials. A new scaling for the radiation cross-section and the radiated power via bremsstrahlung is derived for a soft-core potential (which depends on the potential depth) and compared with the Coulomb potential. Calculations using the new scaling are performed for electrons in vacuum ultraviolet, infrared and mid-infrared laser pulses. The radiation cross-section and the radiation power via bremsstrahlung are found to increase rapidly with increases in the potential depth of up to around 200 eV and then become mostly saturated for larger depths while remaining constant for the Coulomb potential. In both cases, the radiation cross-section and the radiation power of bremsstrahlung decrease with increases in the laser wavelength. The ratio of the scattering amplitude for the soft-core potential and that for the Coulombic potential decreases exponentially with an increase in momentum transfer. The bremsstrahlung emission by electrons in plasmas may provide a broadband light source for diagnostics.
Effect of increased IIDR in the Nucleus Freedom cochlear implant system.

PubMed

Holden, Laura K; Skinner, Margaret W; Fourakis, Marios S; Holden, Timothy A

2007-10-01

The objective of this study was to evaluate the effect of the increased instantaneous input dynamic range (IIDR) in the Nucleus Freedom cochlear implant (CI) system on recipients' ability to perceive soft speech and speech in noise. Ten adult Freedom CI recipients participated. Two maps differing in IIDR were placed on each subject's processor at initial activation. The IIDR was set to 30 dB for one map and 40 dB for the other. Subjects used both maps for at least one month prior to speech perception testing. Results revealed significantly higher scores for words (50 dB SPL), for sentences in background babble (65 dB SPL), and significantly lower sound field threshold levels with the 40 compared to the 30 dB IIDR map. Ceiling effects may have contributed to non-significant findings for sentences in quiet (50 dB SPL). The Freedom's increased IIDR allows better perception of soft speech and speech in noise.
Many-core computing for space-based stereoscopic imaging

NASA Astrophysics Data System (ADS)

McCall, Paul; Torres, Gildo; LeGrand, Keith; Adjouadi, Malek; Liu, Chen; Darling, Jacob; Pernicka, Henry

The potential benefits of using parallel computing in real-time visual-based satellite proximity operations missions are investigated. Improvements in performance and relative navigation solutions over single-threaded systems can be achieved through multi- and many-core computing. Stochastic relative orbit determination methods benefit from the higher measurement frequencies, allowing them to more accurately determine the associated statistical properties of the relative orbital elements. More accurate orbit determination can lead to reduced fuel consumption and extended mission capabilities and duration. Inherent to the process of stereoscopic image processing is the difficulty of loading, managing, parsing, and evaluating large amounts of data efficiently, which may result in delays or highly time-consuming processes for single (or few) processor systems or platforms. In this research we utilize the Single-Chip Cloud Computer (SCC), a fully programmable 48-core experimental processor created by Intel Labs as a platform for many-core software research, provided with a high-speed on-chip network for sharing information along with advanced power management technologies and support for message-passing. The results from utilizing the SCC platform for the stereoscopic image processing application are presented in the form of Performance, Power, Energy, and Energy-Delay-Product (EDP) metrics. Also, a comparison between the SCC results and those obtained from executing the same application on a commercial PC is presented, showing the potential benefits of utilizing the SCC in particular, and many-core platforms in general, for real-time processing of visual-based satellite proximity operations missions.
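The energy and energy-delay-product metrics named above follow standard definitions (energy = average power × runtime, EDP = energy × runtime). The sketch below applies them to two runs; the platform names and all numeric values are hypothetical placeholders, not measurements from the study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    avg_power_w: float   # average power draw in watts
    runtime_s: float     # wall-clock runtime in seconds

    @property
    def energy_j(self) -> float:
        """Energy in joules: average power multiplied by runtime."""
        return self.avg_power_w * self.runtime_s

    @property
    def edp(self) -> float:
        """Energy-delay product in joule-seconds: energy multiplied by runtime."""
        return self.energy_j * self.runtime_s


# Hypothetical numbers, for illustration only.
runs = [Run("many-core", avg_power_w=50.0, runtime_s=4.0),
        Run("desktop PC", avg_power_w=120.0, runtime_s=3.0)]

for r in runs:
    print(f"{r.name}: energy = {r.energy_j} J, EDP = {r.edp} J*s")
```

EDP penalizes slow runs more heavily than energy alone, so a platform can win on energy yet lose on EDP if it takes much longer.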

CTF Preprocessor User's Manual

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avramova, Maria; Salko, Robert K.

2016-05-26

This document describes how a user should go about using the CTF pre-processor tool to create an input deck for modeling rod-bundle geometry in CTF. The tool was designed to generate input decks in a quick and less error-prone manner for CTF. The pre-processor is a completely independent utility, written in Fortran, that takes a reduced amount of input from the user. The information that the user must supply is basic information on bundle geometry, such as rod pitch, clad thickness, and axial location of spacer grids--the pre-processor takes this basic information and determines channel placement and connection information to be written to the input deck, which is the most time-consuming and error-prone segment of creating a deck. Creation of the model is also more intuitive, as the user can specify assembly and water-tube placement using visual maps instead of having to place them by determining channel/channel and rod/channel connections. As an example of the benefit of the pre-processor, a quarter-core model that contains 500,000 scalar-mesh cells was read into CTF from an input deck containing 200,000 lines of data. This 200,000 line input deck was produced automatically from a set of pre-processor decks that contained only 300 lines of data.
Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

NASA Astrophysics Data System (ADS)

Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

2016-04-01

We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 × 12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.
A Case for Soft Error Detection and Correction in Computational Chemistry.

PubMed

van Dam, Hubertus J J; Vishnu, Abhinav; de Jong, Wibe A

2013-09-10

High performance computing platforms are expected to deliver 10^18 floating-point operations per second by the year 2022 through the deployment of millions of cores. Even if every core is highly reliable, the sheer number of them will mean that the mean time between failures will become so short that most application runs will suffer at least one fault. In particular, soft errors caused by intermittent incorrect behavior of the hardware are a concern as they lead to silent data corruption. In this paper we investigate the impact of soft errors on optimization algorithms using Hartree-Fock as a particular example. Optimization algorithms iteratively reduce the error in the initial guess to reach the intended solution. Therefore they may intuitively appear to be resilient to soft errors. Our results show that this is true for soft errors of small magnitudes but not for large errors. We suggest error detection and correction mechanisms for different classes of data structures. The results obtained with these mechanisms indicate that we can correct more than 95% of the soft errors at moderate increases in the computational cost.
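The qualitative claim (small soft errors are absorbed by an iterative optimizer, large ones are not unless detected) can be reproduced on a toy problem. The sketch below is an assumption-laden stand-in, not the authors' Hartree-Fock setup or their detection mechanisms: it runs gradient descent on f(x) = (x - 3)^2, injects a fault at one step, and optionally applies a hypothetical bound check with rollback.

```python
def minimize(steps, fault_at=None, fault_size=0.0, bound=None):
    """Gradient descent on f(x) = (x - 3)^2 with an optional injected fault.

    fault_at / fault_size simulate a silent data corruption at one iteration.
    bound, if given, is a hypothetical sanity check: values outside
    [-bound, bound] are treated as corrupted and rolled back.
    """
    x = 0.0
    for k in range(steps):
        previous = x
        x -= 0.25 * 2.0 * (x - 3.0)   # each step halves the distance to 3
        if k == fault_at:
            x += fault_size           # simulated soft error
        if bound is not None and abs(x) > bound:
            x = previous              # detected: restore the last good value
    return x


print(abs(minimize(40) - 3.0))                          # fault-free run
print(abs(minimize(40, 20, 1e-3) - 3.0))                # small fault, absorbed
print(abs(minimize(40, 20, 1e30) - 3.0))                # large fault, not absorbed
print(abs(minimize(40, 20, 1e30, bound=1e6) - 3.0))     # large fault, detected
```

The bound check only works here because the valid range of x is known in advance; which invariants are available for real data structures is exactly the kind of question the abstract raises.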
Synthesis and magnetic properties of cobalt-iron/cobalt-ferrite soft/hard magnetic core/shell nanowires

NASA Astrophysics Data System (ADS)

Leandro Londoño-Calderón, César; Moscoso-Londoño, Oscar; Muraca, Diego; Arzuza, Luis; Carvalho, Peterson; Pirota, Kleber Roberto; Knobel, Marcelo; Pampillo, Laura Gabriela; Martínez-García, Ricardo

2017-06-01

A straightforward method for the synthesis of CoFe2.7/CoFe2O4 core/shell nanowires is described. The proposed method starts with a conventional pulsed electrodeposition procedure on alumina nanoporous template. The obtained CoFe2.7 nanowires are released from the template and allowed to oxidize at room conditions over several weeks. The effects of partial oxidation on the structural and magnetic properties were studied by x-ray spectrometry, magnetometry, and scanning and transmission electron microscopy. The results indicate that the final nanowires are composed of 5 nm iron-cobalt alloy nanoparticles. Releasing the nanowires at room conditions promoted surface oxidation of the nanoparticles and created a CoFe2O4 shell spinel-like structure. The shell avoids internal oxidation and promotes the formation of bi-magnetic soft/hard magnetic core/shell nanowires. The magnetic properties of both the initial single-phase CoFe2.7 nanowires and the final core/shell nanowires, reveal that the changes in the properties from the array are due to the oxidation more than effects associated with released processes (disorder and agglomeration).
Synthesis and magnetic properties of cobalt-iron/cobalt-ferrite soft/hard magnetic core/shell nanowires.

PubMed

Londoño-Calderón, César Leandro; Moscoso-Londoño, Oscar; Muraca, Diego; Arzuza, Luis; Carvalho, Peterson; Pirota, Kleber Roberto; Knobel, Marcelo; Pampillo, Laura Gabriela; Martínez-García, Ricardo

2017-06-16

A straightforward method for the synthesis of CoFe2.7/CoFe2O4 core/shell nanowires is described. The proposed method starts with a conventional pulsed electrodeposition procedure on alumina nanoporous template. The obtained CoFe2.7 nanowires are released from the template and allowed to oxidize at room conditions over several weeks. The effects of partial oxidation on the structural and magnetic properties were studied by x-ray spectrometry, magnetometry, and scanning and transmission electron microscopy. The results indicate that the final nanowires are composed of 5 nm iron-cobalt alloy nanoparticles. Releasing the nanowires at room conditions promoted surface oxidation of the nanoparticles and created a CoFe2O4 shell spinel-like structure. The shell avoids internal oxidation and promotes the formation of bi-magnetic soft/hard magnetic core/shell nanowires. The magnetic properties of both the initial single-phase CoFe2.7 nanowires and the final core/shell nanowires, reveal that the changes in the properties from the array are due to the oxidation more than effects associated with released processes (disorder and agglomeration).
Multicore Education through Simulation

ERIC Educational Resources Information Center

Ozturk, O.

2011-01-01

A project-oriented course for advanced undergraduate and graduate students is described for simulating multiple processor cores. Simics, a free simulator for academia, was utilized to enable students to explore computer architecture, operating systems, and hardware/software cosimulation. Motivation for including this course in the curriculum is…
Dense and Sparse Matrix Operations on the Cell Processor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel W.; Shalf, John; Oliker, Leonid

2005-05-01

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
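To give a feel for what an analytic performance prediction can look like, the sketch below uses a simple bound model: runtime is the larger of the compute time and the memory-transfer time. This is a generic assumption, not the authors' framework, and the machine parameters are hypothetical placeholders rather than Cell specifications.

```python
def predicted_time_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    """Lower-bound runtime assuming compute and memory traffic overlap perfectly."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / bandwidth_bytes_per_s
    return max(compute_s, memory_s)


# Hypothetical machine: 10 GFLOP/s peak, 20 GB/s memory bandwidth.
PEAK = 10e9
BANDWIDTH = 20e9

# Dense matrix multiply, n x n doubles, assuming each matrix moves once.
n = 1024
dense = predicted_time_s(2 * n**3, 3 * n * n * 8, PEAK, BANDWIDTH)

# Sparse matrix-vector product: 2 flops and about 12 bytes
# (8-byte value plus 4-byte column index) per nonzero.
nnz = 10_000_000
sparse = predicted_time_s(2 * nnz, 12 * nnz, PEAK, BANDWIDTH)

print(f"dense:  {dense:.4f} s (compute-bound under these assumptions)")
print(f"sparse: {sparse:.4f} s (bandwidth-bound under these assumptions)")
```

The contrast between the two kernels is the usual reason dense and sparse operations are modeled separately: one is limited by arithmetic throughput, the other by data movement.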
LC and ferromagnetic resonance in soft/hard magnetic microwires

NASA Astrophysics Data System (ADS)

Tian, Bin; Vazquez, Manuel

2015-12-01

The magnetic behavior of soft/hard biphase microwires is introduced here. The microwires consist of a Co59.1Fe14.8Si10.2B15.9 soft magnetic nucleus and a Co90Ni10 hard outer shell separated by an intermediate insulating Pyrex glass microtube. By comparing the resistance spectra obtained when the ends of the metallic core are welded to the connector (CC) with those obtained when the metallic core and outer shell are welded to the connector (CS), it is found that one of the two peaks in the resistance spectrum is due to an LC resonance that depends on the inductor and on two capacitors: one between the metallic core and the outer shell, and the other between the outer shell and the connector. The other peak corresponds to the ferromagnetic resonance of the metallic core. After the capacitance of the capacitors is changed, the LC resonance moves to a higher frequency band and, furthermore, the LC resonance peak in the resistance spectrum disappears. These magnetostatically coupled biphase systems are thought to be of large potential interest as sensing elements in sensor devices.
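The shift of the LC resonance with capacitance follows from the textbook relation f = 1 / (2π√(LC)). The sketch below evaluates it for an ideal lumped LC circuit; the inductance and capacitance values are hypothetical, and the real microwire geometry is certainly not an ideal lumped circuit.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC circuit: f = 1 / (2*pi*sqrt(L*C))."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


L_H = 1e-6                      # hypothetical inductance, 1 microhenry
for c_f in (10e-12, 1e-12):     # hypothetical capacitances, 10 pF then 1 pF
    f = resonant_frequency_hz(L_H, c_f)
    print(f"C = {c_f:.0e} F -> f = {f / 1e6:.1f} MHz")
```

Lowering the capacitance raises the resonant frequency, which is consistent with the reported move of the LC peak to a higher frequency band.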
Effect of a core-softened O-O interatomic interaction on the shock compression of fused silica

NASA Astrophysics Data System (ADS)

Izvekov, Sergei; Weingarten, N. Scott; Byrd, Edward F. C.

2018-03-01

Isotropic soft-core potentials have attracted considerable attention due to their ability to reproduce thermodynamic, dynamic, and structural anomalies observed in tetrahedral network-forming compounds such as water and silica. The aim of the present work is to assess the relevance of effective core-softening pertinent to the oxygen-oxygen interaction in silica to the thermodynamics and phase change mechanisms that occur in shock compressed fused silica. We utilize the MD simulation method with a recently published numerical interatomic potential derived from an ab initio MD simulation of liquid silica via force-matching. The resulting potential indicates an effective shoulder-like core-softening of the oxygen-oxygen repulsion. To better understand the role of the core-softening we analyze two derivative force-matching potentials in which the soft-core is replaced with a repulsive core either in the three-body potential term or in all the potential terms. Our analysis is further augmented by a comparison with several popular empirical models for silica that lack an explicit core-softening. The first outstanding feature of shock compressed glass reproduced with the soft-core models but not with the other models is that the shock compression values at pressures above 20 GPa are larger than those observed under hydrostatic compression (an anomalous shock Hugoniot densification). Our calculations indicate the occurrence of a phase transformation along the shock Hugoniot that we link to the O-O repulsion core-softening. The phase transformation is associated with a Hugoniot temperature reversal similar to that observed experimentally. With the soft-core models, the phase change is an isostructural transformation between amorphous polymorphs with no associated melting event. We further examine the nature of the structural transformation by comparing it to the Hugoniot calculations for stishovite. 
For stishovite, the Hugoniot exhibits temperature reversal and associated phase transformation, which is a transition to a disordered phase (liquid or dense amorphous), regardless of whether or not the model accounts for core-softening. The onset pressures of the transformation predicted by different models show a wide scatter within 60-110 GPa; for potentials without core-softening, the onset pressure is much higher than 110 GPa. Our results show that the core-softening of the interaction in the oxygen subsystem of silica is the key mechanism for the structural transformation and thermodynamics in shock compressed silica. These results may provide an important contribution to a unified picture of anomalous response to shock compression observed in other network-forming oxides and single-component systems with core-softening of effective interactions.
Neural simulations on multi-core architectures.

PubMed

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high-performance and standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing.
Neural Simulations on Multi-Core Architectures

PubMed Central

Eichner, Hubert; Klug, Tobias; Borst, Alexander

2009-01-01

Neuroscience is witnessing increasing knowledge about the anatomy and electrophysiological properties of neurons and their connectivity, leading to an ever increasing computational complexity of neural simulations. At the same time, a rather radical change in personal computer technology emerges with the establishment of multi-cores: high-density, explicitly parallel processor architectures for both high-performance and standard desktop computers. This work introduces strategies for the parallelization of biophysically realistic neural simulations based on the compartmental modeling technique and results of such an implementation, with a strong focus on multi-core architectures and automation, i.e. user-transparent load balancing. PMID:19636393
A VHDL Core for Intrinsic Evolution of Discrete Time Filters with Signal Feedback

NASA Technical Reports Server (NTRS)

Gwaltney, David A.; Dutton, Kenneth

2005-01-01

The design of an Evolvable Machine VHDL Core is presented, representing a discrete-time processing structure capable of supporting control system applications. This VHDL Core is implemented in an FPGA and is interfaced with an evolutionary algorithm implemented in firmware on a Digital Signal Processor (DSP) to create an evolvable system platform. The salient features of this architecture are presented. The capability to implement IIR filter structures is presented along with the results of the intrinsic evolution of a filter. The robustness of the evolved filter design is tested and its unique characteristics are described.
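For readers unfamiliar with IIR structures, the defining feature is feedback of past outputs into the current output. The sketch below implements the standard direct-form difference equation in plain Python; it illustrates the filter class only, and the coefficient values are arbitrary examples rather than anything evolved by the system described above.

```python
def iir_filter(b, a, x):
    """Apply y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], with a[0] == 1.

    b: feedforward coefficients, a: feedback coefficients, x: input samples.
    """
    if a[0] != 1.0:
        raise ValueError("coefficients must be normalized so that a[0] == 1")
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):          # feedforward (input) terms
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):          # feedback (past output) terms
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# First-order low-pass example: y[n] = 0.5*x[n] + 0.5*y[n-1]
impulse = [1.0, 0.0, 0.0, 0.0]
print(iir_filter([0.5], [1.0, -0.5], impulse))   # -> [0.5, 0.25, 0.125, 0.0625]
```

The impulse response never reaches exactly zero in exact arithmetic, which is what "infinite impulse response" refers to; an evolutionary search over such a structure would be adjusting the entries of `b` and `a`.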
Cache Sharing and Isolation Tradeoffs in Multicore Mixed-Criticality Systems

DTIC Science & Technology

2015-05-01

of lockdown registers, to provide way-based partitioning. These alternatives are illustrated in Fig. 1 with respect to a quad-core ARM Cortex A9...presented a cache-partitioning scheme that allows multiple tasks to share the same cache partition on a single processor (as we do for Level-A and...sets and determined the fraction that were schedulable on our target hardware platform, the quad-core ARM Cortex A9 machine mentioned earlier, the LLC
Of Ivory and Smurfs: Loxodontan MapReduce Experiments for Web Search

DTIC Science & Technology

2009-11-01

i.e., index construction may involve multiple flushes to local disk and on-disk merge sorts outside of MapReduce). Once the local indexes have been...contained 198 cores, which, with current dual-processor quad-core configurations, could fit into 25 machines, a far more modest cluster with today’s...significant impact on effectiveness. Our simple pruning technique was performed at query time and hence could be adapted to query-dependent
Graphics Processing Unit (GPU) Acceleration of the Goddard Earth Observing System Atmospheric Model

NASA Technical Reports Server (NTRS)

Putnam, Williama

2011-01-01

The Goddard Earth Observing System 5 (GEOS-5) is the atmospheric model used by the Global Modeling and Assimilation Office (GMAO) for a variety of applications, from long-term climate prediction at relatively coarse resolution, to data assimilation and numerical weather prediction, to very high-resolution cloud-resolving simulations. GEOS-5 is being ported to a graphics processing unit (GPU) cluster at the NASA Center for Climate Simulation (NCCS). By utilizing GPU co-processor technology, we expect to increase the throughput of GEOS-5 by at least an order of magnitude, and accelerate the process of scientific exploration across all scales of global modeling, including: (1) the large-scale, high-end application of non-hydrostatic, global, cloud-resolving modeling at 10- to 1-kilometer (km) global resolutions; (2) intermediate-resolution seasonal climate and weather prediction at 50- to 25-km on small clusters of GPUs; and (3) long-range, coarse-resolution climate modeling, enabled on a small box of GPUs for the individual researcher. After being ported to the GPU cluster, the primary physics components and the dynamical core of GEOS-5 have demonstrated a potential speedup of 15-40 times over conventional processor cores. Performance improvements of this magnitude reduce the required scalability of 1-km, global, cloud-resolving models from an unfathomable 6 million cores to an attainable 200,000 GPU-enabled cores.
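The two core counts quoted above can be checked against the quoted speedup range with one division. The sketch below does that arithmetic; the assumption that the core-count reduction maps directly onto per-core speedup is a simplification made here, not a statement from the abstract.

```python
def implied_speedup(conventional_cores: int, gpu_enabled_cores: int) -> float:
    """Per-core speedup implied if both configurations deliver equal throughput."""
    return conventional_cores / gpu_enabled_cores


speedup = implied_speedup(6_000_000, 200_000)
in_quoted_range = 15 <= speedup <= 40

print(f"implied speedup: {speedup:.0f}x, inside the 15-40x range: {in_quoted_range}")
```

The implied factor sits inside the reported range, so the two figures in the abstract are mutually consistent under this simple reading.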
Microcalorimeters with Germanium Thermistors for High Resolution Soft and Hard X-ray Astronomy

NASA Technical Reports Server (NTRS)

Silver, E.

2003-01-01

This is a progress report for the first year of a three-year Space Research and Technology (SR&T) grant to continue the advancement of neutron transmutation doped (NTD-based) microcalorimeters. We have re-prioritized certain aspects of the statement of work and chosen to emphasize issues of array development in the first year rather than wait until year two. Consequently, some of the projects scheduled for the first year were delayed to the second year. Here we report on our progress to: a) build and test a 1 x 4 element array and investigate electrical and thermal cross-talk; b) build a multiplexed 4-channel analog pulse processor; c) build a digital pulse processor that can accommodate 4 channels with independent triggers; d) develop a proportional thermal baseline restoration system compatible with the constant voltage mode of microcalorimeter operation.
Electronic structure and soft-X-ray-induced photoreduction studies of iron-based magnetic polyoxometalates of type {(M)M5}12Fe(III)30 (M = Mo(VI), W(VI)).

PubMed

Kuepper, Karsten; Derks, Christine; Taubitz, Christian; Prinz, Manuel; Joly, Loïc; Kappler, Jean-Paul; Postnikov, Andrei; Yang, Wanli; Kuznetsova, Tatyana V; Wiedwald, Ulf; Ziemann, Paul; Neumann, Manfred

2013-06-14

Giant Keplerate-type molecules with a {Mo72Fe30} core show a number of very interesting properties, making them particularly promising for various applications. So far, only limited data on the electronic structure of these molecules from X-ray spectra and electronic structure calculations have been available. Here we present a combined electronic and magnetic structure study of three Keplerate-type nanospheres (two with a {Mo72Fe30} core and one with a {W72Fe30} core) by means of X-ray absorption spectroscopy, X-ray magnetic circular dichroism (XMCD), SQUID magnetometry, and complementary theoretical approaches. Furthermore, we present detailed studies of the Fe³⁺-to-Fe²⁺ photoreduction process, which is induced under soft X-ray radiation in these molecules. We observe that the photoreduction rate greatly depends on the ligand structure surrounding the Fe ions, with negatively charged ligands leading to a dramatically reduced photoreduction rate. This opens the possibility of tailoring such polyoxometalates by X-ray spectroscopic studies and also for potential applications in the field of X-ray induced photochemistry.
Accelerating Demand Paging for Local and Remote Out-of-Core Visualization

NASA Technical Reports Server (NTRS)

Ellsworth, David

2001-01-01

This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
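The two ideas named above, overlapping computation with page reads and issuing several reads in parallel, can be sketched with a thread pool. The code below is a simplified stand-in, not the paper's algorithm: `read_page` and `process` are hypothetical placeholders, the latency is simulated with a sleep, and all reads are queued up front rather than on demand.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_page(page_id: int) -> list:
    """Hypothetical page read; the sleep stands in for disk or network latency."""
    time.sleep(0.01)
    return [page_id] * 4


def process(page: list) -> int:
    """Hypothetical per-page computation."""
    return sum(page)


def visualize(page_ids, readers: int = 4) -> list:
    """Process pages in order while later reads proceed in background threads."""
    with ThreadPoolExecutor(max_workers=readers) as pool:
        # Submit every read immediately so up to `readers` run concurrently.
        futures = [pool.submit(read_page, p) for p in page_ids]
        # Compute on each page as soon as it arrives; other reads keep going.
        return [process(f.result()) for f in futures]


print(visualize(range(8)))
```

With one reader thread the loop degenerates to read-then-compute; raising `readers` lets later pages load while earlier ones are processed, which is the overlap the abstract credits for the speedup.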
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-trees and Min-trees. The image or volume is first partitioned into multiple slices. We then compute the Max-trees of each slice using any sequential Max-tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
MetAlign 3.0: performance enhancement by efficient use of advances in computer hardware.

PubMed

Lommen, Arjen; Kools, Harrie J

2012-08-01

A new, multi-threaded version of the GC-MS and LC-MS data processing software, metAlign, has been developed which is able to utilize multiple cores on one PC. This new version was tested using three different multi-core PCs with different operating systems. The performance of noise reduction, baseline correction and peak-picking was 8-19 fold faster compared to the previous version on a single core machine from 2008. The alignment was 5-10 fold faster. Factors influencing the performance enhancement are discussed. Our observations show that performance scales with the increase in processor core numbers we currently see in consumer PC hardware development.

Analysis and Implementation of Particle-to-Particle (P2P) Graphics Processor Unit (GPU) Kernel for Black-Box Adaptive Fast Multipole Method

DTIC Science & Technology

2015-06-01

5110P and 16 dx360M4 nodes each with one NVIDIA Kepler K20M/K40M GPU. Each node contained dual Intel Xeon E5-2670 (Sandy Bridge) central processing...kernel and as such does not employ multiple processors. This work makes use of a single processing core and a single NVIDIA Kepler K40 GK110...bandwidth (2 × 16 slot), 7.877 GFloat/s; Kepler K40 peak, 4,290 × 1 billion floating-point operations (GFLOPs), and 288 GB/s Kepler K40 memory
Advances in photographic X-ray imaging for solar astronomy

NASA Technical Reports Server (NTRS)

Moses, J. Daniel; Schueller, R.; Waljeski, K.; Davis, John M.

1989-01-01

The technique of obtaining quantitative data from high resolution soft X-ray photographic images produced by grazing incidence optics was successfully developed to a high degree during the Solar Research Sounding Rocket Program and the S-054 X-Ray Spectrographic Telescope Experiment Program on Skylab. Continued use of soft X-ray photographic imaging in sounding rocket flights of the High Resolution Solar Soft X-Ray Imaging Payload has provided opportunities to further develop these techniques. The developments discussed include: (1) The calibration and use of an inexpensive, commercially available microprocessor controlled drum type film processor for photometric film development; (2) The use of Kodak Technical Pan 2415 film and Kodak SO-253 High Speed Holographic film for improved resolution; and (3) The application of a technique described by Cook, Ewing, and Sutton for determining the film characteristic curves from density histograms of the flight film. Although the superior sensitivity, noise level, and linearity of microchannel plate and CCD detectors attract the development efforts of many groups working in soft X-ray imaging, the high spatial resolution and dynamic range as well as the reliability and ease of application of photographic media assure the continued use of these techniques in solar X-ray astronomy observations.
A Parallel Saturation Algorithm on Shared Memory Architectures

NASA Technical Reports Server (NTRS)

Ezekiel, Jonathan; Siminiceanu

2007-01-01

Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor, dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Programmable Phase Transitions in a Photonic Microgel System: Linking Soft Interactions to a Temporal pH Gradient.

PubMed

Go, Dennis; Rommel, Dirk; Chen, Lisa; Shi, Feng; Sprakel, Joris; Kuehne, Alexander J C

2017-02-28

Soft amphoteric microgel systems exhibit a rich phase behavior. Crystalline phases of these material systems are of interest because they exhibit photonic stop-gaps, giving rise to iridescent color. Such microgel systems are promising for applications in soft, switchable, and programmable photonic filters and devices. We here report a composite microgel system consisting of a hard and fluorescently labeled core and a soft, amphoteric microgel shell. At pH above the isoelectric point (IEP), these colloids easily crystallize into three-dimensional colloidal assemblies. By adding a cyclic lactone to the system, the temporal pH profile can be controlled, and the microgels can be programmed to melt, while they lose charge. When the microgels gain the opposite charge, they recrystallize into assemblies of even higher order. We provide a model system to study the dynamic phase behavior of soft particles and their switchable and programmable photonic effects.
Microbial Community Response to Simulated Petroleum Seepage in Caspian Sea Sediments

PubMed Central

Stagars, Marion H.; Mishra, Sonakshi; Treude, Tina; Amann, Rudolf; Knittel, Katrin

2017-01-01

Anaerobic microbial hydrocarbon degradation is a major biogeochemical process at marine seeps. Here we studied the response of the microbial community to petroleum seepage simulated for 190 days in a sediment core from the Caspian Sea using a sediment-oil-flow-through (SOFT) system. Untreated (without simulated petroleum seepage) and SOFT sediment microbial communities shared 43% bacterial genus-level 16S rRNA-based operational taxonomic units (OTU0.945) but shared only 23% archaeal OTU0.945. The community differed significantly between sediment layers. The detection of fourfold higher deltaproteobacterial cell numbers in SOFT than in untreated sediment at depths characterized by highest sulfate reduction rates and strongest decrease of gaseous and mid-chain alkane concentrations indicated a specific response of hydrocarbon-degrading Deltaproteobacteria. Based on an increase in specific CARD-FISH cell numbers, we suggest the following groups of sulfate-reducing bacteria to be likely responsible for the observed decrease in aliphatic and aromatic hydrocarbon concentration in SOFT sediments: clade SCA1 for propane and butane degradation, clade LCA2 for mid- to long-chain alkane degradation, clade Cyhx for cycloalkanes, pentane and hexane degradation, and relatives of Desulfobacula for toluene degradation. Highest numbers of archaea of the genus Methanosarcina were found in the methanogenic zone of the SOFT core where we detected preferential degradation of long-chain hydrocarbons. Sequencing of masD, a marker gene for alkane degradation encoding (1-methylalkyl)succinate synthase, revealed a low diversity in SOFT sediment with two abundant species-level MasD OTU0.96. PMID:28503173
Photochemistry on soft-glass hollow-core photonic crystal fibre

NASA Astrophysics Data System (ADS)

Cubillas, Ana M.; Jiang, Xin; Euser, Tijmen G.; Taccardi, Nicola; Etzold, Bastian J. M.; Wasserscheid, Peter; Russell, Philip St. J.

2014-05-01

Hollow-core photonic crystal fibre (HC-PCF) offers strong light confinement and long interaction lengths in an optofluidic channel. These unique advantages have motivated its recent use as a highly efficient and versatile microreactor for liquid-phase photochemistry and catalysis. In this work, we use a soft-glass HC-PCF to carry out photochemical experiments in a high-index solvent such as toluene. The high-intensity and strong confinement in the fibre is demonstrated to enhance the performance of a proof-of-principle photolysis reaction.
Quantum phases of dipolar soft-core bosons

NASA Astrophysics Data System (ADS)

Grimmer, D.; Safavi-Naini, A.; Capogrosso-Sansone, B.; Söyler, Ş. G.

2014-10-01

We study the phase diagram of a system of soft-core dipolar bosons confined to a two-dimensional optical lattice layer. We assume that dipoles are aligned perpendicular to the layer such that the dipolar interactions are purely repulsive and isotropic. We consider the full dipolar interaction and perform path-integral quantum Monte Carlo simulations using the worm algorithm. Besides a superfluid phase, we find various solid and supersolid phases. We show that, unlike what was found previously for the case of nearest-neighbor interaction, supersolid phases are stabilized by doping the solids not only with particles but with holes as well. We further study the stability of these quantum phases against thermal fluctuations. Finally, we discuss pair formation and the stability of the pair checkerboard phase formed in a bilayer geometry, and we suggest experimental conditions under which the pair checkerboard phase can be observed.
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors

PubMed Central

Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun

2016-01-01

Energy efficiency is a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multiple cores, there is an emerging need for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentation of the idle time, which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods on T-L plane abstraction. PMID:27399722
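Why fragmented idle time matters under DPM can be illustrated with a minimal sketch. It assumes a simplified power model that is not taken from the paper: a core either stays idle at power p_idle or pays a fixed transition energy e_trans to sleep at power p_sleep, and transition latency is ignored. The function names and all numbers are hypothetical.

```python
def break_even_time(p_idle, p_sleep, e_trans):
    """Idle length above which sleeping costs less than staying idle (assumed model)."""
    return e_trans / (p_idle - p_sleep)


def idle_energy(interval, p_idle, p_sleep, e_trans):
    """Energy spent over one idle interval, sleeping only when that is cheaper."""
    stay_awake = p_idle * interval
    go_to_sleep = e_trans + p_sleep * interval
    return min(stay_awake, go_to_sleep)


# Hypothetical numbers: 1.0 W idle, 0.1 W asleep, 0.9 J per sleep/wake cycle.
P_IDLE, P_SLEEP, E_TRANS = 1.0, 0.1, 0.9

# Four 0.5 s fragments are each below the break-even time, so the core never sleeps.
fragmented = sum(idle_energy(0.5, P_IDLE, P_SLEEP, E_TRANS) for _ in range(4))

# The same 2.0 s of idle time as one interval is long enough to make sleeping pay off.
merged = idle_energy(2.0, P_IDLE, P_SLEEP, E_TRANS)
```

Under these assumed values the merged interval costs roughly half the energy of the fragmented one, which is the motivation for reducing idle-time fragmentation; the scheduling algorithm in the paper itself is not reproduced here.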
IGA-ADS: Isogeometric analysis FEM using ADS solver

NASA Astrophysics Data System (ADS)

Łoś, Marcin M.; Woźniak, Maciej; Paszyński, Maciej; Lenharth, Andrew; Hassaan, Muhamm Amber; Pingali, Keshav

2017-08-01

In this paper we present a fast explicit solver for non-stationary problems using L2 projections with the isogeometric finite element method. The solver has been implemented within the GALOIS framework. It enables parallel multi-core simulations of different time-dependent problems, in 1D, 2D, or 3D. We have prepared the solver framework in a way that enables direct implementation of the selected PDE and corresponding boundary conditions. In this paper we describe the installation, the implementation of three exemplary PDEs, and the execution of the simulations on multi-core Linux cluster nodes. We consider three case studies, including heat transfer, linear elasticity, as well as non-linear flow in heterogeneous media. The presented package generates output suitable for interfacing with Gnuplot and ParaView visualization software. The exemplary simulations show near-perfect scalability on the Gilbert shared-memory node with four Intel® Xeon® CPU E7-4860 processors, each possessing 10 physical cores (for a total of 40 cores).
UAS-NAS Live Virtual Constructive Distributed Environment (LVC): LVC Gateway, Gateway Toolbox, Gateway Data Logger (GDL), SaaProc Software Design Description

NASA Technical Reports Server (NTRS)

Jovic, Srboljub

2015-01-01

This document provides the software design description for the two core software components, the LVC Gateway, the LVC Gateway Toolbox, and two participants, the LVC Gateway Data Logger and the SAA Processor (SaaProc).
Parallel processing architecture for H.264 deblocking filter on multi-core platforms

NASA Astrophysics Data System (ADS)

Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

2012-03-01

Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions.
This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi-core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs; the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
Cache Sharing and Isolation Tradeoffs in Multicore Mixed-Criticality Systems

DTIC Science & Technology

2015-05-01

form of lockdown registers, to provide way-based partitioning. These alternatives are illustrated in Fig. 1 with respect to a quad-core ARM Cortex A9... processor (as we do for Level-A and -B tasks), but they did not consider MC systems. Altmeyer et al. [1] considered uniprocessor scheduling on a system with a...framework. We randomly generated task sets and determined the fraction that were schedulable on our target hardware platform, the quad-core ARM Cortex A9
Floating-point performance of ARM cores and their efficiency in classical molecular dynamics

NASA Astrophysics Data System (ADS)

Nikolskiy, V.; Stegailov, V.

2016-02-01

Supercomputing of the exascale era is going to be inevitably limited by power efficiency. Nowadays different possible variants of CPU architectures are considered. Recently the development of ARM processors has come to the point when their floating point performance can be seriously considered for a range of scientific applications. In this work we present the analysis of the floating point performance of the latest ARM cores and their efficiency for the algorithms of classical molecular dynamics.
FPGA-Based Real-Time Embedded System for RISS/GPS Integrated Navigation

PubMed Central

Abdelfatah, Walid Farid; Georgy, Jacques; Iqbal, Umar; Noureldin, Aboelmagd

2012-01-01

Navigation algorithms integrating measurements from multi-sensor systems overcome the problems that arise from using GPS navigation systems in standalone mode. Algorithms which integrate the data from a 2D low-cost reduced inertial sensor system (RISS), consisting of a gyroscope and an odometer or wheel encoders, along with a GPS receiver via a Kalman filter have proved to provide a consistent and more reliable navigation solution compared to standalone GPS receivers. They have also been shown to be beneficial, especially in GPS-denied environments such as urban canyons and tunnels. The main objective of this paper is to narrow the idea-to-implementation gap that follows the algorithm development by realizing a low-cost real-time embedded navigation system capable of computing the data-fused positioning solution. The role of the developed system is to synchronize the measurements from the three sensors, relative to the pulse per second signal generated from the GPS, after which the navigation algorithm is applied to the synchronized measurements to compute the navigation solution in real-time. Employing a customizable soft-core processor on an FPGA in the kernel of the navigation system provided the flexibility for communicating with the various sensors and the computation capability required by the Kalman filter integration algorithm. PMID:22368460
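The predict/update structure of a Kalman filter that fuses dead-reckoning data with GPS fixes can be sketched in one dimension. This is a deliberately simplified illustration, not the RISS/GPS filter from the paper: the state is a single position, the odometer speed drives the prediction, and the function name and noise parameters q and r are assumptions.

```python
def kalman_step(x, p, speed, dt, q, r, z=None):
    """One scalar Kalman step: predict from odometer speed, correct with a GPS fix.

    x, p  : previous position estimate and its variance
    speed : odometer-derived speed over the interval dt
    q, r  : assumed process and GPS measurement noise variances
    z     : GPS position, or None during an outage (tunnel, urban canyon)
    """
    # Prediction (dead reckoning): uncertainty grows by the process noise.
    x_pred = x + speed * dt
    p_pred = p + q
    if z is None:
        # No GPS fix available: carry the prediction forward unchanged.
        return x_pred, p_pred
    # Update: blend prediction and measurement according to their variances.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new
```

With z omitted the variance keeps growing, which mirrors how a dead-reckoning solution drifts until the next GPS fix arrives.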
Socket Preservation using Enzyme-treated Equine Bone Granules and an Equine Collagen Matrix: A Case Report with Histological and Histomorphometrical Assessment.

PubMed

Leonida, Alessandro; Todeschini, Giovanni; Lomartire, Giovanni; Cinci, Lorenzo; Pieri, Laura

2016-11-01

To histologically assess the effectiveness of a socket-preservation technique using enzyme-treated equine bone granules as a bone-graft material in combination with an equine collagen matrix as a scaffold for soft-tissue regeneration. Enzyme-treated equine bone granules and equine collagen matrix recently have been developed to help overcome alveolar bone deficiencies that develop in the wake of edentulism. The patient had one mandibular molar extracted and the socket grafted with equine bone granules. The graft was covered with the equine collagen matrix, placed in a double layer. No flap was prepared, and the gingival margins were stabilized with a single stitch, leaving the matrix partially exposed and the site to heal by secondary intention. The adjacent molar was extracted 1 month later, and that socket was left to heal by secondary intention without any further treatment. Three months after each surgery, an implant was placed and a biopsy was collected. The two biopsies underwent histological processing and qualitative evaluation. Histomorphometric analysis was also performed to calculate the percentage of newly formed bone (NFB) in the two cores. Healing at both sites was uneventful, and no inflammation or other adverse reactions were observed in the samples. Soft-tissue healing by secondary intention appeared to occur faster at the grafted site. The corresponding core showed a marked separation between soft and hard tissue that was not observed in the core from the nongrafted site, where soft-tissue hypertrophy could be observed. Newly formed bone at the grafted and nongrafted sites was not significantly different (27.2 ± 7.1 and 29.4 ± 6.2% respectively, p = 0.45). The surgical technique employed in this case appeared to facilitate postextraction soft-tissue healing by second intention and simplify soft-tissue management. 
Using a collagen-based matrix to cover a postextraction grafted site may facilitate second intention soft-tissue healing and proper soft-tissue growth.
Antidumping Action in the United States and Around the World: An Analysis of International Data. CBO Paper.

DTIC Science & Technology

1998-06-01

the price of sugar to the point that some processors, such as soft-drink producers, have replaced it with high-fructose corn syrup. That example...United States, both one on one and in the aggregate. Antidumping duty rates are high enough to be significant impediments to trade, especially the duties...the United States, although their rates are still high enough to be significant impediments to trade. Among the most active users, Canada had the next
Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

PubMed

Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

2016-09-01

After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets comprising 71 neonates (1023 h, 3493 seizures), recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and a good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
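The reported percentages can be turned into absolute figures with a short calculation. It assumes the 34% reduction is relative to the heuristic method's 3.81 false alarms per hour; the abstract does not say whether the 2% drop in good detection rate is absolute or relative, so both readings are shown.

```python
baseline_far = 3.81   # false alarms per hour, heuristic classifier alone
baseline_gdr = 88.0   # good detection rate in percent, heuristic classifier alone

# 34% relative reduction of the false alarm rate.
far_after = baseline_far * (1 - 0.34)

# Two possible readings of "decreases by 2%".
gdr_if_absolute = baseline_gdr - 2.0          # percentage points
gdr_if_relative = baseline_gdr * (1 - 0.02)   # relative decrease

print(f"false alarms per hour after post-processing: {far_after:.2f}")
print(f"good detection rate: {gdr_if_absolute:.2f}% (absolute) or {gdr_if_relative:.2f}% (relative)")
```

Either reading leaves the detection rate near 86%, while the false alarm rate falls to roughly 2.5 per hour.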
An embedded multi-core parallel model for real-time stereo imaging

NASA Astrophysics Data System (ADS)

He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu

2018-04-01

Real-time processing on embedded systems will enhance the application capability of stereo imaging for LiDAR and hyperspectral sensors. Research on task partitioning and scheduling strategies for embedded multiprocessor systems started relatively late compared with that for PCs. In this paper, a parallel model for stereo imaging aimed at embedded multi-core processing platforms is studied and verified. After analyzing the computational load, throughput capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and achieved a speed-up ratio of up to 6.4.
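A speed-up of 6.4 on 8 cores can be put in context with two standard figures: parallel efficiency and the serial fraction implied by Amdahl's law. The sketch below is only a back-of-the-envelope reading; the paper's two-stage pipeline model is not an Amdahl model, and the function names are hypothetical.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / cores


def amdahl_serial_fraction(speedup, cores):
    """Serial fraction f that solves S = 1 / (f + (1 - f) / N) for the given S and N."""
    return (cores / speedup - 1) / (cores - 1)


efficiency = parallel_efficiency(6.4, 8)     # 0.8
serial = amdahl_serial_fraction(6.4, 8)      # about 0.036
print(f"efficiency: {efficiency:.2f}, implied serial fraction: {serial:.3f}")
```

Read this way, the reported result corresponds to 80% parallel efficiency, or an equivalent serial share of under 4% of the work.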
Characteristic Examination of New Synchronous Motor that Composes Craw Teeth of Soft Magnetic Composite

NASA Astrophysics Data System (ADS)

Enomoto, Yuji; Ito, Motoya; Masaki, Ryozo; Asaka, Kazuo

We examined the claw type teeth motor as one application of soft magnetic composite to a motor core. To understand the characteristics of the claw type teeth motor quantitatively, we used 3-dimensional electromagnetic field analysis to predict its characteristics in advance and manufactured a trial motor to evaluate them. We also examined the advantages of the claw type teeth motor compared with a conventional slot type motor. The results are: 1. 3-dimensional electromagnetic field analysis can estimate with high accuracy the characteristics of the 3-phase permanent magnet synchronous claw type teeth motor having a core composed of the soft magnetic composite. 2. The claw type teeth motor is able to achieve about 20% higher output than a conventional slot type motor having an electromagnetic steel core, when both motors have equal volume. 3. The motor efficiency of the claw type teeth motor is about 3.5% higher than that of the conventional motor.

Size-dependent structural evolution of the biomineralized iron-core nanoparticles in ferritins

NASA Astrophysics Data System (ADS)

Lee, Eunsook; Kim, D. H.; Hwang, Jihoon; Lee, Kiho; Yoon, Sungwon; Suh, B. J.; Hyun Kim, Kyung; Kim, J.-Y.; Jang, Z. H.; Kim, Bongjae; Min, B. I.; Kang, J.-S.

2013-04-01

The structural identity of the biomineralized iron core nanoparticles in Helicobacter pylori ferritins (Hpf's) has been determined by employing soft x-ray absorption spectroscopy and soft x-ray magnetic circular dichroism. Valence states of Fe ions are nearly trivalent in all Hpf's, indicating that the amount of magnetite (Fe3O4) is negligible. With increasing filling of Fe ions, the local configurations of Fe3+ ions change from the mixture of the tetrahedral and octahedral symmetries to the octahedral symmetry. These results demonstrate that the biomineralization of the ferritin core changes from maghemite-like (γ-Fe2O3) formation to hematite-like (α-Fe2O3) formation with increasing Fe content.
Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Su, Chun-Yi

By 2004, microprocessor design focused on multicore scaling, increasing the number of cores per die in each generation, as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems.
I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy efficient execution in multi-core, NUMA systems.
Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kerbyson, Darren J; Lang, Michael; Pakin, Scott

2009-01-01

Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance. Processor-cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contain wave-front processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundary data downstream and whose cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional computation and higher use of on-chip communications. This tradeoff is explored using a performance model, and an implementation on the Petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.
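The dependency pattern behind wave-front processing can be shown with a small sketch. It assumes a 2D grid in which each cell depends on its upper and left neighbors, so all cells on one anti-diagonal are mutually independent. This is the plain, single-level wave-front, not the hierarchical scheme developed in the paper, and the recurrence used is only a placeholder.

```python
def wavefront_order(rows, cols):
    """Yield one list of (i, j) cells per anti-diagonal, in dependency order."""
    for d in range(rows + cols - 1):
        first = max(0, d - cols + 1)
        last = min(rows, d + 1)
        yield [(i, d - i) for i in range(first, last)]


def sweep(rows, cols):
    """Fill a grid where each cell needs its upper and left neighbors first."""
    grid = [[0] * cols for _ in range(rows)]
    for front in wavefront_order(rows, cols):
        # Cells within one front have no mutual dependencies,
        # so this inner loop is the part that could run in parallel.
        for i, j in front:
            if i == 0 and j == 0:
                grid[i][j] = 1
                continue
            up = grid[i - 1][j] if i > 0 else 0
            left = grid[i][j - 1] if j > 0 else 0
            grid[i][j] = up + left  # placeholder recurrence
    return grid
```

The number of fronts, rows + cols - 1, bounds how quickly the sweep can finish regardless of core count, which is why the cost of passing boundary data between fronts matters so much at scale.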
CoreTSAR: Core Task-Size Adapting Runtime

DOE PAGES

Scogland, Thomas R. W.; Feng, Wu-chun; Rountree, Barry; ...

2014-10-27

Heterogeneity continues to increase at all levels of computing, with the rise of accelerators such as GPUs, FPGAs, and other co-processors into everything from desktops to supercomputers. As a consequence, efficiently managing such disparate resources has become increasingly complex. CoreTSAR seeks to reduce this complexity by adaptively worksharing parallel-loop regions across compute resources without requiring any transformation of the code within the loop. Lastly, our results show performance improvements of up to three-fold over a current state-of-the-art heterogeneous task scheduler as well as linear performance scaling from a single GPU to four GPUs for many codes. In addition, CoreTSAR demonstrates a robust ability to adapt to both a variety of workloads and underlying system configurations.
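One simple way to workshare a parallel loop across unequal devices is to split iterations in proportion to each device's measured throughput. The sketch below illustrates that general idea only; it is not CoreTSAR's actual scheduling algorithm, and the function name and rates are hypothetical.

```python
def split_iterations(total, rates):
    """Divide `total` loop iterations among devices in proportion to `rates`.

    rates: iterations per second measured for each device on an earlier pass.
    Returns integer counts that sum exactly to `total`.
    """
    rate_sum = sum(rates)
    shares = [total * r / rate_sum for r in rates]
    counts = [int(s) for s in shares]
    # Hand out iterations lost to rounding, largest fractional part first.
    leftover = total - sum(counts)
    by_fraction = sorted(range(len(rates)),
                         key=lambda k: shares[k] - counts[k],
                         reverse=True)
    for k in by_fraction[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical example: a GPU measured three times faster than the CPU.
cpu_iters, gpu_iters = split_iterations(1000, [1.0, 3.0])
```

Re-measuring the rates after each pass and calling the function again gives a basic adaptive loop, since the split then follows whatever the devices actually achieved on the previous pass.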
A Survey of Recent MARTe Based Systems

NASA Astrophysics Data System (ADS)

Neto, André C.; Alves, Diogo; Boncagni, Luca; Carvalho, Pedro J.; Valcarcel, Daniel F.; Barbalace, Antonio; De Tommasi, Gianmaria; Fernandes, Horácio; Sartori, Filippo; Vitale, Enzo; Vitelli, Riccardo; Zabeo, Luca

2011-08-01

The Multithreaded Application Real-Time executor (MARTe) is a data driven framework environment for the development and deployment of real-time control algorithms. The main ideas which led to the present version of the framework were to standardize the development of real-time control systems, while providing a set of strictly bounded standard interfaces to the outside world and also accommodating a collection of facilities which promote the speed and ease of development, commissioning and deployment of such systems. At the core of every MARTe based application is a set of independent inter-communicating software blocks, named Generic Application Modules (GAM), orchestrated by a real-time scheduler. The platform independence of its core library gives MARTe the necessary robustness and flexibility for conveniently testing applications in different environments including non-real-time operating systems. MARTe is already being used in several machines, each with its own peculiarities regarding hardware interfacing, supervisory control configuration, operating system and target control application. This paper presents and compares the most recent results of systems using MARTe: the JET Vertical Stabilization system, which uses the Real Time Application Interface (RTAI) operating system on Intel multi-core processors; the COMPASS plasma control system, driven by Linux RT also on Intel multi-core processors; ISTTOK real-time tomography equilibrium reconstruction which shares the same support configuration of COMPASS; JET error field correction coils based on VME, PowerPC and VxWorks; FTU LH reflected power system running on VME, Intel with RTAI.
Cost efficient CFD simulations: Proper selection of domain partitioning strategies

NASA Astrophysics Data System (ADS)

Haddadi, Bahram; Jordan, Christian; Harasek, Michael

2017-10-01

Computational Fluid Dynamics (CFD) is one of the most powerful simulation methods, which is used for temporally and spatially resolved solutions of fluid flow, heat transfer, mass transfer, etc. One of the challenges of Computational Fluid Dynamics is the extreme hardware demand. Nowadays super-computers (e.g. High Performance Computing, HPC) featuring multiple CPU cores are applied for solving; the simulation domain is split into partitions for each core. Some of the different methods for partitioning are investigated in this paper. As a practical example, a new open source based solver was utilized for simulating packed bed adsorption, a common separation method within the field of thermal process engineering. Adsorption can, for example, be applied for removal of trace gases from a gas stream or for production of pure gases such as hydrogen. For comparing the performance of the partitioning methods, a 60 million cell mesh for a packed bed of spherical adsorbents was created; one second of the adsorption process was simulated. Different partitioning methods available in OpenFOAM® (Scotch, Simple, and Hierarchical) have been used with different numbers of sub-domains. The effects of the different methods and number of processor cores on the simulation speedup and also energy consumption were investigated for two different hardware infrastructures (Vienna Scientific Clusters VSC 2 and VSC 3). As a general recommendation an optimum number of cells per processor core was calculated. Optimized simulation speed, lower energy consumption and consequently the cost effects are reported here.
Lifetime-vibrational interference effects in resonantly excited x-ray emission spectra of CO

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skytt, P.; Glans, P.; Gunnelin, K.

1997-04-01

The parity selection rule for resonant X-ray emission as demonstrated for O2 and N2 can be seen as an effect of interference between coherently excited degenerate localized core states. One system where the core state degeneracy is not exact but somewhat lifted was previously studied at ALS, namely the resonant X-ray emission of amino-substituted benzene (aniline). It was shown that the X-ray fluorescence spectrum resulting from excitation of the C1s at the site of the "aminocarbon" could be described in a picture separating the excitation and the emission processes, whereas the spectrum corresponding to the quasi-degenerate carbons could not. Thus, in this case it was necessary to take interference effects between the quasi-degenerate intermediate core excited states into account in order to obtain agreement between calculations and experiment. The different vibrational levels of core excited states in molecules have energy splittings which are of the same order of magnitude as the natural lifetime broadening of core excitations in the soft X-ray range. Therefore, lifetime-vibrational interference effects are likely to appear and influence the band shapes in resonant X-ray emission spectra. Lifetime-vibrational interference has been studied in non-resonant X-ray emission, and in Auger spectra. In this report the authors discuss results of selectively excited soft X-ray fluorescence spectra of molecules, where they focus on lifetime-interference effects appearing in the band shapes.
Magnetization dynamics of imprinted non-collinear spin textures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Streubel, Robert, E-mail: r.streubel@ifw-dresden.de; Kopte, Martin; Makarov, Denys, E-mail: d.makarov@ifw-dresden.de

2015-09-14

We study the magnetization dynamics of non-collinear spin textures realized via imprint of the magnetic vortex state in soft permalloy into magnetically hard out-of-plane magnetized Co/Pd nanopatterned heterostructures. Tuning the interlayer exchange coupling between soft- and hard-magnetic subsystems provides means to tailor the magnetic state in the Co/Pd stack from being vortex- to donut-like with different core sizes. While the imprinted vortex spin texture leads to the dynamics similar to the one observed for vortices in permalloy disks, the donut-like state causes the appearance of two gyrofrequencies characteristic of the early and later stages of the magnetization dynamics. The dynamics are described using the Thiele equation supported by the full scale micromagnetic simulations by taking into account an enlarged core size of the donut states compared to magnetic vortices.
An accuracy aware low power wireless EEG unit with information content based adaptive data compression.

PubMed

Tolbert, Jeremy R; Kabali, Pratik; Brar, Simeranjit; Mukhopadhyay, Saibal

2009-01-01

We present a digital system for adaptive data compression for low power wireless transmission of Electroencephalography (EEG) data. The proposed system acts as a base-band processor between the EEG analog-to-digital front-end and RF transceiver. It performs a real-time accuracy energy trade-off for multi-channel EEG signal transmission by controlling the volume of transmitted data. We propose a multi-core digital signal processor for on-chip processing of EEG signals, to detect signal information of each channel and perform real-time adaptive compression. Our analysis shows that the proposed approach can provide significant savings in transmitter power with minimal impact on the overall signal accuracy.
Impact of metal gates on remote phonon scattering in titanium nitride/hafnium dioxide n-channel metal-oxide-semiconductor field effect transistors-low temperature electron mobility study

NASA Astrophysics Data System (ADS)

Maitra, Kingsuk; Frank, Martin M.; Narayanan, Vijay; Misra, Veena; Cartier, Eduard A.

2007-12-01

We report low temperature (40-300 K) electron mobility measurements on aggressively scaled [equivalent oxide thickness (EOT)=1 nm] n-channel metal-oxide-semiconductor field effect transistors (nMOSFETs) with HfO2 gate dielectrics and metal gate electrodes (TiN). A comparison is made with conventional nMOSFETs containing HfO2 with polycrystalline Si (poly-Si) gate electrodes. No substantial change in the temperature acceleration factor is observed when poly-Si is replaced with a metal gate, showing that soft optical phonons are not significantly screened by metal gates. A qualitative argument based on an analogy between remote phonon scattering and high-resolution electron energy-loss spectroscopy (HREELS) is provided to explain the underlying physics of the observed phenomenon. It is also shown that soft optical phonon scattering is strongly damped by thin SiO2 interface layers, such that room temperature electron mobility values at EOT=1 nm become competitive with values measured in nMOSFETs with SiON gate dielectrics used in current high performance processors.
Modern multicore and manycore architectures: Modelling, optimisation and benchmarking a multiblock CFD code

NASA Astrophysics Data System (ADS)

Hadade, Ioan; di Mare, Luca

2016-08-01

Modern multicore and manycore processors exhibit multiple levels of parallelism through a wide range of architectural features such as SIMD for data parallel execution or threads for core parallelism. The exploitation of multi-level parallelism is therefore crucial for achieving superior performance on current and future processors. This paper presents the performance tuning of a multiblock CFD solver on Intel SandyBridge and Haswell multicore CPUs and the Intel Xeon Phi Knights Corner coprocessor. Code optimisations have been applied on two computational kernels exhibiting different computational patterns: the update of flow variables and the evaluation of the Roe numerical fluxes. We discuss at great length the code transformations required for achieving efficient SIMD computations for both kernels across the selected devices including SIMD shuffles and transpositions for flux stencil computations and global memory transformations. Core parallelism is expressed through threading based on a number of domain decomposition techniques together with optimisations pertaining to alleviating NUMA effects found in multi-socket compute nodes. Results are correlated with the Roofline performance model in order to assert their efficiency for each distinct architecture. We report significant speedups for single thread execution across both kernels: 2-5X on the multicore CPUs and 14-23X on the Xeon Phi coprocessor. Computations at full node and chip concurrency deliver a factor of three speedup on the multicore processors and up to 24X on the Xeon Phi manycore coprocessor.
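The Roofline model mentioned in this abstract bounds a kernel's attainable performance by the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity. The following is a minimal sketch of that bound only; the function names and the machine figures (200 GFLOP/s, 50 GB/s) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: these numbers are assumptions, not measured values.
    PEAK_GFLOPS = 200.0
    BANDWIDTH_GBS = 50.0
    for name, intensity in [("low-intensity kernel", 0.5),
                            ("high-intensity kernel", 8.0)]:
        bound = roofline_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, intensity)
        limited = ("memory-bound"
                   if intensity < ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
                   else "compute-bound")
        print(f"{name}: at most {bound:.1f} GFLOP/s ({limited})")
```

Comparing a measured kernel rate against this bound gives the fraction of the architectural limit that an optimisation actually reaches.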
Real-time digital filtering, event triggering, and tomographic reconstruction of JET soft x-ray data (abstract)

NASA Astrophysics Data System (ADS)

Edwards, A. W.; Blackler, K.; Gill, R. D.; van der Goot, E.; Holm, J.

1990-10-01

Based upon the experience gained with the present soft x-ray data acquisition system, new techniques are being developed which make extensive use of digital signal processors (DSPs). Digital filters make 13 further frequencies available in real time from the input sampling frequency of 200 kHz. In parallel, various algorithms running on further DSPs generate triggers in response to a range of events in the plasma. The sawtooth crash can be detected, for example, with a delay of only 50 μs from the onset of the collapse. The trigger processor interacts with the digital filter boards to ensure data of the appropriate frequency is recorded throughout a plasma discharge. An independent link is used to pass 780 and 24 Hz filtered data to a network of transputers. A full tomographic inversion and display of the 24 Hz data is carried out in real time using this 15 transputer array. The 780 Hz data are stored for immediate detailed playback following the pulse. Such a system could considerably improve the quality of present plasma diagnostic data which is, in general, sampled at one fixed frequency throughout a discharge. Further, it should provide valuable information towards designing diagnostic data acquisition systems for future long pulse operation machines when a high degree of real-time processing will be required, while retaining the ability to detect, record, and analyze events of interest within such long plasma discharges.
Iterative current mode per pixel ADC for 3D SoftChip implementation in CMOS

NASA Astrophysics Data System (ADS)

Lachowicz, Stefan W.; Rassau, Alexander; Lee, Seung-Minh; Eshraghian, Kamran; Lee, Mike M.

2003-04-01

Mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. The processing requirements for the capture, conversion, compression, decompression, enhancement, display, etc. of increasingly higher quality multimedia content place heavy demands even on current ULSI (ultra large scale integration) systems, particularly for mobile applications where area and power are primary considerations. The ADC presented in this paper is designed for a vertically integrated (3D) system comprising two distinct layers bonded together using Indium bump technology. The top layer is a CMOS imaging array containing analogue-to-digital converters, and a buffer memory. The bottom layer takes the form of a configurable array processor (CAP), a highly parallel array of soft programmable processors capable of carrying out complex processing tasks directly on data stored in the top plane. This paper presents an ADC scheme for the image capture plane. The analogue photocurrent or sampled voltage is transferred to the ADC via a column or a column/row bus. In the proposed system, an array of analogue-to-digital converters is distributed, so that a one-bit cell is associated with one sensor. The analogue-to-digital converters are algorithmic current-mode converters. Eight such cells are cascaded to form an 8-bit converter. Additionally, each photo-sensor is equipped with a current memory cell, and multiple conversions are performed with scaled values of the photocurrent for colour processing.
Epidemiologic study of soft tissue rheumatism in Shantou and Taiyuan, China.

PubMed

Zeng, Qing-yu; Zang, Chang-hai; Lin, Ling; Chen, Su-biao; Li, Xiao-feng; Xiao, Zheng-yu; Dong, Hai-yuan; Zhang, Ai-lian; Chen, Ren

2010-08-05

Soft tissue rheumatism is a group of common rheumatic disorders reported in many countries. For investigating the prevalence rate of soft tissue rheumatism in different population in China, we carried out a population study in Shantou rural and Taiyuan urban area. Samples of 3915 adults in an urban area of Taiyuan, Shanxi Province, and 2350 in a rural area of Shantou, Guangdong Province were surveyed. Modified International League of Association for Rheumatology (ILAR)-Asia Pacific League of Association for Rheumatology (APLAR) Community Oriented Program for Control of Rheumatic Diseases (COPCORD) core questionnaire was implemented as screening tool. The positive responders were then all examined by rheumatologists. Prevalence rate of soft tissue rheumatism was 2.0% in Taiyuan, and 5.3% in Shantou. Rotator cuff (shoulder) tendinitis, adhesive capsulitis (frozen shoulder), lateral epicondylitis (tennis elbow), and digital flexor tenosynovitis (trigger finger) were the commonly seen soft tissue rheumatism in both areas. Tatarsalgia, plantar fasciitis, and De Quervain's tenosynovitis were more commonly seen in Shantou than that in Taiyuan. Only 1 case of fibromyalgia was found in Taiyuan and 2 cases in Shantou. The prevalence of soft tissue rheumatism varied with age, sex and occupation. Soft tissue rheumatism is common in Taiyuan and Shantou, China. The prevalence of soft tissue rheumatism was quite different with different geographic, environmental, and socioeconomic conditions; and varying with age, sex, and occupation. The prevalence of fibromyalgia is low in the present survey.
Effect of the addition of Al2O3 nanoparticles on the magnetic properties of Fe soft magnetic composites

NASA Astrophysics Data System (ADS)

Peng, Yuandong; Nie, Junwu; Zhang, Wenjun; Ma, Jian; Bao, Chongxi; Cao, Yang

2016-02-01

We investigated the effect of the addition of Al2O3 nanoparticles on the permeability and core loss of Fe soft magnetic composites coated with silicone. Fourier transform infra-red spectroscopy, scanning electron microscopy and energy-dispersive X-ray spectroscopy analysis revealed that the surface layer of the powder particles consisted of a thin insulating Al2O3 layer with uniform surface coverage. The permeability and core loss of the composite with the Al2O3 addition annealed at 650 °C were excellent. The results indicated that the Al2O3 nanoparticle addition increases the permeability stability with changing frequency and decreases the core loss over a wide range of frequencies.
Insulator coated magnetic nanoparticulate composites with reduced core loss and method of manufacture thereof

NASA Technical Reports Server (NTRS)

Zhang, Yide (Inventor); Wang, Shihe (Inventor); Xiao, Danny (Inventor)

2004-01-01

A series of bulk-size magnetic/insulating nanostructured composite soft magnetic materials with significantly reduced core loss, and the technology for manufacturing them, are described. This insulator coated magnetic nanostructured composite comprises a magnetic constituent, which contains one or more magnetic components, and an insulating constituent. The magnetic constituent is nanometer scale particles (1-100 nm) coated by a thin-layered insulating phase (continuous phase). While the intergrain interaction between the immediate neighboring magnetic nanoparticles separated by the insulating phase (or coupled nanoparticles) provides the desired soft magnetic properties, the insulating material provides the much demanded high resistivity which significantly reduces the eddy current loss. The resulting material is a high performance magnetic nanostructured composite with reduced core loss.
Peregrine System | High-Performance Computing | NREL

Science.gov Websites

) and longer-term (/projects) storage. These file systems are mounted on all nodes. Peregrine has three -2670 Xeon processors and 64 GB of memory. In addition to mounting the /home, /nopt, /projects and # cores/node Memory/node Peak (DP) performance per node 88 Intel Xeon E5-2670 "Sandy Bridge" 8
Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors

DTIC Science & Technology

2013-02-01

with the bias node ( gray ) denoted as ww and the weights associated with the remaining first layer nodes (black) denoted as W. In forming the overall...Implementation of RBF network on GPU Platform 3.5.1 The Cholesky decomposition algorithm We need to invert the matrix multiplication GTG to
Improving the performance of heterogeneous multi-core processors by modifying the cache coherence protocol

NASA Astrophysics Data System (ADS)

Fang, Juan; Hao, Xiaoting; Fan, Qingwen; Chang, Zeqing; Song, Shuying

2017-05-01

In the heterogeneous multi-core architecture, CPU and GPU processors are integrated on the same chip, which poses a new challenge to last-level cache management. In this architecture, the CPU application and the GPU application execute concurrently, both accessing the last-level cache. CPU and GPU have different memory access characteristics, so they differ in their sensitivity to last-level cache (LLC) capacity. For many CPU applications, a reduced share of the LLC could lead to significant performance degradation. In contrast, GPU applications can tolerate an increase in memory access latency when there is sufficient thread-level parallelism. Taking into account the memory latency tolerance of GPU programs, this paper presents a method that lets GPU applications access memory directly, leaving much of the LLC space for CPU applications; this improves the performance of CPU applications without affecting the performance of GPU applications. When the CPU application is cache sensitive and the GPU application is insensitive to the cache, the overall performance of the system is improved significantly.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw

The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to the highly optimized CPU implementation (both single core and OpenMP) available in the LAMMPS package. These experiments show up to 6x improvement in forces computation time using a single processor of the NVIDIA Tesla K80 compared to a high end 16-core Intel Xeon processor.

Parallel halftoning technique using dot diffusion optimization

NASA Astrophysics Data System (ADS)

Molina-Garcia, Javier; Ponomaryov, Volodymyr I.; Reyes-Reyes, Rogelio; Cruz-Ramos, Clara

2017-05-01

In this paper, a novel approach for halftone images is proposed and implemented for images that are obtained by the Dot Diffusion (DD) method. The designed technique is based on an optimization of the so-called class matrix used in the DD algorithm; it consists of generating new versions of the class matrix that have no baron and near-baron, in order to minimize inconsistencies during the distribution of the error. The proposed class matrices have different properties, and each is designed for one of two different applications: applications where inverse halftoning is necessary, and applications where this method is not required. The proposed method has been implemented on a GPU (NVIDIA GeForce GTX 750 Ti) and on multicore processors (AMD FX(tm)-6300 Six-Core Processor and Intel Core i5-4200U), using CUDA and OpenCV on a PC with Linux. Experimental results have shown that the novel framework generates good quality halftone images and inverse halftone images. The simulation results using parallel architectures have demonstrated the efficiency of the novel technique when it is implemented in real-time processing.
The impact of Moore's Law and loss of Dennard scaling: Are DSP SoCs an energy efficient alternative to x86 SoCs?

NASA Astrophysics Data System (ADS)

Johnsson, L.; Netzer, G.

2016-10-01

Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling resulting in constant power per unit area stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been leveled-off clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased cache sizes offer performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than main memory accesses. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down, an increasing fraction of the die must be “underutilized” or “dark” due to power constraints. With power being a prime design constraint, there is a concerted effort to find significantly more energy efficient chip architectures than those dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common for the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, typically have been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of the same technology generation x86 CPUs. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP with regard to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark).
Mobile high-performance computing (HPC) for synthetic aperture radar signal processing

NASA Astrophysics Data System (ADS)

Misko, Joshua; Kim, Youngsoo; Qi, Chenchen; Sirkeci, Birsen

2018-04-01

The importance of mobile high-performance computing has emerged in numerous battlespace applications at the tactical edge in hostile environments. Energy efficient computing power is a key enabler for diverse areas ranging from real-time big data analytics and atmospheric science to network science. However, the design of tactical mobile data centers is dominated by power, thermal, and physical constraints. Presently, it is very unlikely that the required computing power can be achieved by aggregating emerging heterogeneous many-core processing platforms consisting of CPU, Field Programmable Gate Array and Graphic Processor cores constrained by power and performance. To address these challenges, we performed a Synthetic Aperture Radar case study for Automatic Target Recognition (ATR) using Deep Neural Networks (DNNs). However, these DNN models are typically trained using GPUs with gigabytes of external memory and make heavy use of 32-bit floating point operations. As a result, DNNs do not run efficiently on hardware appropriate for low power or mobile applications. To address this limitation, we propose a framework for compressing DNN models for ATR suited to deployment on resource constrained hardware. The proposed compression framework utilizes promising DNN compression techniques including pruning and weight quantization while also focusing on processor features common to modern low-power devices. Following this methodology as a guideline produced a DNN for ATR tuned to maximize classification throughput, minimize power consumption, and minimize memory footprint on a low-power device.
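The two compression techniques named in this abstract, pruning and weight quantization, can be illustrated on a flat list of weights. This is a generic sketch, assuming magnitude-based pruning and symmetric 8-bit quantization with a single scale factor; the abstract does not specify which variants the authors used, and all function names here are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [int(round(w / scale)) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Illustrative weights only; not data from the paper.
    weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
    pruned = prune_by_magnitude(weights, sparsity=0.5)
    quantized, scale = quantize_int8(pruned)
    print("pruned:   ", pruned)
    print("quantized:", quantized)
    print("restored: ", dequantize(quantized, scale))
```

Pruning reduces the number of nonzero weights to store and multiply, while quantization replaces 32-bit floats with 8-bit integers; both trade a small loss of accuracy for a smaller memory footprint.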
Next Processor Module: A Hardware Accelerator of UT699 LEON3-FT System for On-Board Computer Software Simulation

NASA Astrophysics Data System (ADS)

Langlois, Serge; Fouquet, Olivier; Gouy, Yann; Riant, David

2014-08-01

On-Board Computers (OBC) are more and more using integrated systems on-chip (SOC) that embed processors running from 50 MHz up to several hundreds of MHz, and around which are plugged some dedicated communication controllers together with other Input/Output channels. For ground testing and On-Board SoftWare (OBSW) validation purposes, a representative simulation of these systems, faster than real-time and with cycle-true timing of execution, is not achieved with current purely software simulators. For a few years, some hybrid solutions have been put in place ([1], [2]), including hardware in the loop so as to add accuracy and performance in the computer software simulation. This paper presents the results of the work engaged by Thales Alenia Space (TAS-F) at the end of 2010, which led to a validated HW simulator of the UT699 by mid-2012 and which is now qualified and fully used in operational contexts.
SOFT ROBOTICS. A 3D-printed, functionally graded soft robot powered by combustion.

PubMed

Bartlett, Nicholas W; Tolley, Michael T; Overvelde, Johannes T B; Weaver, James C; Mosadegh, Bobak; Bertoldi, Katia; Whitesides, George M; Wood, Robert J

2015-07-10

Roboticists have begun to design biologically inspired robots with soft or partially soft bodies, which have the potential to be more robust and adaptable, and safer for human interaction, than traditional rigid robots. However, key challenges in the design and manufacture of soft robots include the complex fabrication processes and the interfacing of soft and rigid components. We used multimaterial three-dimensional (3D) printing to manufacture a combustion-powered robot whose body transitions from a rigid core to a soft exterior. This stiffness gradient, spanning three orders of magnitude in modulus, enables reliable interfacing between rigid driving components (controller, battery, etc.) and the primarily soft body, and also enhances performance. Powered by the combustion of butane and oxygen, this robot is able to perform untethered jumping. Copyright © 2015, American Association for the Advancement of Science.
Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

NASA Astrophysics Data System (ADS)

Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

2012-11-01

In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Performance of an MPI-only semiconductor device simulator on a quad socket/quad core InfiniBand platform.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shadid, John Nicolas; Lin, Paul Tinphone

2009-01-01

This preliminary study considers the scaling and performance of a finite element (FE) semiconductor device simulator on a capacity cluster with 272 compute nodes based on a homogeneous multicore node architecture utilizing 16 cores. The inter-node communication backbone for this Tri-Lab Linux Capacity Cluster (TLCC) machine is comprised of an InfiniBand interconnect. The nonuniform memory access (NUMA) nodes consist of 2.2 GHz quad socket/quad core AMD Opteron processors. The performance results for this study are obtained with a FE semiconductor device simulation code (Charon) that is based on a fully-coupled Newton-Krylov solver with domain decomposition and multilevel preconditioners. Scaling and multicore performance results are presented for large-scale problems of 100+ million unknowns on up to 4096 cores. A parallel scaling comparison is also presented with the Cray XT3/4 Red Storm capability platform. The results indicate that an MPI-only programming model for utilizing the multicore nodes is reasonably efficient on all 16 cores per compute node. However, the results also indicated that the multilevel preconditioner, which is critical for large-scale capability type simulations, scales better on the Red Storm machine than the TLCC machine.
Atomistic Design of CdSe/CdS Core-Shell Quantum Dots with Suppressed Auger Recombination.

PubMed

Jain, Ankit; Voznyy, Oleksandr; Hoogland, Sjoerd; Korkusinski, Marek; Hawrylak, Pawel; Sargent, Edward H

2016-10-12

We design quasi-type-II CdSe/CdS core-shell colloidal quantum dots (CQDs) exhibiting a suppressed Auger recombination rate. We do so using fully atomistic tight-binding wave functions and microscopic Coulomb interactions. The recombination rate as a function of the core and shell size and shape is tested against experiments. Because of a higher density of deep hole states and stronger hole confinement, Auger recombination is found to be up to six times faster for positive trions compared to negative ones in 4 nm core/10 nm shell CQDs. Soft-confinement at the interface results in weak suppression of Auger recombination compared to same-bandgap sharp-interface CQDs. We find that the suppression is due to increased volume of the core resulting in delocalization of the wave functions, rather than due to soft-confinement itself. We show that our results are consistent with previous effective mass models with the same system parameters. Increasing the dot volume remains the most efficient way to suppress Auger recombination. We predict that a 4-fold suppression of Auger recombination can be achieved in 10 nm CQDs by increasing the core volume by using rodlike cores embedded in thick shells.
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

PubMed

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
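For readers unfamiliar with the Izhikevich model named in this abstract, the sketch below integrates a single neuron on a CPU with forward Euler steps. It uses the commonly published form of the model and its "regular spiking" parameters (a=0.02, b=0.2, c=-65, d=8); the step size, input currents, and function name are assumptions for illustration, and this is not the authors' GPU implementation.

```python
def simulate_izhikevich(current, duration_ms=1000.0, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Return spike times (ms) of one Izhikevich neuron driven by a constant current.

    Model equations:
        dv/dt = 0.04*v^2 + 5*v + 140 - u + I
        du/dt = a*(b*v - u)
        if v >= 30 mV: v <- c, u <- u + d
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    steps = int(duration_ms / dt)
    for step in range(steps):
        # Forward Euler update of both state variables.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= 30.0:
            # Spike: record the time, then apply the model's reset rule.
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    for current in (0.0, 10.0):
        spikes = simulate_izhikevich(current)
        print(f"I = {current:4.1f}: {len(spikes)} spikes in 1000 ms")
```

Because each neuron's update depends only on its own state and its input current, a network of such neurons maps naturally onto one GPU thread per neuron, which is the kind of data parallelism the abstract exploits.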
Real time emotion aware applications: a case study employing emotion evocative pictures and neuro-physiological sensing enhanced by Graphic Processor Units.

PubMed

Konstantinidis, Evdokimos I; Frantzidis, Christos A; Pappas, Costas; Bamidis, Panagiotis D

2012-07-01

In this paper the feasibility of adopting Graphic Processor Units towards real-time emotion aware computing is investigated for boosting the time consuming computations employed in such applications. The proposed methodology was employed in analysis of encephalographic and electrodermal data gathered when participants passively viewed emotional evocative stimuli. The GPU effectiveness when processing electroencephalographic and electrodermal recordings is demonstrated by comparing the execution time of chaos/complexity analysis through nonlinear dynamics (multi-channel correlation dimension/D2) and signal processing algorithms (computation of skin conductance level/SCL) into various popular programming environments. Apart from the beneficial role of parallel programming, the adoption of special design techniques regarding memory management may further enhance the time minimization which approximates a factor of 30 in comparison with ANSI C language (single-core sequential execution). Therefore, the use of GPU parallel capabilities offers a reliable and robust solution for real-time sensing the user's affective state. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Real-time autocorrelator for fluorescence correlation spectroscopy based on graphical-processor-unit architecture: method, implementation, and comparative studies

NASA Astrophysics Data System (ADS)

Laracuente, Nicholas; Grossman, Carl

2013-03-01

We developed an algorithm and software to calculate autocorrelation functions from real-time photon-counting data using the fast, parallel capabilities of graphical processor units (GPUs). Recent developments in hardware and software have allowed for general purpose computing with inexpensive GPU hardware. These devices are more suited for emulating hardware autocorrelators than traditional CPU-based software applications by emphasizing parallel throughput over sequential speed. Incoming data are binned in a standard multi-tau scheme with configurable points-per-bin size and are mapped into a GPU memory pattern to reduce time-expensive memory access. Applications include dynamic light scattering (DLS) and fluorescence correlation spectroscopy (FCS) experiments. We ran the software on a 64-core graphics PCI card in a 3.2 GHz Intel i5 CPU based computer running Linux. FCS measurements were made on Alexa-546 and Texas Red dyes in a standard buffer (PBS). Software correlations were compared to hardware correlator measurements on the same signals. Supported by HHMI and Swarthmore College.
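The multi-tau binning scheme named in this abstract can be illustrated with a small sequential sketch in Python. This is a simplified CPU version under stated assumptions, not the authors' GPU code: the function name `multi_tau_autocorr`, the parameter names, and the choice to start higher levels at half the points per level are illustrative.

```python
def multi_tau_autocorr(counts, points_per_level=8, levels=4):
    """Normalized autocorrelation G(tau) on a multi-tau (quasi-logarithmic) lag grid.

    counts : photon counts per time bin
    Returns a list of (lag, G) pairs, with lag in units of the original bin width.
    """
    data = [float(c) for c in counts]
    bin_width = 1
    result = []
    for level in range(levels):
        if len(data) < 2:
            break
        mean = sum(data) / len(data)
        if mean == 0.0:
            break
        # Level 0 covers lags 1..P-1; later levels only add the upper half,
        # because the lower half was already covered at finer resolution.
        first = 1 if level == 0 else points_per_level // 2
        for k in range(first, points_per_level):
            n = len(data) - k
            if n <= 0:
                break
            prod = sum(data[i] * data[i + k] for i in range(n)) / n
            result.append((k * bin_width, prod / (mean * mean)))
        # Coarsen: merge adjacent bins so the next level uses doubled bin width.
        data = [data[i] + data[i + 1] for i in range(0, len(data) - 1, 2)]
        bin_width *= 2
    return result


if __name__ == "__main__":
    for lag, g in multi_tau_autocorr([3] * 256):
        print(lag, g)
```

For a constant signal every value of G equals 1, which makes a convenient sanity check. The products at different lags are independent of each other, which is the parallelism a GPU version would exploit.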
Porting LAMMPS to GPUs.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, William Michael; Plimpton, Steven James; Wang, Peng

2010-03-01

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
Imprints of superfluidity on magnetoelastic quasiperiodic oscillations of soft gamma-ray repeaters.

PubMed

Gabler, Michael; Cerdá-Durán, Pablo; Stergioulas, Nikolaos; Font, José A; Müller, Ewald

2013-11-22

Our numerical simulations show that axisymmetric, torsional, magnetoelastic oscillations of magnetars with a superfluid core can explain the whole range of observed quasiperiodic oscillations (QPOs) in the giant flares of soft gamma-ray repeaters. There exist constant phase QPOs at f ≲ 150 Hz and resonantly excited high-frequency QPOs (f > 500 Hz), in good agreement with observations. The range of magnetic field strengths required to match the observed QPO frequencies agrees with that from spin-down estimates. These results suggest that there is at least one superfluid species in magnetar cores.
Multifractal Analysis of Seismically Induced Soft-Sediment Deformation Structures Imaged by X-Ray Computed Tomography

NASA Astrophysics Data System (ADS)

Nakashima, Yoshito; Komatsubara, Junko

Unconsolidated soft sediments deform and mix complexly by seismically induced fluidization. Such geological soft-sediment deformation structures (SSDSs) recorded in boring cores were imaged by X-ray computed tomography (CT), which enables visualization of the inhomogeneous spatial distribution of iron-bearing mineral grains as strong X-ray absorbers in the deformed strata. Multifractal analysis was applied to the two-dimensional (2D) CT images with various degrees of deformation and mixing. The results show that the distribution of the iron-bearing mineral grains is multifractal for less deformed/mixed strata and almost monofractal for fully mixed (i.e. almost homogenized) strata. Computer simulations of deformation of real and synthetic digital images were performed using the egg-beater flow model. The simulations successfully reproduced the transformation from the multifractal spectra into almost monofractal spectra (i.e. almost convergence on a single point) with an increase in deformation/mixing intensity. The present study demonstrates that multifractal analysis coupled with X-ray CT and the mixing flow model is useful to quantify the complexity of seismically induced SSDSs, standing as a novel method for the evaluation of cores for seismic risk assessment.
High-performance multiprocessor architecture for a 3-D lattice gas model

NASA Technical Reports Server (NTRS)

Lee, F.; Flynn, M.; Morf, M.

1991-01-01

The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
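The two alternating phases described in this abstract, collision and propagation, can be sketched in Python with a much simpler lattice gas. The example below deliberately swaps in the 2-D, four-direction HPP model instead of the 24-bit FCHC model, because it fits in a few lines; all names are illustrative and nothing here reflects the ALGE hardware design.

```python
# Bit flags for the four HPP velocity directions.
E, N, W, S = 1, 2, 4, 8


def collide(cell):
    """Head-on pairs rotate by 90 degrees; every other state is unchanged."""
    if cell == E | W:
        return N | S
    if cell == N | S:
        return E | W
    return cell


def propagate(grid):
    """Move every particle one site along its direction (periodic boundaries)."""
    h, w = len(grid), len(grid[0])
    new = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            c = grid[y][x]
            if c & E:
                new[y][(x + 1) % w] |= E
            if c & W:
                new[y][(x - 1) % w] |= W
            if c & N:
                new[(y - 1) % h][x] |= N
            if c & S:
                new[(y + 1) % h][x] |= S
    return new


def step(grid):
    """One time step: collision at every site, then propagation."""
    return propagate([[collide(c) for c in row] for row in grid])


def particle_count(grid):
    return sum(bin(c).count("1") for row in grid for c in row)


if __name__ == "__main__":
    g = [[0] * 3 for _ in range(3)]
    g[1][0], g[1][2] = E, W      # two particles heading towards each other
    g = step(step(g))
    print(g, particle_count(g))
```

The collision rule is purely local and the propagation rule touches only nearest neighbours, so both phases conserve particle number and can be applied to all sites at once.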
A Tutorial on Parallel and Concurrent Programming in Haskell

NASA Astrophysics Data System (ADS)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Programming for 1.6 Million cores: Early experiences with IBM's BG/Q SMP architecture

NASA Astrophysics Data System (ADS)

Glosli, James

2013-03-01

With the stall in clock cycle improvements a decade ago, the drive for computational performance has continued along a path of increasing core counts on a processor. The multi-core evolution has been expressed in both a symmetric multi processor (SMP) architecture and a CPU/GPU architecture. Debates rage in the high performance computing (HPC) community over which architecture best serves HPC. In this talk I will not attempt to resolve that debate but perhaps fuel it. I will discuss the experience of exploiting Sequoia, a 98,304-node IBM Blue Gene/Q SMP at Lawrence Livermore National Laboratory. The advantages and challenges of leveraging the computational power of BG/Q will be detailed through the discussion of two applications. The first application is a Molecular Dynamics code called ddcMD. This is a code developed over the last decade at LLNL and ported to BG/Q. The second application is a cardiac modeling code called Cardioid. This is a code that was recently designed and developed at LLNL to exploit the fine scale parallelism of BG/Q's SMP architecture. Through the lenses of these efforts I'll illustrate the need to rethink how we express and implement our computational approaches. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
CanOpen on RASTA: The Integration of the CanOpen IP Core in the Avionics Testbed

NASA Astrophysics Data System (ADS)

Furano, Gianluca; Guettache, Farid; Magistrati, Giorgio; Tiotto, Gabriele; Ortega, Carlos Urbina; Valverde, Alberto

2013-08-01

This paper presents the work done within the ESA Estec Data Systems Division, targeting the integration of the CanOpen IP Core with the existing Reference Architecture Test-bed for Avionics (RASTA). RASTA is the reference testbed system of the ESA Avionics Lab, designed to integrate the main elements of a typical Data Handling system. It aims at simulating a scenario where a Mission Control Center communicates with on-board computers and systems through a TM/TC link, thus providing the data management through qualified processors and interfaces such as Leon2 core processors, CAN bus controllers, MIL-STD-1553 and SpaceWire. This activity aims at the extension of RASTA with two boards equipped with the HurriCANe controller, acting as CANOpen slaves. CANOpen software modules have been ported to the RASTA system I/O boards equipped with the Gaisler GR-CAN controller and act as master communicating with the CCIPC boards. CanOpen serves as an upper application layer for CAN-based systems, is defined within the CAN-in-Automation standard, and can be regarded as the definitive standard for the implementation of CAN-based systems solutions. The development and integration of CCIPC, performed by SITAEL S.p.A., is the first application that aims to bring the CANOpen standard to space applications. The definition of CANOpen within the European Cooperation for Space Standardization (ECSS) is under development.
Static and Dynamic Frequency Scaling on Multicore CPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bao, Wenlei; Hong, Changwan; Chunduri, Sudheer

2016-12-28

Dynamic voltage and frequency scaling (DVFS) adapts CPU power consumption by modifying a processor’s operating frequency (and the associated voltage). Typical approaches employing DVFS involve default strategies such as running at the lowest or the highest frequency, or observing the CPU’s runtime behavior and dynamically adapting the voltage/frequency configuration based on CPU usage. In this paper, we argue that many previous approaches suffer from inherent limitations, such as not accounting for processor-specific impact of frequency changes on energy for different workload types. We first propose a lightweight runtime-based approach to automatically adapt the frequency based on the CPU workload, that is agnostic of the processor characteristics. We then show that further improvements can be achieved for affine kernels in the application, using a compile-time characterization instead of run-time monitoring to select the frequency and number of CPU cores to use. Our framework relies on a one-time energy characterization of CPU-specific DVFS profiles followed by a compile-time categorization of loop-based code segments in the application. These are combined to determine a priori the frequency and the number of cores to use to execute the application so as to optimize energy or energy-delay product, outperforming the runtime approach. Extensive evaluation on 60 benchmarks and five multi-core CPUs shows that our approach systematically outperforms the powersave Linux governor, while improving overall performance.
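The selection step described in this abstract, choosing a frequency and core count that minimizes energy or energy-delay product, can be sketched in Python as a lookup over a characterization table. The function `pick_config`, the dictionary keys, and every number in the table are hypothetical placeholders for illustration; they are not measurements or interfaces from the paper.

```python
def pick_config(profiles, metric="energy"):
    """Return the profile with the lowest energy or energy-delay product (EDP).

    profiles : list of dicts with keys freq_ghz, cores, power_w, time_s
    metric   : "energy" (joules) or "edp" (joule-seconds)
    """
    if metric not in ("energy", "edp"):
        raise ValueError("metric must be 'energy' or 'edp'")

    def cost(p):
        energy = p["power_w"] * p["time_s"]          # E = P * t
        return energy if metric == "energy" else energy * p["time_s"]

    return min(profiles, key=cost)


# Made-up characterization of one code segment at three frequencies.
EXAMPLE_PROFILES = [
    {"freq_ghz": 1.2, "cores": 4, "power_w": 20.0, "time_s": 10.0},
    {"freq_ghz": 2.0, "cores": 4, "power_w": 35.0, "time_s": 6.0},
    {"freq_ghz": 3.0, "cores": 4, "power_w": 70.0, "time_s": 4.5},
]

if __name__ == "__main__":
    print(pick_config(EXAMPLE_PROFILES, "energy")["freq_ghz"])
    print(pick_config(EXAMPLE_PROFILES, "edp")["freq_ghz"])
```

With these invented numbers the two metrics disagree: the lowest frequency minimizes energy, while the middle frequency minimizes energy-delay product, which is why the choice of objective matters.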
Templated and template-free fabrication strategies for zero-dimensional hollow MOF superstructures.

PubMed

Kim, Hyehyun; Lah, Myoung Soo

2017-05-16

Various fabrication strategies for hollow metal-organic framework (MOF) superstructures are reviewed and classified using various types of external templates and their properties. Hollow MOF superstructures have also been prepared without external templates, wherein unstable intermediates obtained during reactions convert to the final hollow MOF superstructures. Many hollow MOF superstructures have been fabricated using hard templates. After the core-shell core@MOF structure was prepared using a hard template, the core was selectively etched to generate a hollow MOF superstructure. Another approach for generating hollow superstructures is to use a solid reactant as a sacrificial template; this method requires no additional etching process. Soft templates such as discontinuous liquid/emulsion droplets and gas bubbles in a continuous soft phase have also been employed to prepare hollow MOF superstructures.

Polymer optical waveguide with multiple graded-index cores for on-board interconnects fabricated using soft-lithography.

PubMed

Ishigure, Takaaki; Nitta, Yosuke

2010-06-21

We successfully fabricate a polymer optical waveguide with multiple graded-index (GI) cores directly on a substrate utilizing the soft-lithography method. A UV-curable polymer (TPIR-202) supplied from Tokyo Ohka Kogyo Co., Ltd. is used, and the GI cores are formed during the curing process of the core region, which is similar to the preform process we previously reported. We experimentally confirm that near parabolic refractive index profiles are formed in the parallel cores (more than 50 channels) with 40 μm × 40 μm size at 250-μm pitch. Although the loss is still as high as 0.1 to 0.3 dB/cm at 850 nm, which is mainly due to scattering loss inherent to the polymer matrix, the scattering loss attributed to the waveguide's structural irregularity could be sufficiently reduced by a graded refractive index profile. For comparison, we fabricate step-index (SI)-core waveguides with the same materials by means of the same process. Then, we evaluate the inter-channel crosstalk in the SI- and GI-core waveguides under almost the same conditions. It is noteworthy that remarkable crosstalk reduction (5 dB and beyond) is confirmed in the GI-core waveguides, since the propagating modes in GI-cores are tightly confined near the core center and less optical power is found near the core cladding boundary. This significant improvement in the inter-channel crosstalk allows the GI-core waveguides to be utilized for extra high-density on-board optical interconnections.
75 FR 5363 - Self-Regulatory Organizations; NYSE Arca, Inc.; Order Approving Proposed Rule Change Modifying...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-02-02

... Reference Prices constitute ``non-core data;'' i.e., the Exchange does not require a central processor to... Realtime Reference Prices Service January 22, 2010. I. Introduction On December 1, 2009, NYSE Arca, Inc... thereunder,\\2\\ a proposed rule change to add data elements to its ``NYSE Arca Realtime Reference Prices...
PS3 CELL Development for Scientific Computation and Research

NASA Astrophysics Data System (ADS)

Christiansen, M.; Sevre, E.; Wang, S. M.; Yuen, D. A.; Liu, S.; Lyness, M. D.; Broten, M.

2007-12-01

The Cell processor is one of the most powerful processors on the market, and researchers in the earth sciences may find its parallel architecture to be very useful. A Cell processor, with 7 cores, can easily be obtained for experimentation by purchasing a PlayStation 3 (PS3) and installing Linux and the IBM SDK. Each core of the PS3 is capable of 25 GFLOPS giving a potential limit of 150 GFLOPS when using all 6 SPUs (synergistic processing units) by using vectorized algorithms. We have used the Cell's computational power to create a program which takes simulated tsunami datasets, parses them, and returns a colorized height field image using ray casting techniques. As expected, the time required to create an image is inversely proportional to the number of SPUs used. We believe that this trend will continue when multiple PS3s are chained using OpenMP functionality and are in the process of researching this. By using the Cell to visualize tsunami data, we have found that its greatest feature is its power. This fact entwines well with the needs of the scientific community where the limiting factor is time. Any algorithm, such as the heat equation, that can be subdivided into multiple parts can take advantage of the PS3 Cell's ability to split the computations across the 6 SPUs, reducing required run time to one sixth. Further vectorization of the code can allow for 4 simultaneous floating point operations by using the SIMD (single instruction multiple data) capabilities of the SPU, increasing efficiency 24 times.
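The arithmetic behind the figures quoted in this abstract can be written out as a short Python sketch. It assumes ideal scaling with no communication or serial overhead, which the abstract implies but real codes rarely achieve; the constant and function names are illustrative.

```python
SPUS = 6               # SPUs available to applications on the PS3
GFLOPS_PER_SPU = 25    # per-core figure quoted in the abstract
SIMD_LANES = 4         # simultaneous floating point operations per SPU

# Aggregate peak: 6 SPUs at 25 GFLOPS each.
PEAK_GFLOPS = SPUS * GFLOPS_PER_SPU

# Ideal speedup over scalar code on one SPU: 6 SPUs times 4 SIMD lanes.
IDEAL_SPEEDUP = SPUS * SIMD_LANES


def ideal_runtime(serial_seconds, spus=SPUS, lanes=1):
    """Run time under perfect scaling across SPUs and SIMD lanes."""
    return serial_seconds / (spus * lanes)


if __name__ == "__main__":
    print(PEAK_GFLOPS, IDEAL_SPEEDUP)
    print(ideal_runtime(60.0), ideal_runtime(60.0, lanes=SIMD_LANES))
```

Under these assumptions a job that takes 60 s on one SPU would take 10 s on six, and 2.5 s if each SPU also used all four SIMD lanes.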
High-Speed On-Board Data Processing Platform for LIDAR Projects at NASA Langley Research Center

NASA Astrophysics Data System (ADS)

Beyon, J.; Ng, T. K.; Davis, M. J.; Adams, J. K.; Lin, B.

2015-12-01

The project called High-Speed On-Board Data Processing for Science Instruments (HOPS) has been funded by NASA Earth Science Technology Office (ESTO) Advanced Information Systems Technology (AIST) program during April, 2012 - April, 2015. HOPS is an enabler for science missions with extremely high data processing rates. In this three-year effort of HOPS, Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) and 3-D Winds were of interest in particular. As for ASCENDS, HOPS replaces time domain data processing with frequency domain processing while making the real-time on-board data processing possible. As for 3-D Winds, HOPS offers real-time high-resolution wind profiling with 4,096-point fast Fourier transform (FFT). HOPS is adaptable with quick turn-around time. Since HOPS offers reusable user-friendly computational elements, its FPGA IP Core can be modified for a shorter development period if the algorithm changes. The FPGA and memory bandwidth of HOPS is 20 GB/sec while the typical maximum processor-to-SDRAM bandwidth of the commercial radiation tolerant high-end processors is about 130-150 MB/sec. The inter-board communication bandwidth of HOPS is 4 GB/sec while the effective processor-to-cPCI bandwidth of commercial radiation tolerant high-end boards is about 50-75 MB/sec. Also, HOPS offers VHDL cores for the easy and efficient implementation of ASCENDS and 3-D Winds, and other similar algorithms. A general overview of the 3-year development of HOPS is the goal of this presentation.
High-Speed On-Board Data Processing for Science Instruments: HOPS

NASA Technical Reports Server (NTRS)

Beyon, Jeffrey

2015-01-01

The project called High-Speed On-Board Data Processing for Science Instruments (HOPS) has been funded by NASA Earth Science Technology Office (ESTO) Advanced Information Systems Technology (AIST) program during April, 2012 - April, 2015. HOPS is an enabler for science missions with extremely high data processing rates. In this three-year effort of HOPS, Active Sensing of CO2 Emissions over Nights, Days, and Seasons (ASCENDS) and 3-D Winds were of interest in particular. As for ASCENDS, HOPS replaces time domain data processing with frequency domain processing while making the real-time on-board data processing possible. As for 3-D Winds, HOPS offers real-time high-resolution wind profiling with 4,096-point fast Fourier transform (FFT). HOPS is adaptable with quick turn-around time. Since HOPS offers reusable user-friendly computational elements, its FPGA IP Core can be modified for a shorter development period if the algorithm changes. The FPGA and memory bandwidth of HOPS is 20 GB/sec while the typical maximum processor-to-SDRAM bandwidth of the commercial radiation tolerant high-end processors is about 130-150 MB/sec. The inter-board communication bandwidth of HOPS is 4 GB/sec while the effective processor-to-cPCI bandwidth of commercial radiation tolerant high-end boards is about 50-75 MB/sec. Also, HOPS offers VHDL cores for the easy and efficient implementation of ASCENDS and 3-D Winds, and other similar algorithms. A general overview of the 3-year development of HOPS is the goal of this presentation.
Use of Field Programmable Gate Array Technology in Future Space Avionics

NASA Technical Reports Server (NTRS)

Ferguson, Roscoe C.; Tate, Robert

2005-01-01

Fulfilling NASA's new vision for space exploration requires the development of sustainable, flexible and fault tolerant spacecraft control systems. The traditional development paradigm consists of the purchase or fabrication of hardware boards with fixed processor and/or Digital Signal Processing (DSP) components interconnected via a standardized bus system. This is followed by the purchase and/or development of software. This paradigm has several disadvantages for the development of systems to support NASA's new vision. Building a system to be fault tolerant increases the complexity and decreases the performance of included software. Standard bus design and conventional implementation produces natural bottlenecks. Configuring hardware components in systems containing common processors and DSPs is difficult initially and expensive or impossible to change later. The existence of Hardware Description Languages (HDLs), the recent increase in performance, density and radiation tolerance of Field Programmable Gate Arrays (FPGAs), and Intellectual Property (IP) Cores provides the technology for reprogrammable Systems on a Chip (SOC). This technology supports a paradigm better suited for NASA's vision. Hardware and software production are melded for more effective development; they can both evolve together over time. Designers incorporating this technology into future avionics can benefit from its flexibility. Systems can be designed with improved fault isolation and tolerance using hardware instead of software. Also, these designs can be protected from obsolescence problems where maintenance is compromised via component and vendor availability. To investigate the flexibility of this technology, the core of the Central Processing Unit and Input/Output Processor of the Space Shuttle AP101S Computer were prototyped in Verilog HDL and synthesized into an Altera Stratix FPGA.
Towards energy-efficient photonic interconnects

NASA Astrophysics Data System (ADS)

Demir, Yigit; Hardavellas, Nikos

2015-03-01

Silicon photonics have emerged as a promising solution to meet the growing demand for high-bandwidth, low-latency, and energy-efficient on-chip and off-chip communication in many-core processors. However, current silicon-photonic interconnect designs for many-core processors waste a significant amount of power because (a) lasers are always on, even during periods of interconnect inactivity, and (b) microring resonators employ heaters which consume a significant amount of power just to overcome thermal variations and maintain communication on the photonic links, especially in a 3D-stacked design. The problem of high laser power consumption is particularly important as lasers typically have very low energy efficiency, and photonic interconnects often remain underutilized both in scientific computing (compute-intensive execution phases underutilize the interconnect), and in server computing (servers in Google-scale datacenters have a typical utilization of less than 30%). We address the high laser power consumption by proposing EcoLaser+, which is a laser control scheme that saves energy by predicting the interconnect activity and opportunistically turning the on-chip laser off when possible, and also by scaling the width of the communication link based on a runtime prediction of the expected message length. Our laser control scheme can save up to 62 - 92% of the laser energy, and improve the energy efficiency of a manycore processor with negligible performance penalty. We address the high trimming (heating) power consumption of the microrings by proposing insulation methods that reduce the impact of localized heating induced by highly-active components on the 3D-stacked logic die.
Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

PubMed

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-06-01

We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
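The soft-thresholding step at the heart of this reconstruction can be illustrated with a short Python sketch. It shows the standard scalar soft-threshold operator and a joint (group) variant that shrinks one coefficient across all channels by its combined magnitude, which is one common way to promote joint sparsity. The function names are illustrative, and the sketch is not the authors' implementation; the wavelet transform, the SPIRiT consistency step, and the iteration loop are omitted.

```python
import math


def soft_threshold(x, lam):
    """Scalar soft-thresholding for real x: sign(x) * max(|x| - lam, 0)."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def joint_soft_threshold(coeffs, lam):
    """Shrink one coefficient across channels by its joint l2 magnitude.

    coeffs : the same wavelet coefficient in every channel (real or complex)
    Returns scaled copies; all zeros when the joint norm is at most lam.
    """
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs))
    if norm <= lam:
        return [0.0] * len(coeffs)
    scale = 1.0 - lam / norm
    return [scale * c for c in coeffs]


if __name__ == "__main__":
    print(soft_threshold(3.0, 1.0), soft_threshold(-3.0, 1.0), soft_threshold(0.5, 1.0))
    print(joint_soft_threshold([3.0, 4.0], 1.0))
```

Because every coefficient group is processed independently, this step parallelizes trivially across pixels, which fits the multi-GPU and multi-core implementations the abstract describes.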
Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

PubMed Central

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Generating unstructured nuclear reactor core meshes in parallel

DOE PAGES

Jain, Rajeev; Tautges, Timothy J.

2014-10-24

Recent advances in supercomputers and parallel solver techniques have enabled users to run large simulation problems using millions of processors. Techniques for multiphysics nuclear reactor core simulations are under active development in several countries. Most of these techniques require large unstructured meshes that can be hard to generate on standalone desktop computers because of high memory requirements, limited processing power, and other complexities. We have previously reported on a hierarchical lattice-based approach for generating reactor core meshes. Here, we describe efforts to exploit coarse-grained parallelism during reactor assembly and reactor core mesh generation processes. We highlight several reactor core examples including a very high temperature reactor, a full-core model of the Korean MONJU reactor, a ¼ pressurized water reactor core, the fast reactor Experimental Breeder Reactor-II core with a XX09 assembly, and an advanced breeder test reactor core. The times required to generate large mesh models, along with speedups obtained from running these problems in parallel, are reported. A graphical user interface to the tools described here has also been developed.
Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

NASA Astrophysics Data System (ADS)

Romano, Paul Kollath

Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. 
The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
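The idea behind the nearest-neighbor fission bank algorithm mentioned in this abstract can be sketched in Python as a bookkeeping exercise: each rank compares the global index range of the sites it produced (from a prefix sum) with the range it should own, and ships only the mismatch. This is a simplified serial simulation under stated assumptions, not OpenMC code; the function names are hypothetical, the total is assumed to divide evenly among ranks, and no actual message passing is performed.

```python
def neighbor_transfers(counts):
    """Plan fission-site transfers so every rank ends with an equal share.

    counts : number of fission sites produced on each rank, in rank order
    Returns a list of (source_rank, destination_rank, n_sites) tuples.
    """
    p = len(counts)
    total = sum(counts)
    if total % p:
        raise ValueError("sketch assumes the total divides evenly among ranks")
    target = total // p
    transfers = []
    start = 0                      # exclusive prefix sum of counts
    for src, m in enumerate(counts):
        lo, hi = start, start + m  # global index range produced on src
        for dst in range(p):
            # Overlap between what src produced and what dst should own.
            overlap = min(hi, (dst + 1) * target) - max(lo, dst * target)
            if overlap > 0 and dst != src:
                transfers.append((src, dst, overlap))
        start = hi
    return transfers


def apply_transfers(counts, transfers):
    """Return per-rank site counts after carrying out the planned transfers."""
    out = list(counts)
    for src, dst, n in transfers:
        out[src] -= n
        out[dst] += n
    return out


if __name__ == "__main__":
    plan = neighbor_transfers([5, 3, 4, 4])
    print(plan, apply_transfers([5, 3, 4, 4], plan))
```

When the per-rank counts fluctuate only slightly around the target, as is typical for statistical fluctuations in source site production, the planned transfers involve only adjacent ranks and small messages, in contrast to schemes that gather the whole bank on one rank.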
DNA compaction by poly (amido amine) dendrimers of ammonia cored and ethylene diamine cored

NASA Astrophysics Data System (ADS)

Qamhieh, K.; Al-Shawwa, J.

2017-06-01

The build-up of complexes between DNA and soft-particle poly(amidoamine) (PAMAM) dendrimers, ammonia cored of generations G1-G6 and ethylenediamine cored of generations G1-G10, has been studied using a new theoretical model developed by Qamhieh and coworkers. The model describes the interaction between a linear polyelectrolyte (LPE) chain and ion-penetrable spheres. Many factors affecting the LPE/dendrimer complex have been investigated, such as dendrimer generation, the Bjerrum length, salt concentration, and rigidity of the LPE chain represented by the persistence length. It is found that the length of chain wrapped around the dendrimer increases with increasing dendrimer generation, Bjerrum length, and salt concentration, while it decreases with increasing persistence length of the LPE chain. We also conclude that the wrapping length of the LPE chain around ethylenediamine cored dendrimers is larger than that around ammonia cored dendrimers.
Observations of the structure and evolution of solar flares with a soft X-ray telescope

NASA Technical Reports Server (NTRS)

Vorpahl, J. A.; Gibson, E. G.; Landecker, P. B.; Mckenzie, D. L.; Underwood, J. M.

1975-01-01

Soft X-ray flare events were observed with the S-056 X-ray telescope that was part of the ATM complement of instruments aboard SKYLAB. Analyses of these data are reported. The observations are summarized and a detailed discussion of the X-ray flare structures is presented. The data indicate that the soft X-rays emitted by a flare come primarily from an intense, well-defined core surrounded by a region of fainter, more diffuse emission. An analysis of flare evolution shows evidence for preliminary heating and energy release prior to the main phase of the flare. Core features are found to be remarkably stable and retain their shape throughout a flare. Most changes in the overall configuration seem to be the result of the appearance, disappearance, or change in brightness of individual features, rather than the restructuring or reorientation of these features. Brief comparisons with several theories are presented.
Low-voltage analog front-end processor design for ISFET-based sensor and H+ sensing applications

NASA Astrophysics Data System (ADS)

Chung, Wen-Yaw; Yang, Chung-Huang; Peng, Kang-Chu; Yeh, M. H.

2003-04-01

This paper presents a modular low-voltage analog front-end processor design in a 0.5 µm double-poly double-metal CMOS technology for Ion Sensitive Field Effect Transistor (ISFET)-based sensor and H+ sensing applications. To meet the potentiometric response of the ISFET that is proportional to various H+ concentrations, the constant-voltage and constant-current (CVCS) testing configuration has been used. Low-voltage design techniques such as a bulk-driven input pair, a folded-cascode amplifier, and bootstrap switch control circuits have been designed and integrated for a 1.5 V supply and nearly rail-to-rail analog-to-digital signal processing. Core modules consisting of an 8-bit two-step analog-to-digital converter and bulk-driven pre-amplifiers have been developed in this research. The experimental results show that the proposed circuitry has an acceptable linearity to 0.1 pH-H+ sensing conversions with the buffer solution in the range of pH 2 to pH 12. The processor has potential uses in battery-operated and portable healthcare devices and in environmental monitoring applications.
Fault-Tolerant Software-Defined Radio on Manycore

NASA Technical Reports Server (NTRS)

Ricketts, Scott

2015-01-01

Software-defined radio (SDR) platforms generally rely on field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), but such architectures require significant software development. In addition, application demands for radiation mitigation and fault tolerance exacerbate programming challenges. MaXentric Technologies, LLC, has developed a manycore-based SDR technology that provides 100 times the throughput of conventional radiation-hardened general purpose processors. Manycore systems (30-100 cores and beyond) have the potential to provide high processing performance at error rates that are equivalent to current space-deployed uniprocessor systems. MaXentric's innovation is a highly flexible radio, providing over-the-air reconfiguration; adaptability; and uninterrupted, real-time, multimode operation. The technology is also compliant with NASA's Space Telecommunications Radio System (STRS) architecture. In addition to its many uses within NASA communications, the SDR can also serve as a highly programmable research-stage prototyping device for new waveforms and other communications technologies. It can also support noncommunication codes on its multicore processor, collocated with the communications workload, reducing the size, weight, and power of the overall system by aggregating processing jobs to a single board computer.
Single-event upset in highly scaled commercial silicon-on-insulator PowerPc microprocessors

NASA Technical Reports Server (NTRS)

Irom, Farokh; Farmanesh, Farhad H.

2004-01-01

Single event upset effects from heavy ions are measured for Motorola and IBM silicon-on-insulator (SOI) microprocessors with different feature sizes and core voltages. The results are compared with results for similar devices with bulk substrates. The cross sections of the SOI processors are lower than their bulk counterparts, but the threshold is about the same, even though the charge collection depth is more than an order of magnitude smaller in the SOI devices. The scaling of the cross section with reduction of feature size and the core voltage dependence for SOI microprocessors are discussed.
CQPSO scheduling algorithm for heterogeneous multi-core DAG task model

NASA Astrophysics Data System (ADS)

Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng

2017-07-01

Efficient task scheduling is critical to achieving high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list is built, and the processor with the minimum cumulative earliest finish time (EFT) is selected as the target of the first task assignment. The task precedence relationships are satisfied and the total execution time of all tasks is minimized. The experimental results show that the proposed algorithm offers good optimization ability, is simple and feasible, converges quickly, and can be applied to task scheduling optimization in other heterogeneous and distributed environments.
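The earliest-finish-time (EFT) assignment step mentioned in this abstract can be sketched as plain list scheduling. The Python below is an illustration only and is not the CQPSO algorithm itself; the task graph, the cost table, all names, and the choice to ignore communication costs are assumptions.

```python
def schedule_by_eft(order, preds, cost):
    """Assign tasks, in the given priority order, to the core with minimum EFT.

    order: task ids in a precedence-respecting priority order
    preds: dict mapping task -> list of predecessor tasks
    cost:  dict mapping task -> list of execution times, one per core
    """
    n_cores = len(next(iter(cost.values())))
    core_free = [0.0] * n_cores   # time at which each core becomes idle
    finish = {}                   # task -> finish time
    placement = {}                # task -> chosen core
    for t in order:
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[p] for p in preds.get(t, [])), default=0.0)
        best_core, best_eft = None, float("inf")
        for c in range(n_cores):
            eft = max(ready, core_free[c]) + cost[t][c]
            if eft < best_eft:
                best_core, best_eft = c, eft
        placement[t] = best_core
        finish[t] = best_eft
        core_free[best_core] = best_eft
    return placement, max(finish.values())

# Hypothetical four-task DAG on two heterogeneous cores.
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
cost = {"A": [2, 3], "B": [4, 2], "C": [3, 5], "D": [2, 2]}
placement, makespan = schedule_by_eft(["A", "B", "C", "D"], preds, cost)
```

In a swarm-based scheduler, a routine of this kind would typically serve as the fitness evaluation for a candidate priority order, with the makespan as the quantity to minimize.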
Electrophoresis of a charged soft particle in a charged cavity with arbitrary double-layer thickness.

PubMed

Chen, Wei J; Keh, Huan J

2013-08-22

An analysis for the quasi-steady electrophoretic motion of a soft particle composed of a charged spherical rigid core and an adsorbed porous layer positioned at the center of a charged spherical cavity filled with an arbitrary electrolyte solution is presented. Within the porous layer, frictional segments with fixed charges are assumed to distribute uniformly. Through the use of the linearized Poisson-Boltzmann equation and the Laplace equation, the equilibrium double-layer potential distribution and its perturbation caused by the applied electric field are separately determined. The modified Stokes and Brinkman equations governing the fluid flow fields outside and inside the porous layer, respectively, are solved subsequently. An explicit formula for the electrokinetic migration velocity of the soft particle in terms of the fixed charge densities on the rigid core surface, in the porous layer, and on the cavity wall is obtained from a balance between its electrostatic and hydrodynamic forces. This formula is valid for arbitrary values of κa, λa, r0/a, and a/b, where κ is the Debye screening parameter, λ is the reciprocal of the length characterizing the extent of flow penetration inside the porous layer, a is the radius of the soft particle, r0 is the radius of the rigid core of the particle, and b is the radius of the cavity. In the limiting cases of r0 = a and r0 = 0, the migration velocity for the charged soft sphere reduces to that for a charged impermeable sphere and that for a charged porous sphere, respectively, in the charged cavity. The effect of the surface charge at the cavity wall on the particle migration can be significant, and the particle may reverse the direction of its migration.
Mechanically strong nanocrystalline Fe-Si-B-P-Cu soft magnetic powder cores utilizing magnetic metallic glass as a binder

NASA Astrophysics Data System (ADS)

Luan, Jian; Sharma, Parmanand; Yodoshi, Noriharu; Zhang, Yan; Makino, Akihiro

2016-05-01

We report on the fabrication and properties of soft magnetic powder cores with superior mechanical strength as well as low core loss (W). Development of such cores is important for applications in automobiles/devices operating in motion. High saturation magnetic flux density (Bs) Fe-Si-B-P-Cu powder was sintered with Fe55C10B5P10Ni15Mo5 metallic glass (MG) powder in its supercooled liquid state by spark plasma sintering. The sintered cores are made from the nanocrystalline powder particles of Fe-Si-B-P-Cu alloy, which are separated through a magnetic Fe55C10B5P10Ni15Mo5 MG alloy. Low W of ~2.2 W/kg (at 1 T and 50 Hz), and high fracture strength (yielding stress ~500 MPa), which is an order of magnitude higher than the conventional powder cores, were obtained. Stronger metal-metal bonding and magnetic nature of MG binder (which is very different than the conventional polymer based binders) are responsible for the superior mechanical and magnetic properties. The MG binder not only helps in improving the mechanical properties but it also enhances the overall Bs of the core.
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores.

PubMed

Chikkagoudar, Satish; Wang, Kai; Li, Mingyao

2011-05-26

Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
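The partitioning scheme described in this abstract (fragments with non-overlapping SNP sets, then within-fragment and between-fragment interaction analysis) can be sketched as follows. This Python fragment only enumerates the SNP pairs that each independent work unit would cover; it performs no statistical test, and the function names and SNP identifiers are placeholders rather than anything taken from GENIE.

```python
from itertools import combinations

def partition(snps, fragment_size):
    """Split the SNP list into fragments with non-overlapping SNP sets."""
    return [snps[i:i + fragment_size] for i in range(0, len(snps), fragment_size)]

def work_units(fragments):
    """Yield independent units: pairs within a fragment, then pairs between fragments."""
    for i, frag in enumerate(fragments):
        # 1) interactions among the SNPs of the current fragment
        yield list(combinations(frag, 2))
        # 2) interactions between the current fragment and each later fragment
        for other in fragments[i + 1:]:
            yield [(a, b) for a in frag for b in other]

snps = [f"rs{i}" for i in range(10)]   # placeholder SNP identifiers
fragments = partition(snps, fragment_size=4)
units = list(work_units(fragments))
pairs = [p for unit in units for p in unit]
```

Because the fragments do not overlap, every SNP pair appears in exactly one unit, so the units can be dispatched to separate GPU or CPU cores without coordination beyond collecting the results.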

GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

PubMed Central

2011-01-01

Background: Gene-gene interaction in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, whereas Graphics Processing Units (GPUs) also have hundreds of cores and have been recently used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis for binary traits. Findings: Here we present a novel software package GENIE, which utilizes the power of multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a speedup of 27 times over its single-core CPU mode run. Conclusions: GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923
Use of Soft Computing Technologies For Rocket Engine Control

NASA Technical Reports Server (NTRS)

Trevino, Luis C.; Olcmen, Semih; Polites, Michael

2003-01-01

This paper explores how Soft Computing Technologies (SCT) could be employed to further improve overall engine system reliability and performance. Specifically, this will be presented by enhancing rocket engine control and engine health management (EHM) using SCT coupled with conventional control technologies, and sound software engineering practices used in Marshall's Flight Software Group. The principal goals are to improve software management, software development time and maintenance, processor execution, fault tolerance and mitigation, and nonlinear control in power level transitions. The intent is not to discuss any shortcomings of existing engine control and EHM methodologies, but to provide alternative design choices for control, EHM, implementation, performance, and sustaining engineering. The approaches outlined in this paper will require knowledge in the fields of rocket engine propulsion, software engineering for embedded systems, and soft computing technologies (i.e., neural networks, fuzzy logic, and Bayesian belief networks), much of which is presented in this paper. The first targeted demonstration rocket engine platform is the MC-1 (formerly FASTRAC Engine) which is simulated with hardware and software in the Marshall Avionics & Software Testbed laboratory that
A mean spherical model for soft potentials: The hard core revealed as a perturbation

NASA Technical Reports Server (NTRS)

Rosenfeld, Y.; Ashcroft, N. W.

1978-01-01

The mean spherical approximation for fluids is extended to treat the case of dense systems interacting via soft potentials. The extension takes the form of a generalized statement concerning the behavior of the direct correlation function c(r) and the radial distribution function g(r). From a detailed analysis that views the hard core portion of a potential as a perturbation on the whole, a specific model is proposed which possesses analytic solutions for both Coulomb and Yukawa potentials, in addition to certain other remarkable properties. A variational principle for the model leads to a relatively simple method for obtaining numerical solutions.
Pure-iron/iron-based-alloy hybrid soft magnetic powder cores compacted at ultra-high pressure

NASA Astrophysics Data System (ADS)

Saito, Tatsuya; Tsuruta, Hijiri; Watanabe, Asako; Ishimine, Tomoyuki; Ueno, Tomoyuki

2018-04-01

We developed Fe/FeSiAl soft magnetic powder cores (SMCs) for realizing the miniaturization and high efficiency of an electromagnetic conversion coil in the high-frequency range (~20 kHz). We found that Fe/FeSiAl SMCs can be formed with a higher density under higher compaction pressure than pure-iron SMCs. These SMCs delivered a saturation magnetic flux density of 1.7 T and iron loss (W1/20k) of 158 kW/m³. The proposed SMCs exhibited similar excellent characteristics even in block shapes, which are closer to the product shapes.
A 60 GOPS/W, -1.8 V to 0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology

NASA Astrophysics Data System (ADS)

Rossi, Davide; Pullini, Antonio; Loi, Igor; Gautschi, Michael; Gürkaynak, Frank K.; Bartolini, Andrea; Flatresse, Philippe; Benini, Luca

2016-03-01

Ultra-low power operation and extreme energy efficiency are strong requirements for a number of high-growth application areas, such as E-health, Internet of Things, and wearable Human-Computer Interfaces. A promising approach to achieve up to one order of magnitude of improvement in energy efficiency over the current generation of integrated circuits is near-threshold computing. However, frequency degradation due to aggressive voltage scaling may not be acceptable across all performance-constrained applications. Thread-level parallelism over multiple cores can be used to overcome the performance degradation at low voltage. Moreover, enabling the processors to operate on demand and over wide supply voltage and body bias ranges makes it possible to achieve the best possible energy efficiency while satisfying a large spectrum of computational demands. In this work we present the first ever implementation of a 4-core cluster fabricated using conventional-well 28 nm UTBB FD-SOI technology. The multi-core architecture we present in this work is able to operate on a wide range of supply voltages, from 0.44 V to 1.2 V. In addition, the architecture allows a wide range of body bias to be applied, from -1.8 V to 0.9 V. The peak energy efficiency of 60 GOPS/W is achieved at 0.5 V supply voltage and 0.5 V forward body bias. Thanks to the extended body bias range of conventional-well FD-SOI technology, high energy efficiency can be guaranteed for a wide range of process and environmental conditions. We demonstrate the ability to compensate for process variation in up to 99.7% of chips with only ±0.2 V of body biasing, and to compensate for temperature variation in the range -40 °C to 120 °C by exploiting -1.1 V to 0.8 V body biasing. When compared to leading-edge near-threshold RISC processors optimized for extremely low power applications, the multi-core architecture we propose has 144× more performance at comparable energy efficiency levels. 
Even when compared to other low-power processors with comparable performance, including those implemented in 28 nm technology, our platform provides 1.4× to 3.7× better energy efficiency.
Multicolor, time-gated, soft x-ray pinhole imaging of wire array and gas puff Z pinches on the Z and Saturn pulsed power generators.

PubMed

Jones, B; Coverdale, C A; Nielsen, D S; Jones, M C; Deeney, C; Serrano, J D; Nielsen-Weber, L B; Meyer, C J; Apruzese, J P; Clark, R W; Coleman, P L

2008-10-01

A multicolor, time-gated, soft x-ray pinhole imaging instrument is fielded as part of the core diagnostic set on the 25 MA Z machine [M. E. Savage et al., in Proceedings of the Pulsed Power Plasma Sciences Conference (IEEE, New York, 2007), p. 979] for studying intense wire array and gas puff Z-pinch soft x-ray sources. Pinhole images are reflected from a planar multilayer mirror, passing 277 eV photons with <10 eV bandwidth. An adjacent pinhole camera uses filtration alone to view 1-10 keV photons simultaneously. Overlaying these data provides composite images that contain both spectral as well as spatial information, allowing for the study of radiation production in dense Z-pinch plasmas. Cu wire arrays at 20 MA on Z show the implosion of a colder cloud of material onto a hot dense core where K-shell photons are excited. A 528 eV imaging configuration has been developed on the 8 MA Saturn generator [R. B. Spielman et al., AIP Conf. Proc. 195, 3 (1989)] for imaging a bright Li-like Ar L-shell line. Ar gas puff Z pinches show an intense K-shell emission from a zippering stagnation front with L-shell emission dominating as the plasma cools.
Results of SEI Independent Research and Development Projects

DTIC Science & Technology

2008-12-01

contained there. When laptops with a dual-core processor came out, ITunes fails crashed. ITunes was designed as multi-threaded application, but until...involving product portfolio, in-bound technical marketing, research and development, product engineering, supply chain, and out-bound sales and marketing...of quality and process improvement professionals to the marketing, product engineering, supply chain, product test and sales professionals. 3
Photonic-Networks-on-Chip for High Performance Radiation Survivable Multi-Core Processor Systems

DTIC Science & Technology

2013-12-01

Loss Spectra” Proceedings of SPIE 8255, (2012) and in a journal publication: M. T. Crowley, D. Murrell, N. Patel, M. Breivik , C.-Y. Lin, Y. Li, B.-O...Crowley, D. Murrell, N. Patel, M. Breivik , C.-Y. Lin, Y. Li, B.-O. Fimland and L. F. Lester, "Analytical Modeling of the Temperature Performance of
Theoretical and Experimental Studies of Epidermal Heat Flux Sensors for Measurements of Core Body Temperature

PubMed Central

Zhang, Yihui; Webb, Richard Chad; Luo, Hongying; Xue, Yeguang; Kurniawan, Jonas; Cho, Nam Heon; Krishnan, Siddharth; Li, Yuhang; Huang, Yonggang

2016-01-01

Long-term, continuous measurement of core body temperature is of high interest, due to the widespread use of this parameter as a key biomedical signal for clinical judgment and patient management. Traditional approaches rely on devices or instruments in rigid and planar forms, not readily amenable to intimate or conformable integration with soft, curvilinear, time-dynamic, surfaces of the skin. Here, materials and mechanics designs for differential temperature sensors are presented which can attach softly and reversibly onto the skin surface, and also sustain high levels of deformation (e.g., bending, twisting, and stretching). A theoretical approach, together with a modeling algorithm, yields core body temperature from multiple differential measurements from temperature sensors separated by different effective distances from the skin. The sensitivity, accuracy, and response time are analyzed by finite element analyses (FEA) to provide guidelines for relationships between sensor design and performance. Four sets of experiments on multiple devices with different dimensions and under different convection conditions illustrate the key features of the technology and the analysis approach. Finally, results indicate that thermally insulating materials with cellular structures offer advantages in reducing the response time and increasing the accuracy, while improving the mechanics and breathability. PMID:25953120
Energy challenges in optical access and aggregation networks.

PubMed

Kilper, Daniel C; Rastegarfar, Houman

2016-03-06

Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as the density and speed increase, the total system energy, thermal density and energy per bit are moving into regimes that become impractical to support, for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

NASA Astrophysics Data System (ADS)

Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

2016-09-01

The second-generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronization issues that arise in computations of multi-body potentials. Techniques developed for this problem may also be used to achieve efficient solutions of different problems. The performance of the proposed algorithm is assessed using a range of model systems. It is compared to a highly optimized CPU implementation (both single-core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.
Electrostatic interactions between diffuse soft multi-layered (bio)particles: beyond Debye-Hückel approximation and Deryagin formulation.

PubMed

Duval, Jérôme F L; Merlin, Jenny; Narayana, Puranam A L

2011-01-21

We report a steady-state theory for the evaluation of electrostatic interactions between identical or dissimilar spherical soft multi-layered (bio)particles, e.g. microgels or microorganisms. These generally consist of a rigid core surrounded by concentric ion-permeable layers that may differ in thickness, soft material density, chemical composition and degree of dissociation for the ionogenic groups. The formalism accounts for diffuse interphases where distributions of ionogenic groups from one layer to the other are position-dependent. The model is valid for any number of ion-permeable layers around the core of the interacting soft particles and covers all limiting situations in terms of nature of interacting particles, i.e. homo- and hetero-interactions between hard, soft or entirely porous colloids. The theory is based on a rigorous numerical solution of the non-linearized Poisson-Boltzmann equation including radial and angular distortions of the electric field distribution within and outside the interacting soft particles in approach. The Gibbs energy of electrostatic interaction is obtained from a general expression derived following the method by Verwey and Overbeek based on appropriate electric double layer charging mechanisms. Original analytical solutions are provided here for cases where interaction takes place between soft multi-layered particles whose size and charge density are in line with the Deryagin treatment and the Debye-Hückel approximation. These situations include interactions between hard and soft particles, hard plate and soft particle or soft plate and soft particle. The flexibility of the formalism is highlighted by the discussion of a few situations which clearly illustrate that electrostatic interaction between multi-layered particles may be partly or predominantly governed by potential distribution within the most internal layers. 
A major consequence is that both amplitude and sign of Gibbs electrostatic interaction energy may dramatically change depending on the interplay between characteristic Debye length, thickness of ion-permeable layers and their respective protolytic features (e.g. location, magnitude and sign of charge density). This formalism extends a recent model by Ohshima which is strictly limited to interaction between soft mono-shell particles within Deryagin and Debye-Hückel approximations under conditions where ionizable sites are completely dissociated.
HeinzelCluster: accelerated reconstruction for FORE and OSEM3D.

PubMed

Vollmar, S; Michel, C; Treffert, J T; Newport, D F; Casey, M; Knöss, C; Wienhard, K; Liu, X; Defrise, M; Heiss, W D

2002-08-07

Using iterative three-dimensional (3D) reconstruction techniques for reconstruction of positron emission tomography (PET) is not feasible on most single-processor machines due to the excessive computing time needed, especially so for the large sinogram sizes of our high-resolution research tomograph (HRRT). In our first approach to speed up reconstruction time we transform the 3D scan into the format of a two-dimensional (2D) scan with sinograms that can be reconstructed independently using Fourier rebinning (FORE) and a fast 2D reconstruction method. On our dedicated reconstruction cluster (seven four-processor systems, Intel PIII@700 MHz, switched fast ethernet and Myrinet, Windows NT Server), we process these 2D sinograms in parallel. We have achieved a speedup > 23 using 26 processors and also compared results for different communication methods (RPC, Syngo, Myrinet GM). The other approach is to parallelize OSEM3D (implementation of C Michel), which has produced the best results for HRRT data so far and is more suitable for an adequate treatment of the sinogram gaps that result from the detector geometry of the HRRT. We have implemented two levels of parallelization for our dedicated cluster (a shared memory fine-grain level on each node utilizing all four processors and a coarse-grain level allowing for 15 nodes), reducing the time for one core iteration from over 7 h to about 35 min.
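The figures quoted in this abstract translate into parallel efficiency and an implied serial fraction with a little arithmetic. The Python below is a worked sketch, not part of the cited work: it treats the reported "> 23" as exactly 23 and "over 7 h" as exactly 7 h, and it assumes Amdahl's law as the model, so the results are bounds rather than measured values.

```python
def efficiency(speedup, n_proc):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / n_proc

def amdahl_serial_fraction(speedup, n_proc):
    """Solve S = 1 / (f + (1 - f) / p) for the serial fraction f."""
    return (n_proc / speedup - 1) / (n_proc - 1)

# FORE + 2D reconstruction: speedup of 23 on 26 processors.
fore_eff = efficiency(23, 26)                  # about 0.88
fore_serial = amdahl_serial_fraction(23, 26)   # about 0.005

# OSEM3D: one iteration reduced from 7 h to 35 min.
osem_speedup = (7 * 60) / 35                   # 12x
```

Since the true speedup exceeds 23, the efficiency is a lower bound and the serial fraction an upper bound under this model.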
The design of infrared information collection circuit based on embedded technology

NASA Astrophysics Data System (ADS)

Liu, Haoting; Zhang, Yicong

2013-07-01

The S3C2410 is a 16/32-bit RISC embedded processor based on the ARM920T core and the AMBA bus, intended mainly for handheld devices and cost-effective, low-power applications. This paper presents the design of a PIR sensor system, including its circuit, assembly, and debugging. The application circuit of the passive PIR alarm exploits the invisibility of infrared radiation to provide anti-theft alarm and security functions. When a human body enters the detection range of the PIR sensor, the sensor detects the heat source and outputs a weak signal. The signal is amplified, compared, and delayed; finally, light-emitting diodes emit light, serving as the alarm indicator.
Static analysis of the hull plate using the finite element method

NASA Astrophysics Data System (ADS)

Ion, A.

2015-11-01

This paper presents the static analysis for two levels of a container ship's construction: the first level is the girder / hull plate, and the second level is the entire strength hull of the vessel. This article describes the work for the static analysis of a hull plate. We use the software package ANSYS Mechanical 14.5. The program is run on a computer with four Intel Xeon X5260 CPU processors at 3.33 GHz and 32 GB of installed memory. In terms of software, the shared memory parallel version of ANSYS refers to running ANSYS across multiple cores on an SMP system. The distributed memory parallel version of ANSYS (Distributed ANSYS) refers to running ANSYS across multiple processors on SMP systems or DMP systems.
Real Time Phase Noise Meter Based on a Digital Signal Processor

NASA Technical Reports Server (NTRS)

Angrisani, Leopoldo; D'Arco, Mauro; Greenhall, Charles A.; Schiano Lo Morille, Rosario

2006-01-01

A digital signal-processing meter for phase noise measurement on sinusoidal signals is presented. It uses a special hardware architecture, made up of a core digital signal processor connected to a data acquisition board, and takes advantage of a quadrature demodulation-based measurement scheme already proposed by the authors. Thanks to an efficient measurement process and an optimized implementation of its fundamental stages, the proposed meter exploits all hardware resources effectively enough to achieve high performance and real-time operation. For input frequencies up to some hundreds of kilohertz, the meter is capable both of updating the phase noise power spectrum while seamlessly capturing the analyzed signal into its memory and of achieving a frequency resolution as fine as a few hertz.
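Quadrature demodulation, the measurement principle named in this abstract, can be illustrated generically: the signal is mixed with a cosine and a sine at the nominal frequency, the products are low-pass filtered, and the phase is recovered with an arctangent. The Python sketch below uses a simple block average as the low-pass filter; the sampling rate, carrier frequency, and function name are assumptions, and the code is not the authors' implementation.

```python
import math

def quadrature_phase(samples, f0, fs):
    """Estimate the phase offset of a sinusoid at f0 sampled at rate fs.

    The block sum acts as a crude low-pass filter; it is exact when the
    block spans a whole number of carrier periods.
    """
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(samples):
        w = 2 * math.pi * f0 * n / fs
        i_sum += x * math.cos(w)    # in-phase component
        q_sum += -x * math.sin(w)   # quadrature component
    return math.atan2(q_sum, i_sum)

fs = 1_000_000.0    # assumed sampling rate, Hz
f0 = 100_000.0      # assumed carrier frequency, Hz
true_phase = 0.3    # radians
samples = [math.cos(2 * math.pi * f0 * n / fs + true_phase) for n in range(1000)]
estimated = quadrature_phase(samples, f0, fs)
```

Repeating the estimate over successive blocks would give a phase time series, and the power spectrum of that series is the quantity a phase noise meter reports.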
MSTor: A program for calculating partition functions, free energies, enthalpies, entropies, and heat capacities of complex molecules including torsional anharmonicity

NASA Astrophysics Data System (ADS)

Zheng, Jingjing; Mielke, Steven L.; Clarkson, Kenneth L.; Truhlar, Donald G.

2012-08-01

We present a Fortran program package, MSTor, which calculates partition functions and thermodynamic functions of complex molecules involving multiple torsional motions by the recently proposed MS-T method. This method interpolates between the local harmonic approximation in the low-temperature limit, and the limit of free internal rotation of all torsions at high temperature. The program can also carry out calculations in the multiple-structure local harmonic approximation. The program package also includes six utility codes that can be used as stand-alone programs to calculate reduced moment of inertia matrices by the method of Kilpatrick and Pitzer, to generate conformational structures, to calculate, either analytically or by Monte Carlo sampling, volumes for torsional subdomains defined by Voronoi tessellation of the conformational subspace, to generate template input files, and to calculate one-dimensional torsional partition functions using the torsional eigenvalue summation method.
Catalogue identifier: AEMF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMF_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 77 434
No. of bytes in distributed program, including test data, etc.: 3 264 737
Distribution format: tar.gz
Programming language: Fortran 90, C, and Perl
Computer: Itasca (HP Linux cluster, each node has two-socket, quad-core 2.8 GHz Intel Xeon X5560 “Nehalem EP” processors), Calhoun (SGI Altix XE 1300 cluster, each node containing two quad-core 2.66 GHz Intel Xeon “Clovertown”-class processors sharing 16 GB of main memory), Koronis (Altix UV 1000 server with 190 6-core Intel Xeon X7542 “Westmere” processors at 2.66 GHz), Elmo (Sun Fire X4600 Linux cluster with AMD Opteron cores), and Mac Pro (two 2.8 GHz Quad-core Intel Xeon processors)
Operating system: Linux/Unix/Mac OS
RAM: 2 Mbytes
Classification: 16.3, 16.12, 23
Nature of problem: Calculation of the partition functions and thermodynamic functions (standard-state energy, enthalpy, entropy, and free energy as functions of temperatures) of complex molecules involving multiple torsional motions.
Solution method: The multi-structural approximation with torsional anharmonicity (MS-T). The program also provides results for the multi-structural local harmonic approximation [1].
Restrictions: There is no limit on the number of torsions that can be included in either the Voronoi calculation or the full MS-T calculation. In practice, the range of problems that can be addressed with the present method consists of all multi-torsional problems for which one can afford to calculate all the conformations and their frequencies.
Unusual features: The method can be applied to transition states as well as stable molecules. The program package also includes the hull program for the calculation of Voronoi volumes and six utility codes that can be used as stand-alone programs to calculate reduced moment-of-inertia matrices by the method of Kilpatrick and Pitzer, to generate conformational structures, to calculate, either analytically or by Monte Carlo sampling, volumes for torsional subdomains defined by Voronoi tessellation of the conformational subspace, to generate template input files, and to calculate one-dimensional torsional partition functions using the torsional eigenvalue summation method.
Additional comments: The program package includes a manual, installation script, and input and output files for a test suite.
Running time: There are 24 test runs. The running time of the test runs on a single processor of the Itasca computer is less than 2 seconds.
References: [1] J. Zheng, T. Yu, E. Papajak, I.M. Alecu, S.L. Mielke, D.G. Truhlar, Practical methods for including torsional anharmonicity in thermochemical calculations of complex molecules: The internal-coordinate multi-structural approximation, Phys. Chem. Chem. Phys. 13 (2011) 10885-10907.
In-Orbit Performance of the Digital Electronics for the X-Ray Microcalorimeter Onboard the Hitomi Satellite

NASA Astrophysics Data System (ADS)

Tsujimoto, M.; Tashiro, M. S.; Ishisaki, Y.; Yamada, S.; Seta, H.; Mitsuda, K.; Boyce, K. R.; Eckart, M. E.; Kilbourne, C. A.; Leutenegger, M. A.; Porter, F. S.; Kelley, R. L.

2018-03-01

The pulse shape processor is the onboard digital electronics unit of the X-ray microcalorimeter instrument—the soft X-ray spectrometer—onboard the Hitomi satellite. It processes X-ray events using the optimum filtering with limited resources. It was operated for 36 days in orbit continuously without issues and met the requirement of processing a 150 s^{-1} event rate during the observation of bright sources. Here, we present the results obtained in orbit, focusing on its performance as the onboard digital signal processing unit of an X-ray microcalorimeter.
A Cost Effective System Design Approach for Critical Space Systems

NASA Technical Reports Server (NTRS)

Abbott, Larry Wayne; Cox, Gary; Nguyen, Hai

2000-01-01

NASA-JSC required an avionics platform capable of serving a wide range of applications in a cost-effective manner. In part, making the avionics platform cost effective means adhering to open standards and supporting the integration of COTS products with custom products. Inherently, operation in space requires low power, mass, and volume while retaining high performance, reconfigurability, scalability, and upgradability. The Universal Mini-Controller project is based on a modified PC/104-Plus architecture while maintaining full compatibility with standard COTS PC/104 products. The architecture consists of a library of building block modules, which can be mixed and matched to meet a specific application. A set of NASA-developed core building blocks (a processor card, an analog input/output card, and a Mil-Std-1553 card) has been constructed to meet critical functions and unique interfaces. The design for the processor card is based on the PowerPC architecture. This architecture provides an excellent balance between power consumption and performance, and has an upgrade path to the forthcoming radiation hardened PowerPC processor. The processor card, which makes extensive use of surface mount technology, has a 166 MHz PowerPC 603e processor, 32 Mbytes of error detected and corrected RAM, 8 Mbytes of Flash, and 1 Mbyte of EPROM, on a single PC/104-Plus card. Similar densities have been achieved with the quad channel Mil-Std-1553 card and the analog input/output cards. The power management built into the processor and its peripheral chip allows the power and performance of the system to be adjusted to meet the requirements of the application, adding another dimension to the flexibility of the Universal Mini-Controller. Unique mechanical packaging allows the Universal Mini-Controller to accommodate standard COTS and custom oversized PC/104-Plus cards. This mechanical packaging also provides thermal management via conductive cooling of COTS boards, which are typically designed for convection cooling methods.
High-power CO₂ laser with a Gauss-core resonator for high-speed cutting of thin metal sheets.

PubMed

Takenaka, Y; Nishimae, J; Tanaka, M; Motoki, Y

1997-01-01

A novel resonator, the Gauss-core resonator, based on a stable resonator configuration and designed to yield a highly focusing beam operating in a large-volume TEM₀₀ mode, is presented. A 6.2 kW linearly polarized output beam with an M² factor of 1.7 is obtained experimentally for a high-power cw CO₂ laser. The laser materials processing capability of the Gauss-core resonator is also studied. We can cut 1-mm-thick mild (soft) steel at a maximum cutting speed of 58 m/min at 5.6 kW, and 0.2-mm-thick steel at 145 m/min at 2.8 kW.

Ultrasonic Drilling and Coring

NASA Technical Reports Server (NTRS)

Bar-Cohen, Yoseph

1998-01-01

A novel drilling and coring device, driven by a combination of sonic and ultrasonic vibration, was developed. The device is applicable to soft and hard objects, uses a low axial load, and is potentially operational under extreme conditions. It has numerous potential planetary applications, as well as significant potential for commercialization in construction, demining, drilling, and medical technologies.
Effect of a Near Fault on the Seismic Response of a Base-Isolated Structure with a Soft Storey

NASA Astrophysics Data System (ADS)

Athamnia, B.; Ounis, A.; Abdeddaim, M.

2017-12-01

This study focuses on the soft-storey behavior of RC structures with lead core rubber bearing (LRB) isolation systems under near- and far-fault motions. Under near-fault ground motions, seismic isolation devices might perform poorly because of large isolator displacements caused by large velocity and displacement pulses associated with such strong motions. In this study, four different structural models have been designed to study the effect of soft-storey behavior under near-fault and far-fault motions. The seismic analysis for isolated reinforced concrete buildings is carried out using a nonlinear time history analysis method. Inter-storey drifts, absolute acceleration, displacement, base shear forces, hysteretic loops and the distribution of plastic hinges are examined as a result of the analysis. These results show that the performance of a base-isolated RC structure is more affected by increasing the height of a storey under near-fault motion than under far-fault motion.
Research on NC motion controller based on SOPC technology

NASA Astrophysics Data System (ADS)

Jiang, Tingbiao; Meng, Biao

2006-11-01

With the rapid development of digitization and informatization, numerical control (NC) technology is becoming increasingly important in the manufacturing industry. However, conventional numerical control systems usually have shortcomings such as poor openness, real-time performance, tailorability, and reconfigurability. To address these problems, this paper investigates the prospects and advantages of applying System-on-a-Programmable-Chip (SOPC) technology to numerical control, and puts forward a research approach for an NC controller based on SOPC technology. Exploiting the characteristics of SOPC technology, we integrate a high-density FPGA logic device, SRAM memory, and an embedded ARM processor into a single programmable logic device. We also combine the 32-bit RISC processor, which offers the computing capability needed for complicated algorithms, with the FPGA, which offers strong reconfigurable logic control capability. In this way, the shortcomings of existing numerical control systems described above can be largely resolved. For the concrete implementation, we use an FPGA chip with an embedded ARM hard-core processor as the control core of the motion controller. We also design the peripheral circuits of the controller according to the requirements of the actual control functions, port a real-time operating system to the ARM, design the drivers for the peripheral chips, develop the application program that controls and configures the FPGA, and design IP cores implementing the logic algorithms for various NC motion control functions to be configured into the FPGA. The hardware and software of the whole control system are developed using modular, structured design. An NC motion controller that is easy to tailor, highly open, reconfigurable, and expandable can thus be implemented.
Thermo-mechanical properties of carbon nanotubes and applications in thermal management

NASA Astrophysics Data System (ADS)

Nguyen, Manh Hong; Thang Bui, Hung; Trinh Pham, Van; Phan, Ngoc Hong; Nguyen, Tuan Hong; Chuc Nguyen, Van; Quang Le, Dinh; Khoi Phan, Hong; Phan, Ngoc Minh

2016-06-01

Thanks to their very high thermal conductivity, high Young’s modulus and unique tensile strength, carbon nanotubes (CNTs) have become one of the most suitable nano-additives for heat conductive materials. In this work, we present results obtained for the synthesis of heat conductive materials containing CNT-based thermal greases, nanoliquids and lubricating oils. These synthesized heat conductive materials were applied to thermal management for high power electronic devices (CPUs, LEDs) and internal combustion engines. The simulation and experimental results on thermal greases for an Intel Pentium IV processor showed that the thermal conductivity of the greases increased 1.4 times and the saturation temperature of the CPU decreased by 5 °C when using thermal grease containing 2 wt% CNTs. Nanoliquids consisting of CNTs in distilled water/ethylene glycol were successfully applied in heat dissipation for an Intel Core i5 processor and a 450 W floodlight LED. The experimental results showed that the saturation temperature of the Intel Core i5 processor and the 450 W floodlight LED decreased by about 6 °C and 3.5 °C, respectively, when using nanoliquids containing 1 g l⁻¹ of CNTs. The CNTs were also used effectively as additives in the synthesis of lubricating oils to improve the thermal conductivity, heat dissipation efficiency and performance efficiency of engines. The experimental results show that the thermal conductivity of lubricating oils increased by 12.5%, the engine saved 15% fuel consumption, and the longevity of the lubricating oil increased up to 20 000 km when using 0.1% vol. CNTs in the lubricating oils. All of the above results confirm the tremendous application potential of heat conductive materials containing CNTs in thermal management for high power electronic devices, internal combustion engines and other high power apparatus.
Advanced Wireless Integrated Navy Network - AWINN

DTIC Science & Technology

2005-09-30

progress report No. 3 on AWINN hardware and software configurations of smart, wideband, multi-function antennas, secure configurable platform, close-in...results to the host PC via a UART soft core. The UART core used is a proprietary Xilinx core which incorporates features described in National...current software uses wheel odometry and visual landmarks to create a map and estimate position on an internal x, y grid. The wheel odometry provides a
Design of a hybrid battery charger system fed by a wind-turbine and photovoltaic power generators.

PubMed

Chang Chien, Jia-Ren; Tseng, Kuo-Ching; Yan, Bo-Yi

2011-03-01

This paper aims to develop a digital signal processor (DSP) based controller for a solar cell and wind-turbine hybrid charging system. The system consists of solar cells, a wind turbine, a lead-acid battery, and a buck-boost converter. The solar cells and wind turbine serve as the system's main power sources and the battery as an energy storage element. The output powers of the solar cells and wind turbine fluctuate widely with the weather and climate conditions. These unstable powers can be adjusted by a buck-boost converter so that the most suitable output powers can be obtained. This study designs a booster using a dsPIC30F4011 digital signal controller as the core processor. The DSP applies the perturbation and observation method to obtain an effective energy circuit for a full 100 W charging system. The system can also be controlled and charged easily, day and night, by a simple program that changes the system state according to the weather conditions read, allowing flexible application.
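The perturbation and observation approach named in this abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the paper's dsPIC firmware: `perturb_and_observe`, `toy_power_curve`, the step size, and the 17 V peak are all hypothetical. The operating voltage is nudged by a fixed step, and the direction of the nudge is reversed whenever the observed power drops.

```python
def perturb_and_observe(power_at, v_start, step=0.1, iterations=200):
    """Climb toward the maximum-power voltage by perturbing the operating
    point and observing the resulting power (hypothetical helper)."""
    v = v_start
    p_prev = power_at(v)
    direction = 1.0
    for _ in range(iterations):
        v += direction * step          # perturb the operating voltage
        p = power_at(v)                # observe the new output power
        if p < p_prev:
            direction = -direction     # power dropped: reverse direction
        p_prev = p
    return v

def toy_power_curve(v):
    # Assumed concave power curve peaking at 100 W and 17 V (illustrative only).
    return 100.0 - (v - 17.0) ** 2

v_operating = perturb_and_observe(toy_power_curve, 12.0)
```

With a fixed step the operating point does not settle exactly on the peak; it oscillates within about one step of it, which is the usual trade-off between tracking speed and steady-state ripple in this method.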
Development of 3-Year Roadmap to Transform the Discipline of Systems Engineering

DTIC Science & Technology

2010-03-31

quickly humans could physically construct them. Indeed, magnetic core memory was entirely constructed by human hands until it was superseded by...For their mainframe computers, IBM develops the applications, operating system, computer hardware and microprocessors (off the shelf standard memory...processor developers work on potential computational and memory pipelines to support the required performance capabilities and use the available transistors
Scaling Support Vector Machines On Modern HPC Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2015-02-01

We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.
Successful development of first-generation laser device; marking China's optoelectronic technology at world class level

NASA Astrophysics Data System (ADS)

1995-04-01

Bell Laboratories has developed the world's first optical information processor. Its core device is an array of symmetric self-electro-optic effect devices. After being developed in the United States, this high-technology device was successfully developed by China's scientists, demonstrating that China's optoelectronic technology is among the most advanced in the world.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Kwan-Liu

Most of today’s visualization libraries and applications are based on what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline. But functors’ design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstrably well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.
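The contrast this abstract draws between filters and functors can be illustrated with a small sketch. It is not the project's framework: `compose`, `run_filters`, `run_fused`, and the three example functors are hypothetical names, and plain Python lists stand in for data sets. The filter style traverses the whole data set once per stage, while the functor style fuses the stateless per-element operations and traverses it once.

```python
def compose(*functors):
    """Fuse stateless per-element functors into a single per-element functor."""
    def fused(x):
        for f in functors:
            x = f(x)
        return x
    return fused

def run_filters(data, filters):
    # Pipeline style: every filter walks the entire data set and
    # materialises a complete intermediate result.
    for f in filters:
        data = [f(x) for x in data]
    return data

def run_fused(data, functors):
    # Functor style: one traversal, all operations applied per element.
    fused = compose(*functors)
    return [fused(x) for x in data]

def scale(x):
    return 2.0 * x

def offset(x):
    return x + 1.0

def clamp(x):
    return min(x, 5.0)

data = [0.0, 1.0, 2.0, 3.0]
staged = run_filters(data, [scale, offset, clamp])
fused_result = run_fused(data, [scale, offset, clamp])
```

Because each fused call depends only on its own element, the single traversal could in principle be distributed over many lightweight threads; the sketch runs serially and only shows the memory-access pattern.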
Formation and relaxation of quasistationary states in particle systems with power-law interactions

NASA Astrophysics Data System (ADS)

Marcos, B.; Gabrielli, A.; Joyce, M.

2017-09-01

We explore the formation and relaxation of the so-called quasistationary states (QSS) for particle distributions in three dimensions interacting via an attractive radial pair potential $V(r \to \infty) \sim 1/r^{\gamma}$ with $\gamma > 0$, and either a soft core or hard core regularization at small $r$. In the first part of the paper, we generalize, for any spatial dimension $d \geq 2$, Chandrasekhar's approach for the case of gravity to obtain analytic estimates of the rate of collisional relaxation due to two-body collisions. The resultant relaxation rates indicate an essential qualitative difference depending on the integrability of the pair force at large distances: for $\gamma > d - 1$, the rate diverges in the large particle number $N$ (mean-field) limit, unless a sufficiently large soft core is present; for $\gamma$
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

NASA Astrophysics Data System (ADS)

Rostrup, Scott; De Sterck, Hans

2010-12-01

Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. 
Program summary
Program title: SWsolver
Catalogue identifier: AEGY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GPL v3
No. of lines in distributed program, including test data, etc.: 59 168
No. of bytes in distributed program, including test data, etc.: 453 409
Distribution format: tar.gz
Programming language: C, CUDA
Computer: Parallel Computing Clusters. Individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator.
Operating system: Linux
Has the code been vectorised or parallelized?: Yes. Tested on 1-128 x86 CPU cores, 1-32 Cell Processors, and 1-32 NVIDIA GPUs.
RAM: Tested on problems requiring up to 4 GB per compute node.
Classification: 12
External routines: MPI, CUDA, IBM Cell SDK
Nature of problem: MPI-parallel simulation of Shallow Water equations using high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell Processor, and NVIDIA GPU using CUDA.
Solution method: SWsolver provides 3 implementations of a high-resolution 2D Shallow Water equation solver on regular Cartesian grids, for CPU, Cell Processor, and NVIDIA GPU. Each implementation uses MPI to divide work across a parallel computing cluster.
Additional comments: Sub-program numdiff is used for the test run.
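To make "explicit time integration on structured grids" concrete, here is a minimal sketch of one explicit update step. It is deliberately not SWsolver's scheme, which the abstract does not describe: a first-order Lax-Friedrichs step for 1D linear advection on a periodic grid is swapped in, and the function name and the `courant` parameter (wave speed times time step over cell width) are assumptions.

```python
def lax_friedrichs_step(u, courant):
    """Advance 1D linear advection by one explicit Lax-Friedrichs step on a
    periodic grid. `courant` = c * dt / dx and should satisfy |courant| <= 1."""
    n = len(u)
    new_u = []
    for i in range(n):
        left = u[(i - 1) % n]    # periodic neighbours
        right = u[(i + 1) % n]
        new_u.append(0.5 * (left + right) - 0.5 * courant * (right - left))
    return new_u

# Assumed initial profile: a single unit pulse on a four-cell grid.
shifted = lax_friedrichs_step([0.0, 1.0, 0.0, 0.0], 1.0)
```

Each cell's new value depends only on the previous values of its two neighbours, so all cells can be updated independently, which is the property that data-parallel devices exploit for this class of solver.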
Ray Meta: scalable de novo metagenome assembly and profiling

PubMed Central

2012-01-01

Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
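The idea of profiling with uniquely-colored k-mers can be sketched in a few lines. This is a toy illustration, not Ray Communities' implementation: the function names, the tiny sequences, and the choice of k = 3 are assumptions. Each k-mer is "colored" by the genomes that contain it, and only k-mers carrying exactly one color are used to attribute reads.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def unique_kmer_owner(genomes, k):
    """Map each k-mer found in exactly one genome to that genome's name."""
    owners = {}
    for name, seq in genomes.items():
        for km in set(kmers(seq, k)):
            owners.setdefault(km, set()).add(name)
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}

def profile(reads, owner_of, k):
    """Count, per genome, the uniquely-colored k-mers observed in the reads."""
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in owner_of:
                counts[owner_of[km]] += 1
    return counts

# Hypothetical toy data; real genomes and reads are far longer.
genomes = {"A": "ACGTAC", "B": "TTGTAC"}
owner_of = unique_kmer_owner(genomes, 3)
abundance = profile(["ACGT", "TTGT", "GTAC"], owner_of, 3)
```

K-mers shared by both toy genomes (here `GTA` and `TAC`) are ignored, so the third read contributes nothing to either count.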
Predicting plasticity with soft vibrational modes: from dislocations to glasses.

PubMed

Rottler, Jörg; Schoenholz, Samuel S; Liu, Andrea J

2014-04-01

We show that quasilocalized low-frequency modes in the vibrational spectrum can be used to construct soft spots, or regions vulnerable to rearrangement, which serve as a universal tool for the identification of flow defects in solids. We show that soft spots not only encode spatial information, via their location, but also directional information, via directors for particles within each soft spot. Single crystals with isolated dislocations exhibit low-frequency phonon modes that localize at the core, and their polarization pattern predicts the motion of atoms during elementary dislocation glide in two and three dimensions in exquisite detail. Even in polycrystals and disordered solids, we find that the directors associated with particles in soft spots are highly correlated with the direction of particle displacements in rearrangements.
Performance verification and system integration tests of the pulse shape processor for the soft x-ray spectrometer onboard ASTRO-H

NASA Astrophysics Data System (ADS)

Takeda, Sawako; Tashiro, Makoto S.; Ishisaki, Yoshitaka; Tsujimoto, Masahiro; Seta, Hiromi; Shimoda, Yuya; Yamaguchi, Sunao; Uehara, Sho; Terada, Yukikatsu; Fujimoto, Ryuichi; Mitsuda, Kazuhisa

2014-07-01

The soft X-ray spectrometer (SXS) aboard ASTRO-H is equipped with dedicated digital signal processing units called pulse shape processors (PSPs). The X-ray microcalorimeter system SXS has 36 sensor pixels, which are operated at 50 mK to measure heat input of X-ray photons and realize an energy resolution of 7 eV FWHM in the range 0.3-12.0 keV. Front-end signal processing electronics are used to filter and amplify the electrical pulse output from the sensor and for analog-to-digital conversion. The digitized pulses from the 36 pixels are multiplexed and are sent to the PSP over low-voltage differential signaling lines. Each of two identical PSP units consists of an FPGA board, which assists the hardware logic, and two CPU boards, which assist the onboard software. The FPGA board triggers at every pixel event and stores the triggering information as a pulse waveform in the installed memory. The CPU boards read the event data to evaluate pulse heights by an optimal filtering algorithm. The evaluated X-ray photon data (including the pixel ID, energy, and arrival time information) are transferred to the satellite data recorder along with event quality information. The PSP units have been developed and tested with the engineering model (EM) and the flight model. Utilizing the EM PSP, we successfully verified the entire hardware system and the basic software design of the PSPs, including their communication capability and signal processing performance. In this paper, we show the key metrics of the EM test, such as accuracy and synchronicity of sampling clocks, event grading capability, and resultant energy resolution.
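The pulse-height evaluation by optimal filtering mentioned in this abstract can be illustrated with a simplified sketch. It is not the PSP flight software: it assumes white noise, in which case the optimal filter reduces to a least-squares fit of a known pulse template, whereas a real implementation would weight by the measured noise spectrum. The function name and the template values are hypothetical.

```python
def pulse_height(record, template):
    """Least-squares amplitude of `template` in `record`, i.e. the optimal
    filter estimate under an assumed white-noise spectrum."""
    if len(record) != len(template):
        raise ValueError("record and template must have the same length")
    norm = sum(t * t for t in template)
    if norm == 0.0:
        raise ValueError("template must not be all zeros")
    return sum(d * t for d, t in zip(record, template)) / norm

# Hypothetical unit-height template with a fast rise and exponential decay.
template = [0.0, 1.0, 0.6, 0.36, 0.216]
record = [3.0 * t for t in template]
height = pulse_height(record, template)
```

The estimate uses every sample of the record rather than only the peak, which is why this family of estimators is less sensitive to noise than a simple maximum.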
Autonomous Soft Robotic Fish Capable of Escape Maneuvers Using Fluidic Elastomer Actuators.

PubMed

Marchese, Andrew D; Onal, Cagdas D; Rus, Daniela

2014-03-01

In this work we describe an autonomous soft-bodied robot that is both self-contained and capable of rapid, continuum-body motion. We detail the design, modeling, fabrication, and control of the soft fish, focusing on enabling the robot to perform rapid escape responses. The robot employs a compliant body with embedded actuators emulating the slender anatomical form of a fish. In addition, the robot has a novel fluidic actuation system that drives body motion and has all the subsystems of a traditional robot onboard: power, actuation, processing, and control. At the core of the fish's soft body is an array of fluidic elastomer actuators. We design the fish to emulate escape responses in addition to forward swimming because such maneuvers require rapid body accelerations and continuum-body motion. These maneuvers showcase the performance capabilities of this self-contained robot. The kinematics and controllability of the robot during simulated escape response maneuvers are analyzed and compared with studies on biological fish. We show that during escape responses, the soft-bodied robot has similar input-output relationships to those observed in biological fish. The major implication of this work is that we show soft robots can be both self-contained and capable of rapid body motion.
Evaluation of the Xeon phi processor as a technology for the acceleration of real-time control in high-order adaptive optics systems

NASA Astrophysics Data System (ADS)

Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine

2014-08-01

We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
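The matrix vector multiply (MVM) at the heart of this benchmark can be written down in a few lines. The sketch below is a plain serial version for illustration only; the function name, the tiny control matrix, and the slope vector are assumptions, and a real-time system would use an optimised, vectorised library on matrices with thousands of rows.

```python
def reconstruct(control_matrix, slopes):
    """Matrix-vector multiply: one output command per control-matrix row."""
    for row in control_matrix:
        if len(row) != len(slopes):
            raise ValueError("each row must match the slope vector length")
    return [sum(c * s for c, s in zip(row, slopes)) for row in control_matrix]

# Hypothetical 2 x 3 control matrix and wavefront-sensor slope vector.
commands = reconstruct([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]], [0.5, -0.25, 1.0])
```

Each row's dot product is independent of the others, so the rows can be split across cores, which is what makes MVM a natural fit for many-core accelerators.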
Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

NASA Technical Reports Server (NTRS)

Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John

2016-01-01

We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5-2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with the Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell, including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks, including a subset of HPCC, HPCG and four full-scale scientific and engineering applications. We also present a model to predict the performance of HPCG and Cart3D within 5%, and Overflow within 10% accuracy.
Salt influence on surface microorganisms and ripening of soft ewe cheese.

PubMed

Tabla, Rafael; Gómez, Antonia; Rebollo, José E; Roa, Isidro

2015-05-01

The effect of different brining treatments on salt uptake and diffusion during the first 30 d of ripening was determined in soft ewe cheese. Additionally, salt influence on surface microorganisms and physicochemical parameters was evaluated. Cheeses were placed into different brine solutions (14, 18 and 24°Bé) at 5 and 10 °C for 1, 2 or 3 h. Samples from rind, outer core and inner core were analysed at 0, 7, 15 and 30 d. Complete salt diffusion from rind to the inner core took about 15 d. The resulting salt gradient favoured the development of a pH gradient from the surface to the inner core. Salt concentration also had a significant effect on the growth of surface microorganisms (mesophiles, pseudomonads and halotolerants). However, mould and yeasts were not affected throughout ripening by the salt levels achieved. Brine salting by immersion for 3 h at 10 °C in 24°Bé brine was found to be the most suitable treatment to control pseudomonads, as spoilage microorganisms, in cheese rind.
A Programming Model Performance Study Using the NAS Parallel Benchmarks

DOE PAGES

Shan, Hongzhang; Blagojević, Filip; Min, Seung-Jai; ...

2010-01-01

Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS, to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. We also compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.

EIT Crinkles as Evidence for the Breakout Model of Solar Eruptions

NASA Technical Reports Server (NTRS)

Sterling, Alphonse C.; Moore, Ronald L.

2001-01-01

We present observations of two homologous flares in NOAA Active Region 8210 occurring on 1998 May 1 and 2, using EUV data from the EUV Imaging Telescope (EIT) on board the Solar and Heliospheric Observatory, high-resolution and high-time cadence images from the soft X-ray telescope on Yohkoh, images or fluxes from the hard X-ray telescope on Yohkoh and the BATSE experiment on board the Compton Gamma Ray Observatory, and Ca(XIX) soft X-ray spectra from the Bragg crystal spectrometer (BCS) on Yohkoh. Magnetograms indicate that the flares occurred in a complex magnetic topology, consisting of an emerging flux region (EFR) sandwiched between a sunspot to the west and a coronal hole to the east. In an earlier study we found that in EIT images, both flaring episodes showed the formation of a crinkle-like pattern of emission (EIT crinkles) occurring in the coronal hole vicinity, well away from a central 'core field' area near the EFR-sunspot boundary. With our expanded data set, here we find that most of the energetic activity occurs in the core region in both events, with some portions of the core brightening shortly after the onset of the EIT crinkles, and other regions of the core brightening several minutes later, coincident with a burst of hard X-rays; there are no obvious core brightenings prior to the onset of the EIT crinkles. These timings are consistent with the 'breakout model' of solar eruptions, whereby the emerging flux is initially constrained by a system of overlying magnetic field lines, and is able to erupt only after an opening develops in the overlying fields as a consequence of magnetic reconnection at a magnetic null point. In our case, the EIT crinkles would be a signature of this pre-impulsive phase magnetic reconnection, and brightening of the core only occurs after the core fields begin to escape through the newly created opening in the overlying fields. 
Morphology in soft X-ray images and properties in hard X-rays differ between the two events, with complexities that preclude a simple determination of the dynamics in the core at the times of eruption. From the BCS spectra, we find that the core region expends energy at a rate of approximately 10^26 ergs/s during the time of the growth of the EIT crinkles; this rate is an upper limit to energy expended in the reconnections opening the overlying fields. Energy losses occur at an order of magnitude higher rate near the time of the peak of the events. There is little evidence of asymmetry in the spectra, consistent with the majority of the mass flows occurring normal to the line of sight. Both events have similar electron temperature dependencies on time.
Architecture of security management unit for safe hosting of multiple agents

NASA Astrophysics Data System (ADS)

Gilmont, Tanguy; Legat, Jean-Didier; Quisquater, Jean-Jacques

1999-04-01

In such growing areas as remote applications in large public networks, electronic commerce, digital signature, intellectual property and copyright protection, and even operating system extensibility, the hardware security level offered by existing processors is insufficient. They lack protection mechanisms that prevent the user from tampering with critical data owned by those applications. Some devices are exceptions, but have neither enough processing power nor enough memory to stand up to such applications (e.g. smart cards). This paper proposes a secure processor architecture in which the classical memory management unit is extended into a new security management unit. It allows ciphered code execution and ciphered data processing. An internal permanent memory can store cipher keys and critical data for several client agents simultaneously. The ordinary supervisor privilege scheme is replaced by a privilege inheritance mechanism that is better suited to operating system extensibility. The result is a secure processor that has hardware support for extensible multitask operating systems, and can be used for both general applications and critical applications needing strong protection. The security management unit and the internal permanent memory can be added to an existing CPU core without loss of performance, and do not require it to be modified.
A parallel implementation of an off-lattice individual-based model of multicellular populations

NASA Astrophysics Data System (ADS)

Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

2015-07-01

As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
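The idea of dividing the spatial domain between processes can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a one-dimensional strip decomposition along x, and the names `owner`, `decompose`, and the `halo` width (the distance within which a cell must also be visible to the neighbouring process) are hypothetical choices made for illustration.

```python
def owner(x, x_min, x_max, n_procs):
    """Return the index of the process whose strip contains coordinate x."""
    width = (x_max - x_min) / n_procs
    # Clamp so that a cell sitting exactly on x_max belongs to the last strip.
    return min(int((x - x_min) / width), n_procs - 1)


def decompose(cells, x_min, x_max, n_procs, halo):
    """Split (x, y) cells into per-process lists plus halo copies for neighbours."""
    width = (x_max - x_min) / n_procs
    owned = [[] for _ in range(n_procs)]
    halos = [[] for _ in range(n_procs)]
    for cell in cells:
        p = owner(cell[0], x_min, x_max, n_procs)
        owned[p].append(cell)
        left_edge = x_min + p * width
        right_edge = left_edge + width
        # Cells close to a strip boundary are communicated to the neighbour.
        if p > 0 and cell[0] - left_edge < halo:
            halos[p - 1].append(cell)
        if p < n_procs - 1 and right_edge - cell[0] < halo:
            halos[p + 1].append(cell)
    return owned, halos


# Four cells on a domain [0, 10] shared by two processes (hypothetical data).
cells = [(0.5, 1.0), (4.8, 2.0), (5.1, 0.5), (9.9, 3.0)]
owned, halos = decompose(cells, 0.0, 10.0, 2, halo=0.5)
```

With fixed strips, a clustered cell population leaves some processes with far more cells than others, which is the situation the dynamic load balancing mentioned in the abstract is meant to address.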
The CMS Level-1 Calorimeter Trigger for LHC Run II

NASA Astrophysics Data System (ADS)

Sinthuprasith, Tutanon

2017-01-01

The phase-1 upgrades of the CMS Level-1 calorimeter trigger have been completed. The Level-1 trigger has been fully commissioned and it will be used by CMS to collect data starting from the 2016 data run. The new trigger has been designed to improve the performance at high luminosity and large number of simultaneous inelastic collisions per crossing (pile-up). For this purpose it uses a novel design, the Time Multiplexed Design, which enables the data from an event to be processed by a single trigger processor at full granularity over several bunch crossings. The TMT design is a modular design based on the uTCA standard. The architecture is flexible and the number of trigger processors can be expanded according to the physics needs of CMS. Intelligent, more complex, and innovative algorithms are now the core of the first decision layer of CMS: the upgraded trigger system implements pattern recognition and MVA (Boosted Decision Tree) regression techniques in the trigger processors for pT assignment, pile-up subtraction, and isolation requirements for electrons and taus. The performance of the TMT design, the latency measurements, and the algorithm performance measured using data are also presented here.
Equalizer: a scalable parallel rendering framework.

PubMed

Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

2009-01-01

Continuing improvements in CPU and GPU performance, as well as increasing multi-core processor and cluster-based parallelism, demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems, from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.
Fabrication of an Fe80.5Si7.5B6Nb5Cu Amorphous-Nanocrystalline Powder Core with Outstanding Soft Magnetic Properties

NASA Astrophysics Data System (ADS)

Zhang, Zongyang; Liu, Xiansong; Feng, Shuangjiu; Rehman, Khalid Mehmood Ur

2018-03-01

In this study, the melt spinning method was used to develop Fe80.5Si7.5B6Nb5Cu amorphous ribbons in the first step. Then, the Fe80.5Si7.5B6Nb5Cu amorphous-nanocrystalline core with a compact microstructure was obtained by multiple processes. The main properties of the magnetic powder core, such as micromorphology, thermal behavior, permeability, power loss and quality factor, have been analyzed. The obtained results show that an Fe80.5Si7.5B6Nb5Cu amorphous-nanocrystalline duplex core has high permeability (54.8-57), is relatively stable at different frequencies and magnetic fields, and the maximum power loss is only 313 W/kg; furthermore, it has a good quality factor.
Realization of a single image haze removal system based on DaVinci DM6467T processor

NASA Astrophysics Data System (ADS)

Liu, Zhuang

2014-10-01

Video monitoring systems (VMS) have been extensively applied in target recognition, traffic management, remote sensing, auto navigation and national defence. However, a VMS depends strongly on the weather: in fog, for instance, the quality of the images it receives is distinctly degraded and its effective range is reduced. In short, a VMS performs poorly in bad weather. Research on enhancing fog-degraded images therefore has high theoretical and practical value. A design scheme for a fog-degraded image enhancement system based on the TI DaVinci processor is presented in this paper. The main function of the system is to acquire images captured by digital cameras and execute image enhancement processing to obtain a clear image. The processor used in this system is the dual-core TI DaVinci DM6467T (ARM at 500 MHz plus DSP at 1 GHz). A MontaVista Linux operating system runs on the ARM subsystem, which handles I/O and application processing. The DSP handles signal processing, and the results are available to the ARM subsystem in shared memory. Thanks to the DaVinci processor, the system provides image processing capability equivalent to that of an x86 computer at lower power cost and smaller volume. The results show that the system can process images at 25 frames per second at D1 resolution.
Core drilling apparatus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gusman, M.T.; Konstantinov, L.P.; Malkin, B.D.

1974-04-16

Mounted on the exterior of a nonrotatable core barrel is an end of a resilient tape, the other end of which extends inward into the barrel and is connected to a device for pulling the tape inward into the barrel. The apparatus also is provided with an arrangement which forms a sleeve from the tape as this is being pulled into the core barrel. During the coring operation, the tape is being pulled inward into the barrel and a sleeve is formed from the tape with the aid of the arrangement to encase and protect the core from disturbance. The coring apparatus is intended for core drilling in soft, unconsolidated, and fractured formations. (3 claims)
A distributed microcomputer-controlled system for data acquisition and power spectral analysis of EEG.

PubMed

Vo, T D; Dwyer, G; Szeto, H H

1986-04-01

A relatively powerful and inexpensive microcomputer-based system for the spectral analysis of the EEG is presented. High resolution and speed are achieved with the use of recently available large-scale integrated circuit technology with enhanced functionality (Intel 8087 math co-processors), which can perform transcendental functions rapidly. The versatility of the system is achieved with a hardware organization that has distributed data acquisition capability performed by a microprocessor-based analog-to-digital converter with large resident memory (Cyborg ISAAC-2000). Compiled BASIC programs and assembly language subroutines perform, on-line or off-line, the fast Fourier transform and spectral analysis of the EEG, which is stored as soft as well as hard copy. Some results obtained from test application of the entire system in animal studies are presented.
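The fast Fourier transform and power spectrum computation mentioned in this abstract can be sketched as follows. This is a generic textbook radix-2 FFT in Python, not a reconstruction of the original BASIC and assembly routines; the function names, the normalization by N, and the synthetic sine-wave input are all assumptions made for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-length transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(samples):
    """Return |X[k]|^2 / N for the non-negative frequency bins."""
    n = len(samples)
    spectrum = fft(samples)
    return [abs(c) ** 2 / n for c in spectrum[: n // 2 + 1]]


# Synthetic signal: 8 full cycles over 64 samples, so the energy sits in bin 8.
n = 64
signal = [math.sin(2 * math.pi * 8 * i / n) for i in range(n)]
psd = power_spectrum(signal)
peak_bin = max(range(len(psd)), key=psd.__getitem__)
```

For a recording sampled at a rate `fs`, bin `k` corresponds to the frequency `k * fs / n`, which is how spectral peaks are mapped onto frequency bands.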
Theoretical and Experimental Studies of Epidermal Heat Flux Sensors for Measurements of Core Body Temperature.

PubMed

Zhang, Yihui; Webb, Richard Chad; Luo, Hongying; Xue, Yeguang; Kurniawan, Jonas; Cho, Nam Heon; Krishnan, Siddharth; Li, Yuhang; Huang, Yonggang; Rogers, John A

2016-01-07

Long-term, continuous measurement of core body temperature is of high interest, due to the widespread use of this parameter as a key biomedical signal for clinical judgment and patient management. Traditional approaches rely on devices or instruments in rigid and planar forms, not readily amenable to intimate or conformable integration with soft, curvilinear, time-dynamic, surfaces of the skin. Here, materials and mechanics designs for differential temperature sensors are presented which can attach softly and reversibly onto the skin surface, and also sustain high levels of deformation (e.g., bending, twisting, and stretching). A theoretical approach, together with a modeling algorithm, yields core body temperature from multiple differential measurements from temperature sensors separated by different effective distances from the skin. The sensitivity, accuracy, and response time are analyzed by finite element analyses (FEA) to provide guidelines for relationships between sensor design and performance. Four sets of experiments on multiple devices with different dimensions and under different convection conditions illustrate the key features of the technology and the analysis approach. Finally, results indicate that thermally insulating materials with cellular structures offer advantages in reducing the response time and increasing the accuracy, while improving the mechanics and breathability. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Robust Control of a Cable-Driven Soft Exoskeleton Joint for Intrinsic Human-Robot Interaction.

PubMed

Jarrett, C; McDaid, A J

2017-07-01

A novel, cable-driven soft joint is presented for use in robotic rehabilitation exoskeletons to provide intrinsic, comfortable human-robot interaction. The torque-displacement characteristics of the soft elastomeric core contained within the joint are modeled. This knowledge is used in conjunction with a dynamic system model to derive a sliding mode controller (SMC) to implement low-level torque control of the joint. The SMC controller is experimentally compared with a baseline feedback-linearised proportional-derivative controller across a range of conditions and shown to be robust to un-modeled disturbances. The torque controller is then tested with six healthy subjects while they perform a selection of activities of daily living, which has validated its range of performance. Finally, a case study with a participant with spastic cerebral palsy is presented to illustrate the potential of both the joint and controller to be used in a physiotherapy setting to assist clinical populations.
Shape evolution of a core-shell spherical particle under hydrostatic pressure.

PubMed

Colin, Jérôme

2012-03-01

The morphological evolution by surface diffusion of a core-shell spherical particle has been investigated theoretically under hydrostatic pressure when the shear moduli of the core and shell are different. A linear stability analysis has demonstrated that, depending on the pressure, shear moduli, and radii of both phases, the free surface of the composite particle may be unstable with respect to a shape perturbation. A stability diagram finally emphasizes that the roughness development is favored in the case of a hard shell with a soft core.
High Performance Computing Assets for Ocean Acoustics Research

DTIC Science & Technology

2016-11-18

independently on processing units with access to a typically available amount of memory, say 16 or 32 gigabytes. Our models require each processor to...allow results to be obtained with limited amounts of memory available to individual processing units (with no time frame for successful completion...put into use. One file server computer to store simulation output has also been purchased. The first workstation has 28 CPU cores, dual-thread, (56
A Reusable and Adaptable Software Architecture for Embedded Space Flight System: The Core Flight Software System (CFS)

NASA Technical Reports Server (NTRS)

Wilmot, Jonathan

2005-01-01

The contents include the following: High availability. Hardware is in a harsh environment. Flight processors vary widely due to power and weight constraints. Software must be remotely modifiable and still operate while changes are being made. Many custom one-of-a-kind interfaces for one-of-a-kind missions. Sustaining engineering. Price of failure is high, tens to hundreds of millions of dollars.
Visual Media Reasoning - Terrain-based Geolocation

DTIC Science & Technology

2015-06-01

the drawings, specifications, or other data does not license the holder or any other person or corporation; or convey any rights or permission to...3.4 Alternative Metric Investigation This section describes a graphics processor unit (GPU) based implementation in the NVIDIA CUDA programming...utilizing 2 concurrent CPU cores, each controlling a single Nvidia C2075 Tesla Fermi CUDA card. Figure 22 shows a comparison of the CPU and the GPU powered
Comparison of Communication Architectures and Network Topologies for Distributed Propulsion Controls (Preprint)

DTIC Science & Technology

2013-05-01

logic to perform control function computations and are connected to the full authority digital engine control (FADEC) via a high-speed data...Digital Engine Control (FADEC) via a high speed data communication bus. The short term distributed engine control configurations will be core...concentrator; and high temperature electronics, high speed communication bus between the data concentrator and the control law processor master FADEC
Core needle biopsy of soft tissue tumors, CEUS vs US guided: a pilot study.

PubMed

Coran, Alessandro; Di Maggio, Antonio; Rastrelli, Marco; Alberioli, Enrico; Attar, Shady; Ortolan, Paolo; Bortolanza, Carlo; Tosi, Annalisa; Montesco, Maria Cristina; Bezzon, Elisabetta; Rossi, Carlo Riccardo; Stramare, Roberto

2015-12-01

The purpose of this study was to evaluate the usefulness of contrast-enhanced ultrasonography (CEUS) in the bioptic sampling of soft tissue tumors (STT) compared with unenhanced ultrasonography alone. This is a prospective longitudinal study of 40 patients subjected to ultrasonography (US)-guided core needle biopsy (CNB) to characterize a suspected STT. Three series of bioptic samplings were carried out on each patient, using unenhanced US alone and CEUS in the areas of the tumor that were and were not enhanced by the contrast medium, respectively. All bioptic samples underwent a histological evaluation and the results were analyzed by comparing the histology of the biopsy with the definitive diagnosis in 15 surgically excised samples. 27 (67.5 %) of the 40 patients completed the entire study procedure; in 19 cases (70.3 %) the three bioptic samplings gave unanimous results, also when compared to the surgical specimen; in seven cases (25.9 %) use of CEUS made it possible to obtain additional or more accurate information about the mass in question, compared to simple US guidance without contrast; in one patient (3.7 %) sampling obtained using unenhanced ultrasonography guidance and in the areas enhanced by the contrast agent had precisely the same results as the surgical specimen. CEUS, due to its ability to evaluate microvascular areas, has proven to be a promising method in guiding bioptic sampling of soft tissue tumors, directing the needle to the most significant areas of the tumor. Given the small number of patients evaluated in our study, a larger sample size would be needed to achieve statistically significant results; the first results are encouraging and justify enlarging the study population.
Multi-threaded ATLAS simulation on Intel Knights Landing processors

NASA Astrophysics Data System (ADS)

Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

2017-10-01

The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
LIBS data analysis using a predictor-corrector based digital signal processor algorithm

NASA Astrophysics Data System (ADS)

Sanders, Alex; Griffin, Steven T.; Robinson, Aaron

2012-06-01

There are many accepted sensor technologies for generating spectra for material classification. Once the spectra are generated, communication bandwidth limitations favor local material classification with its attendant reduction in data transfer rates and power consumption. Transferring sensor technologies such as Cavity Ring-Down Spectroscopy (CRDS) and Laser Induced Breakdown Spectroscopy (LIBS) requires effective material classifiers. A result of recent efforts has been emphasis on Partial Least Squares - Discriminant Analysis (PLS-DA) and Principal Component Analysis (PCA). Implementation of these via general purpose computers is difficult in small portable sensor configurations. This paper addresses the creation of a low mass, low power, robust hardware spectra classifier for a limited set of predetermined materials in an atmospheric matrix. Crucial to this is the incorporation of PCA or PLS-DA classifiers into a predictor-corrector style implementation. The system configuration guarantees rapid convergence. Software running on multi-core Digital Signal Processors (DSPs) simulates a streamlined plasma physics model estimator, reducing Analog-to-Digital Converter (ADC) power requirements. This paper presents the results of a predictor-corrector model implemented on a low power multi-core DSP to perform substance classification. This configuration emphasizes the hardware system and software design via a predictor-corrector model that simultaneously decreases the sample rate while performing the classification.
Design and optimization of a portable LQCD Monte Carlo code using OpenACC

NASA Astrophysics Data System (ADS)

Bonati, Claudio; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Calore, Enrico; Schifano, Sebastiano Fabio; Silvi, Giorgio; Tripiccione, Raffaele

The present panorama of HPC architectures is extremely heterogeneous, ranging from traditional multi-core CPU processors, supporting a wide class of applications but delivering moderate computing performance, to many-core Graphics Processor Units (GPUs), exploiting aggressive data-parallelism and delivering higher performance for streaming computing applications. In this scenario, code portability (and performance portability) becomes necessary for easy maintainability of applications; this is very relevant in scientific computing where code changes are very frequent, making it tedious and error-prone to keep different code versions aligned. In this work, we present the design and optimization of a state-of-the-art production-level LQCD Monte Carlo application, using the directive-based OpenACC programming model. OpenACC abstracts parallel programming to a descriptive level, relieving programmers from specifying how codes should be mapped onto the target architecture. We describe the implementation of a code fully written in OpenACC, and show that we are able to target several different architectures, including state-of-the-art traditional CPUs and GPUs, with the same code. We also measure performance, evaluating the computing efficiency of our OpenACC code on several architectures, comparing with GPU-specific implementations and showing that a good level of performance-portability can be reached.

Communications Processor Operating System Study. Executive Summary,

DTIC Science & Technology

1980-11-01

AD-A095 b36 ROME AIR DEVELOPMENT CENTER GRIFFISS AFB NY F/G 17/2 COMMUNICATIONS PROCESSOR OPERATING SYSTEM STUDY. EXECUTIVE SUMMARY NOV 80 J...COMMUNICATIONS PROCESSOR OPERATING SYSTEM STUDY Julian Gitlih APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED...Subtitle) EXECUTIVE SUMMARY OF COMMUNICATIONS PROCESSOR OPERATING SYSTEM...AUTHOR Julian
Factors affecting the microstructure and mechanical properties of Ti-Al3Ti core-shell-structured particle-reinforced Al matrix composites

NASA Astrophysics Data System (ADS)

Guo, Baisong; Yi, Jianhong; Ni, Song; Shen, Rujuan; Song, Min

2016-04-01

This work studied the effects of matrix powder and sintering temperature on the microstructure and mechanical properties of in situ formed Ti-Al3Ti core-shell-structured particle-reinforced pure Al-based composites. It has been shown that both factors have significant effects on the morphology of the reinforcements and densification behaviour of the composites. Due to the strong interfacial bonding and the limitation of the crack propagation in the intermetallic shell during deformation by soft Al matrix and Ti core, the composite fabricated using fine spherical-shaped Al powder and sintered at 570 °C for 5 h has the optimal combination of the overall mechanical properties. The study provides a direction for the optimum combination of high strength and ductility of the composites by adjusting the fabrication parameters.
SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

NASA Technical Reports Server (NTRS)

Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

2017-01-01

The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100-1,000 times those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.
Low-Power Embedded DSP Core for Communication Systems

NASA Astrophysics Data System (ADS)

Tsao, Ya-Lan; Chen, Wei-Hao; Tan, Ming Hsuan; Lin, Maw-Ching; Jou, Shyh-Jye

2003-12-01

This paper proposes a parameterized digital signal processor (DSP) core for an embedded digital signal processing system designed to achieve demodulation/synchronization with better performance and flexibility. The features of this DSP core include a parameterized data path, a dual MAC unit, subword MAC, and optional function-specific blocks for accelerating communication system modulation operations. This DSP core also has a low-power structure, which includes the gray-code addressing mode, pipeline sharing, and advanced hardware looping. Users can select the parameters and special functional blocks based on the character of their applications and then generate a DSP core. The DSP core has been implemented via a cell-based design method using synthesizable Verilog code with TSMC 0.35 μm SPQM and 0.25 μm 1P5M libraries. The equivalent gate count of the core area without memory is approximately 50 k. Moreover, the maximum operating frequency of a [InlineEquation not available: see fulltext.] version is 100 MHz (0.35 μm) and 140 MHz (0.25 μm).
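The abstract above does not say how the core implements its gray-code addressing mode. As a general illustration of why Gray-coded address sequences are associated with low-power design, the following minimal sketch counts address-line bit toggles for a sequential sweep in plain binary versus Gray code. The function names and the 16-address sweep are assumptions chosen for illustration and are not taken from the paper.

```python
def binary_to_gray(n: int) -> int:
    """Return the reflected-binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def count_bit_toggles(sequence) -> int:
    """Total number of bit positions that change between consecutive values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(sequence, sequence[1:]))


# Hypothetical sweep over 16 consecutive addresses (illustrative only).
addresses = list(range(16))
binary_toggles = count_bit_toggles(addresses)
gray_toggles = count_bit_toggles([binary_to_gray(a) for a in addresses])

print("binary toggles:", binary_toggles)
print("gray toggles:  ", gray_toggles)
```

Consecutive Gray codes differ in exactly one bit, so a sequential sweep toggles one address line per step, whereas plain binary counting toggles several lines whenever a carry ripples. Fewer toggles generally mean less dynamic switching energy on the address bus.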
Experiments with a small behaviour controlled planetary rover

NASA Technical Reports Server (NTRS)

Miller, David P.; Desai, Rajiv S.; Gat, Erann; Ivlev, Robert; Loch, John

1993-01-01

A series of experiments that were performed on the Rocky 3 robot is described. Rocky 3 is a small autonomous rover capable of navigating through rough outdoor terrain to a predesignated area, searching that area for soft soil, acquiring a soil sample, and depositing the sample in a container at its home base. The robot is programmed according to a reactive behavior control paradigm using the ALFA programming language. This style of programming produces robust autonomous performance while requiring significantly less computational resources than more traditional mobile robot control systems. The code for Rocky 3 runs on an eight bit processor and uses about ten k of memory.
Dissipation and Rheology of Sheared Soft-Core Frictionless Disks Below Jamming

NASA Astrophysics Data System (ADS)

Vågberg, Daniel; Olsson, Peter; Teitel, S.

2014-05-01

We use numerical simulations to investigate the effect that different models of energy dissipation have on the rheology of soft-core frictionless disks, below jamming in two dimensions. We find that it is not necessarily the mass of the particles that determines whether a system has Bagnoldian or Newtonian rheology, but rather the presence or absence of large connected clusters of particles. We demonstrate the key role that tangential dissipation plays in the formation of such clusters and in several models find a transition from Bagnoldian to Newtonian rheology as the packing fraction ϕ is varied. For each model, we show that appropriately scaled rheology curves approach a well defined limit as the mass of the particles decreases and collisions become strongly inelastic.
Development of suspended core soft glass fibers for far-detuned parametric conversion

NASA Astrophysics Data System (ADS)

Rampur, Anupamaa; Ciąćka, Piotr; Cimek, Jarosław; Kasztelanic, Rafał; Buczyński, Ryszard; Klimczak, Mariusz

2018-04-01

Light sources utilizing χ (2) parametric conversion combine high brightness with attractive operation wavelengths in the near and mid-infrared. In optical fibers, it is possible to use χ (3) degenerate four-wave mixing in order to obtain signal-to-idler frequency detuning of over 100 THz. We report on a test series of nonlinear soft glass suspended core fibers intended for parametric conversion of 1000-1100 nm signal wavelengths available from an array of mature lasers into the near-to-mid-infrared range of 2700-3500 nm under pumping with an erbium sub-picosecond laser system. The presented discussion includes modelling of the fiber properties, details of their physical development and characterization, and experimental tests of parametric conversion.
Dynamic Voltage-Frequency and Workload Joint Scaling Power Management for Energy Harvesting Multi-Core WSN Node SoC

PubMed Central

Li, Xiangyu; Xie, Nijie; Tian, Xinyue

2017-01-01

This paper proposes a scheduling and power management solution for energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a task scheduling algorithm oriented to heterogeneous multi-core systems and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering that the power consumption of most WSN applications is data dependent, we introduce a branch-handling mechanism into the solution as well. The experimental results show that the proposed algorithm can operate in real time on a lightweight embedded processor (MSP430), and that it enables a system to do more valuable work and use more than 99.9% of the power budget. PMID:28208730
Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2014-07-18

Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits the application's performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, NVIDIA Fermi C2070 GPU, NVIDIA Kepler K20x GPU, and the Intel Xeon Phi Co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best.
Dynamic Voltage-Frequency and Workload Joint Scaling Power Management for Energy Harvesting Multi-Core WSN Node SoC.

PubMed

Li, Xiangyu; Xie, Nijie; Tian, Xinyue

2017-02-08

This paper proposes a scheduling and power management solution for energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a task scheduling algorithm oriented to heterogeneous multi-core systems and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering that the power consumption of most WSN applications is data dependent, we introduce a branch-handling mechanism into the solution as well. The experimental results show that the proposed algorithm can operate in real time on a lightweight embedded processor (MSP430), and that it enables a system to do more valuable work and use more than 99.9% of the power budget.
Gamma thermometer based reactor core liquid level detector

DOEpatents

Burns, Thomas J.

1983-01-01

A system is provided which employs a modified gamma thermometer for determining the liquid coolant level within a nuclear reactor core. The gamma thermometer which normally is employed to monitor local core heat generation rate (reactor power), is modified by thermocouple junctions and leads to obtain an unambiguous indication of the presence or absence of coolant liquid at the gamma thermometer location. A signal processor generates a signal based on the thermometer surface heat transfer coefficient by comparing the signals from the thermocouples at the thermometer location. The generated signal is a direct indication of loss of coolant due to the change in surface heat transfer when coolant liquid drops below the thermometer location. The loss of coolant indication is independent of reactor power at the thermometer location. Further, the same thermometer may still be used for the normal power monitoring function.
Multimodality management of soft tissue tumors in the extremity

PubMed Central

Crago, Aimee M.; Lee, Ann Y.

2016-01-01

Most extremity soft tissue sarcomas present as a painless mass. Workup should generally involve cross-sectional imaging with MRI, as well as a core biopsy for pathologic diagnosis. Limb-sparing surgery is the standard of care, and may be supplemented with radiation for histologic subtypes at higher risk for local recurrence and chemotherapy for those at higher risk for distant metastases. This article reviews the work-up and surgical approach to extremity soft tissue sarcomas, as well as the role for radiation and chemotherapy, with particular attention given to the distinguishing characteristics of some of the most common subtypes. PMID:27542637
Continuous-annealing method for producing a flexible, curved, soft magnetic amorphous alloy ribbon

NASA Astrophysics Data System (ADS)

Francoeur, Bruno; Couture, Pierre

2012-04-01

A method has been developed for continuous annealing of an amorphous alloy ribbon moving forward at several meters per second, giving a curved shape to the ribbon that remains flexible afterward and can be easily wound into a toroidal core with excellent soft magnetic properties. A heat pulse was applied by a compact system on a Metglas 2605HB1 ribbon moving forward at 5 m/s to initiate a thermal treatment at 460 °C, near crystallization onset. The treatment duration was less than 0.1 s, and the heating and cooling rates were above 10 000 °C/s, which helped preserve most of the alloy as-cast ductility state. Such high temperature rates were achieved by forcing a static contact between the moving ribbon and a temperature-controlled roller. A tensile stress and a series of bending configurations were applied on the moving ribbon during the treatment to induce the development of magnetic anisotropy and to obtain the desired natural curvature radius. The core losses at 60 Hz of a toroidal test core wound with the resulting ribbon are lower than the specific values reported by the alloy manufacturer. This method can be implemented at the casting plant for supplying a low-cost, ready-to-use ribbon, easy to handle and cut, for mass production of toroidal cores for distribution transformer kernels (core and coil only), pulse power cores, etc.
Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.

PubMed

Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard

2015-02-01

Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. Therefore, the new algorithm designed in 2007 and called interaction sorting (IS) clearly attracted interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance, 9% to 45% in comparison to the original IS method.
Di-jet Hadron Correlations in Central Au+Au Collisions at √s_NN = 200 GeV at STAR

NASA Astrophysics Data System (ADS)

Elsey, Nicholas; STAR Collaboration

2017-09-01

Jets and their modifications due to partonic energy loss provide a powerful tool to study the properties of the QGP created in ultrarelativistic heavy-ion collisions. For jets reconstructed with the anti-kT algorithm with resolution parameter R = 0.4, previous measurements of the di-jet asymmetry AJ at STAR indicate that the observed imbalance of an initial "hard-core" di-jet selection with pTconst > 2.0 GeV/c, pTlead > 20.0 GeV/c and pTsub > 10.0 GeV/c is restored to the balance of the pp reference when soft constituents are included. The lost energy recovered with soft constituents suggests soft gluon radiation by high pT partons. Jet-hadron correlations with respect to di-jets allow a differential assessment of the kinematic properties of the soft gluon radiation spectrum induced by partonic energy loss in the QGP. We present charged hadron correlations with respect to the di-jets found in the above AJ analysis, and compare to similar measurements using a jet trigger at RHIC.
Laboratory investigation of the erosion of cohesive sediments under oscillatory flows using a synchronized imaging technique

NASA Astrophysics Data System (ADS)

Sou, In Mei; Calantoni, Joseph; Reed, Allen; Furukawa, Yoko

2012-11-01

A synchronized dual stereo particle image velocimetry (PIV) measurement technique is used to examine the erosion process of a cohesive sediment core in the Small Oscillatory Flow Tunnel (S-OFT) in the Sediment Dynamics Laboratory at the Naval Research Laboratory, Stennis Space Center, MS. The dual stereo PIV windows were positioned on either side of a sediment core inserted along the centerline of the S-OFT allowing for a total measurement window of about 20 cm long by 10 cm high with sub-millimeter spacing on resolved velocity vectors. The period of oscillation ranged from 2.86 to 6.12 seconds with constant semi-excursion amplitude in the test section of 9 cm. During the erosion process, Kelvin-Helmholtz instabilities were observed as the flow accelerated in each direction and eventually were broken down when the flow reversed. The relative concentration of suspended sediments under different flow conditions was estimated using the intensity of light scattered from the sediment particles in suspension. By subtracting the initial light scattered from the core, the residual light intensity was assumed to be scattered from suspended sediments eroded from the core. Results from two different sediment core samples of mud and sand mixtures will be presented.
Cache Energy Optimization Techniques For Modern Processors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mittal, Sparsh

2013-01-01

Modern multicore processors are employing large last-level caches; for example, Intel's E7-8800 processor uses a 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In this book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems.
This book is intended to be a valuable guide for both newcomers and veterans in the field of cache power management. It will help graduate students, CAD tool developers and designers in understanding the need of energy efficiency in modern computing systems. Further, it will be useful for researchers in gaining insights into algorithms and techniques for micro-architectural and system-level energy optimization using dynamic cache reconfiguration. We sincerely believe that the "food for thought" presented in this book will inspire the readers to develop even better ideas for designing "green" processors of tomorrow.
Vacuum space charge effects in sub-picosecond soft X-ray photoemission on a molecular adsorbate layer

DOE PAGES

Dell'Angela, M.; Anniyev, T.; Beye, M.; ...

2015-03-01

Vacuum space charge-induced kinetic energy shifts of O 1s and Ru 3d core levels in femtosecond soft X-ray photoemission spectra (PES) have been studied at a free electron laser (FEL) for an oxygen layer on Ru(0001). We fully reproduced the measurements by simulating the in-vacuum expansion of the photoelectrons and demonstrate the space charge contribution of the high-order harmonics in the FEL beam. Employing the same analysis for 400 nm pump-X-ray probe PES, we can disentangle the delay dependent Ru 3d energy shifts into effects induced by space charge and by lattice heating from the femtosecond pump pulse.
Vacuum space charge effects in sub-picosecond soft X-ray photoemission on a molecular adsorbate layer.

PubMed

Dell'Angela, M; Anniyev, T; Beye, M; Coffee, R; Föhlisch, A; Gladh, J; Kaya, S; Katayama, T; Krupin, O; Nilsson, A; Nordlund, D; Schlotter, W F; Sellberg, J A; Sorgenfrei, F; Turner, J J; Öström, H; Ogasawara, H; Wolf, M; Wurth, W

2015-03-01

Vacuum space charge induced kinetic energy shifts of O 1s and Ru 3d core levels in femtosecond soft X-ray photoemission spectra (PES) have been studied at a free electron laser (FEL) for an oxygen layer on Ru(0001). We fully reproduced the measurements by simulating the in-vacuum expansion of the photoelectrons and demonstrate the space charge contribution of the high-order harmonics in the FEL beam. Employing the same analysis for 400 nm pump-X-ray probe PES, we can disentangle the delay dependent Ru 3d energy shifts into effects induced by space charge and by lattice heating from the femtosecond pump pulse.
Development of an extensible dual-core wireless sensing node for cyber-physical systems

NASA Astrophysics Data System (ADS)

Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Moritz; Lynch, Jerome P.; Wang, Yang; Swartz, A.

2014-04-01

The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyberphysical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback controls demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independent of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing the Martlet's ability to monitor and control physical systems such as wind turbines and buildings.

Mechanism of supporting sub-communicator collectives with O(64) counters as opposed to one counter for each sub-communicator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kumar, Sameer; Mamidala, Amith R.; Ratterman, Joseph D.

A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
Mechanism of supporting sub-communicator collectives with o(64) counters as opposed to one counter for each sub-communicator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blocksome, Michael; Kumar, Sameer; Mamidala, Amith R.

A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
A Locality-Based Threading Algorithm for the Configuration-Interaction Method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shan, Hongzhang; Williams, Samuel; Johnson, Calvin

The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
A Locality-Based Threading Algorithm for the Configuration-Interaction Method

DOE PAGES

Shan, Hongzhang; Williams, Samuel; Johnson, Calvin; ...

2017-07-03

The Configuration Interaction (CI) method has been widely used to solve the non-relativistic many-body Schrödinger equation. One great challenge to implementing it efficiently on manycore architectures is its immense memory and data movement requirements. To address this issue, within each node, we exploit a hybrid MPI+OpenMP programming model in lieu of the traditional flat MPI programming model. In this paper, we develop optimizations that partition the workloads among OpenMP threads based on data locality, which is essential in ensuring applications with complex data access patterns scale well on manycore architectures. The new algorithm scales to 256 threads on the 64-core Intel Knights Landing (KNL) manycore processor and 24 threads on dual-socket Ivy Bridge (Xeon) nodes. Compared with the original implementation, the performance has been improved by up to 7× on the Knights Landing processor and 3× on the dual-socket Ivy Bridge node.
A fast parallel 3D Poisson solver with longitudinal periodic and transverse open boundary conditions for space-charge simulations

NASA Astrophysics Data System (ADS)

Qiang, Ji

2017-10-01

A three-dimensional (3D) Poisson solver with longitudinal periodic and transverse open boundary conditions can have important applications in beam physics of particle accelerators. In this paper, we present a fast, efficient method to solve the Poisson equation using a spectral finite-difference method. This method uses a computational domain that contains the charged particle beam only and has a computational complexity of O(N_u log N_mode), where N_u is the total number of unknowns and N_mode is the maximum number of longitudinal or azimuthal modes. This saves both computational time and memory compared with using an artificial boundary condition in a large extended computational domain. The new 3D Poisson solver is parallelized using a message passing interface (MPI) on multi-processor computers and shows a reasonable parallel performance up to hundreds of processor cores.
Modeling of the ground-to-SSFMB link networking features using SPW

NASA Technical Reports Server (NTRS)

Watson, John C.

1993-01-01

This report describes the modeling and simulation of the networking features of the ground-to-Space Station Freedom manned base (SSFMB) link using COMDISCO signal processing work-system (SPW). The networking features modeled include the implementation of Consultative Committee for Space Data Systems (CCSDS) protocols in the multiplexing of digitized audio and core data into virtual channel data units (VCDU's) in the control center complex and the demultiplexing of VCDU's in the onboard baseband signal processor. The emphasis of this work has been placed on techniques for modeling the CCSDS networking features using SPW. The objectives for developing the SPW models are to test the suitability of SPW for modeling networking features and to develop SPW simulation models of the control center complex and space station baseband signal processor for use in end-to-end testing of the ground-to-SSFMB S-band single access forward (SSAF) link.
Multicore Challenges and Benefits for High Performance Scientific Computing

DOE PAGES

Nielsen, Ida M. B.; Janssen, Curtis L.

2008-01-01

Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
A pervasive parallel framework for visualization: final report for FWP 10-014707

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.

2014-01-01

We are on the threshold of a transformative change in the basic architecture of high-performance computing. The use of accelerator processors, characterized by large core counts, shared but asymmetrical memory, and heavy thread loading, is quickly becoming the norm in high performance computing. These accelerators represent significant challenges in updating our existing base of software. An intrinsic problem with this transition is a fundamental programming shift from message passing processes to much more fine thread scheduling with memory sharing. Another problem is the lack of stability in accelerator implementation; processor and compiler technology is currently changing rapidly. This report documents the results of our three-year ASCR project to address these challenges. Our project includes the development of the Dax toolkit, which contains the beginnings of new algorithms for a new generation of computers and the underlying infrastructure to rapidly prototype and build further algorithms as necessary.
Parallelization of the preconditioned IDR solver for modern multicore computer systems

NASA Astrophysics Data System (ADS)

Bessonov, O. A.; Fedoseyev, A. I.

2012-10-01

This paper presents the analysis, parallelization and optimization approach for the large sparse matrix solver CNSPACK for modern multicore microprocessors. CNSPACK is an advanced solver successfully used for coupled solution of stiff problems arising in multiphysics applications such as CFD, semiconductor transport, kinetic and quantum problems. It employs the iterative IDR algorithm with ILU preconditioning (user-chosen ILU preconditioning order). CNSPACK has been successfully used during the last decade for solving problems in several application areas, including fluid dynamics and semiconductor device simulation. However, there was a dramatic change in processor architectures and computer system organization in recent years. Due to this, performance criteria and methods have been revisited, and the solver and preconditioner have been parallelized using the OpenMP environment. Results of the successful implementation for efficient parallelization are presented for the most advanced computer systems (Intel Core i7-9xx or two-processor Xeon 55xx/56xx).
Mechanism of supporting sub-communicator collectives with O(64) counters as opposed to one counter for each sub-communicator

DOEpatents

Kumar, Sameer; Mamidala, Amith R.; Ratterman, Joseph D.; Blocksome, Michael; Miller, Douglas

2013-09-03

A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a barrier algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
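The table-based counter assignment described in the abstract above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the patented implementation: the class name, method names, the linear search, the first-free-slot policy, and the release step are all hypothetical. The abstract states only that the table has as many entries as the maximum number of threads, that an entry is set to a communicator ID when a collective starts, and that the counter is found by searching the table.

```python
class CounterTable:
    """Fixed-size table mapping communicator IDs to a small pool of counters.

    Entry i being set to a communicator ID means counter i is currently
    designated for that communicator (assumed convention).
    """

    def __init__(self, max_threads: int):
        # One entry (and hence one counter) per possible thread.
        self.entries = [None] * max_threads

    def acquire(self, comm_id: int) -> int:
        """Return the counter index for comm_id, claiming a free entry if needed."""
        # Search for an entry already tagged with this communicator.
        for index, owner in enumerate(self.entries):
            if owner == comm_id:
                return index
        # Otherwise claim the first free entry (assumed policy).
        for index, owner in enumerate(self.entries):
            if owner is None:
                self.entries[index] = comm_id
                return index
        raise RuntimeError("no free counter available")

    def release(self, comm_id: int) -> None:
        """Free the entry held by comm_id (hypothetical step, not in the abstract)."""
        for index, owner in enumerate(self.entries):
            if owner == comm_id:
                self.entries[index] = None


# Illustrative use with 64 entries, matching the O(64) in the title.
table = CounterTable(max_threads=64)
first = table.acquire(comm_id=1001)
second = table.acquire(comm_id=1002)
again = table.acquire(comm_id=1001)
print(first, second, again)
```

The point of the design is that the number of counters is bounded by the thread count rather than by the number of sub-communicators, which may be far larger.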
Broadband extreme ultraviolet probing of transient gratings in vanadium dioxide

DOE PAGES

Sistrunk, Emily; Grilj, Jakob; Jeong, Jaewoo; ...

2015-02-11

Nonlinear spectroscopy in the extreme ultraviolet (EUV) and soft x-ray spectral range offers the opportunity for element-selective probing of ultrafast dynamics using core-valence transitions (Mukamel et al., Acc. Chem. Res. 42, 553 (2009)). This study demonstrates a step on this path, showing core-valence sensitivity in transient grating spectroscopy with EUV probing. We study the optically induced insulator-to-metal transition (IMT) of a VO2 film with EUV diffraction from the optically excited sample. The VO2 exhibits a change in the 3p-3d resonance of V accompanied by an acoustic response. Due to the broadband probing we are able to separate the two features.
Preparation of microcapsules with self-microemulsifying core by a vibrating nozzle method.

PubMed

Homar, Miha; Suligoj, Dasa; Gasperlin, Mirjana

2007-02-01

Incorporation of drugs in self-microemulsifying systems (SMES) offers several advantages for their delivery, the main one being faster drug dissolution and absorption. Formulation of SMES in solid dosage forms can be difficult and, to date, most SMES are applied in liquid dosage form or soft gelatin capsules. This study has explored the incorporation of SMES in microcapsules, which could then be used for formulation of solid dosage forms. An Inotech IE-50 R encapsulator equipped with a concentric nozzle was used to produce alginate microcapsules with a self-microemulsifying core. Retention of the core phase was improved by optimization of encapsulator parameters and modification of the shell forming phase and hardening solution. The mean encapsulation efficiency of final batches was more than 87%, which resulted in 0.07% drug loading. It was demonstrated that production of microcapsules with a self-microemulsifying core is possible and that the process is stable and reproducible.
Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture

NASA Astrophysics Data System (ADS)

Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

2015-05-01

Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high-performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is a familiar one to a vast number of CPU developers. However, getting maximum performance out of Xeon Phi requires some novel optimization techniques. Those optimization techniques are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 1.3x.
Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Doerfler, Douglas; Austin, Brian; Cook, Brandon

There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.
Soft X-ray synchrotron radiation spectroscopy study of molecule-based nanoparticles

NASA Astrophysics Data System (ADS)

Lee, Eunsook; Kim, D. H.; Kang, J.-S.; Kim, Kyung Hyun; Kim, Pil; Baik, Jaeyoon; Shin, H. J.

2014-11-01

The electronic structures of molecule-based nanoparticles, such as biomineralized Helicobacter pylori ferritin (Hpf), Heme, and RbCo[Fe(CN)6]H2O (RbCoFe) Prussian blue analogue, have been investigated by employing photoemission spectroscopy and soft X-ray absorption spectroscopy. Fe ions are found to be nearly trivalent in Hpf and Heme nanoparticles, which provides evidence that the amount of magnetite (Fe3O4) should be negligible in the Hpf core and that the biomineralization of Fe oxides in the high-Fe-bound-state Hpf core arises from a hematite-like formation. On the other hand, Fe ions are nearly divalent and Co ions are Co2+-Co3+ mixed-valent in RbCoFe. Therefore this finding suggests that the mechanism of the photo-induced transition in RbCoFe Prussian blue analogue is not a simple spin-state transition of Fe2+-Co3+ → Fe3+-Co2+. It is likely that Co2+ ions have the high-spin configuration while Fe2+ ions have the low-spin configuration.
The comparison of numerical models of a sandwich panel in the context of the core deformations at the supports

NASA Astrophysics Data System (ADS)

Pozorska, Jolanta; Pozorski, Zbigniew

2018-01-01

The paper presents the problem of static structural behavior of sandwich panels at the supports. The panels have a soft core and correspond to typical structures applied in civil engineering. To analyze the problem, five different 3-D numerical models were created. The results were compared in the context of core compression and stress redistribution. The numerical solutions verify methods of evaluating the capacity of the sandwich panel that are known from the literature.
Gaseous electron multiplier-based soft x-ray plasma diagnostics development: Preliminary tests at ASDEX Upgrade.

PubMed

Chernyshova, M; Malinowski, K; Czarski, T; Wojeński, A; Vezinet, D; Poźniak, K T; Kasprowicz, G; Mazon, D; Jardin, A; Herrmann, A; Kowalska-Strzęciwilk, E; Krawczyk, R; Kolasiński, P; Zabołotny, W; Zienkiewicz, P

2016-11-01

A Gaseous Electron Multiplier (GEM)-based detector is being developed for soft X-ray diagnostics on tokamaks. Its main goal is to facilitate transport studies of impurities like tungsten. Such studies are very relevant to ITER, where the excessive accumulation of impurities in the plasma core should be avoided. This contribution provides details of the preliminary tests at ASDEX Upgrade (AUG) with a focus on the most important aspects for detector operation in a harsh radiation environment. It was shown that both spatially and spectrally resolved data could be collected, in reasonable agreement with other AUG diagnostics. Contributions to the GEM signal also include hard X-rays, gammas, and neutrons. First simulations of the effect of high-energy photons have helped in understanding these contributions.
Gaseous electron multiplier-based soft x-ray plasma diagnostics development: Preliminary tests at ASDEX Upgrade

NASA Astrophysics Data System (ADS)

Chernyshova, M.; Malinowski, K.; Czarski, T.; Wojeński, A.; Vezinet, D.; Poźniak, K. T.; Kasprowicz, G.; Mazon, D.; Jardin, A.; Herrmann, A.; Kowalska-Strzeciwilk, E.; Krawczyk, R.; Kolasiński, P.; Zabołotny, W.; Zienkiewicz, P.

2016-11-01

A Gaseous Electron Multiplier (GEM)-based detector is being developed for soft X-ray diagnostics on tokamaks. Its main goal is to facilitate transport studies of impurities like tungsten. Such studies are very relevant to ITER, where the excessive accumulation of impurities in the plasma core should be avoided. This contribution provides details of the preliminary tests at ASDEX Upgrade (AUG) with a focus on the most important aspects for detector operation in a harsh radiation environment. It was shown that both spatially and spectrally resolved data could be collected, in reasonable agreement with other AUG diagnostics. Contributions to the GEM signal also include hard X-rays, gammas, and neutrons. First simulations of the effect of high-energy photons have helped in understanding these contributions.
Optimization of programming parameters in children with the advanced bionics cochlear implant.

PubMed

Baudhuin, Jacquelyn; Cadieux, Jamie; Firszt, Jill B; Reeder, Ruth M; Maxson, Jerrica L

2012-05-01

Cochlear implants provide access to soft intensity sounds and therefore improved audibility for children with severe-to-profound hearing loss. Speech processor programming parameters, such as threshold (or T-level), input dynamic range (IDR), and microphone sensitivity, contribute to the recipient's program and influence audibility. When soundfield thresholds obtained through the speech processor are elevated, programming parameters can be modified to improve soft sound detection. Adult recipients show improved detection for low-level sounds when T-levels are set at raised levels and show better speech understanding in quiet when wider IDRs are used. Little is known about the effects of parameter settings on detection and speech recognition in children using today's cochlear implant technology. The overall study aim was to assess optimal T-level, IDR, and sensitivity settings in pediatric recipients of the Advanced Bionics cochlear implant. Two experiments were conducted. Experiment 1 examined the effects of two T-level settings on soundfield thresholds and detection of the Ling 6 sounds. One program set T-levels at 10% of most comfortable levels (M-levels) and another at 10 current units (CUs) below the level judged as "soft." Experiment 2 examined the effects of IDR and sensitivity settings on speech recognition in quiet and noise. Participants were 11 children 7-17 yr of age (mean 11.3) implanted with the Advanced Bionics High Resolution 90K or CII cochlear implant system who had speech recognition scores of 20% or greater on a monosyllabic word test. Two T-level programs were compared for detection of the Ling sounds and frequency modulated (FM) tones. 
Differing IDR/sensitivity programs (50/0, 50/10, 70/0, 70/10) were compared using Ling and FM tone detection thresholds, CNC (consonant-vowel nucleus-consonant) words at 50 dB SPL, and Hearing in Noise Test for Children (HINT-C) sentences at 65 dB SPL in the presence of four-talker babble (+8 signal-to-noise ratio). Outcomes were analyzed using a paired t-test and a mixed-model repeated measures analysis of variance (ANOVA). T-levels set 10 CUs below "soft" resulted in significantly lower detection thresholds for all six Ling sounds and FM tones at 250, 1000, 3000, 4000, and 6000 Hz. When comparing programs differing by IDR and sensitivity, a 50 dB IDR with a 0 sensitivity setting showed significantly poorer thresholds for low frequency FM tones and voiced Ling sounds. Analysis of group mean scores for CNC words in quiet or HINT-C sentences in noise indicated no significant differences across IDR/sensitivity settings. Individual data, however, showed significant differences between IDR/sensitivity programs in noise; the optimal program differed across participants. In pediatric recipients of the Advanced Bionics cochlear implant device, manually setting T-levels with ascending loudness judgments should be considered when possible or when low-level sounds are inaudible. Study findings confirm the need to determine program settings on an individual basis as well as the importance of speech recognition verification measures in both quiet and noise. Clinical guidelines are suggested for selection of programming parameters in both young and older children. American Academy of Audiology.
Soft Robotics.

PubMed

Whitesides, George M

2018-04-09

This description of "soft robotics" is not intended to be a conventional review, in the sense of a comprehensive technical summary of a developing field. Rather, its objective is to describe soft robotics as a new field, one that offers opportunities to chemists and materials scientists who like to make "things" and to work with macroscopic objects that move and exert force. It will give one (personal) view of what soft actuators and robots are, and how this class of soft devices fits into the more highly developed field of conventional "hard" robotics. It will also suggest how and why soft robotics is more than simply a minor technical "tweak" on hard robotics and propose a unique role for chemistry, and materials science, in this field. Soft robotics is, at its core, intellectually and technologically different from hard robotics, both because it has different objectives and uses and because it relies on the properties of materials to assume many of the roles played by sensors, actuators, and controllers in hard robotics. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

A distributed agent architecture for real-time knowledge-based systems: Real-time expert systems project, phase 1

NASA Technical Reports Server (NTRS)

Lee, S. Daniel

1990-01-01

We propose a distributed agent architecture (DAA) that can support a variety of paradigms based on both traditional real-time computing and artificial intelligence. DAA consists of distributed agents that are classified into two categories: reactive and cognitive. Reactive agents can be implemented directly in Ada to meet hard real-time requirements and be deployed on on-board embedded processors. A traditional real-time computing methodology under consideration is the rate monotonic theory that can guarantee schedulability based on analytical methods. AI techniques under consideration for reactive agents are approximate or anytime reasoning that can be implemented using Bayesian belief networks as in Guardian. Cognitive agents are traditional expert systems that can be implemented in ART-Ada to meet soft real-time requirements. During the initial design of cognitive agents, it is critical to consider the migration path that would allow initial deployment on ground-based workstations with eventual deployment on on-board processors. ART-Ada technology enables this migration while Lisp-based technologies make it difficult if not impossible. In addition to reactive and cognitive agents, a meta-level agent would be needed to coordinate multiple agents and to provide meta-level control.
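The analytical schedulability guarantee that rate monotonic theory provides can be sketched with the classic Liu and Layland utilization bound: n periodic tasks are schedulable under rate monotonic priorities if their total utilization does not exceed n(2^(1/n) - 1). The abstract does not say which test the DAA project uses, so the choice of this bound, the function names, and the task parameters below are illustrative assumptions.

```python
from typing import Iterable, Tuple


def rm_utilization_bound(n: int) -> float:
    """Liu and Layland bound for n periodic tasks under rate monotonic priorities."""
    return n * (2 ** (1.0 / n) - 1)


def rm_schedulable(tasks: Iterable[Tuple[float, float]]) -> bool:
    """Sufficient (not necessary) test; each task is (worst-case execution time, period)."""
    task_list = list(tasks)
    if not task_list:
        return True  # nothing to schedule
    utilization = sum(wcet / period for wcet, period in task_list)
    return utilization <= rm_utilization_bound(len(task_list))
```

For example, the hypothetical task set `[(1, 4), (1, 5), (2, 10)]` has utilization 0.65, below the three-task bound of about 0.78, so it passes. A set that fails this test is not necessarily unschedulable, since the bound is only a sufficient condition; an exact response-time analysis would be needed to decide.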
The CSM testbed matrix processors internal logic and dataflow descriptions

NASA Technical Reports Server (NTRS)

Regelbrugge, Marc E.; Wright, Mary A.

1988-01-01

This report constitutes the final report for subtask 1 of Task 5 of NASA Contract NAS1-18444, Computational Structural Mechanics (CSM) Research. This report contains a detailed description of the coded workings of selected CSM Testbed matrix processors (i.e., TOPO, K, INV, SSOL) and of the arithmetic utility processor AUS. These processors and the current sparse matrix data structures are studied and documented. Items examined include: details of the data structures, interdependence of data structures, data-blocking logic in the data structures, processor data flow and architecture, and processor algorithmic logic flow.
Tunable magnetic vortex resonance in a potential well

NASA Astrophysics Data System (ADS)

Warnicke, P.; Wohlhüter, P.; Suszka, A. K.; Stevenson, S. E.; Heyderman, L. J.; Raabe, J.

2017-11-01

We use frequency-resolved x-ray microscopy to fully characterize the potential well of a magnetic vortex in a soft ferromagnetic permalloy square. The vortex core is excited with magnetic broadband pulses and simultaneously displaced with a static magnetic field. We observe a frequency increase (blueshift) in the gyrotropic mode of the vortex core with increasing bias field. Supported by micromagnetic simulations, we show that this frequency increase is accompanied by internal deformation of the vortex core. The ability to modify the inner structure of the vortex core provides a mechanism to control the dynamics of magnetic vortices.
Interfacial effect on physical properties of composite media: Interfacial volume fraction with non-spherical hard-core-soft-shell-structured particles.

PubMed

Xu, Wenxiang; Duan, Qinglin; Ma, Huaifa; Chen, Wen; Chen, Huisu

2015-11-02

Interfaces are known to be crucial in a variety of fields and the interfacial volume fraction dramatically affects physical properties of composite media. However, it is an open problem with great significance how to determine the interfacial property in composite media with inclusions of complex geometry. By the stereological theory and the nearest-surface distribution functions, we first propose a theoretical framework to symmetrically present the interfacial volume fraction. In order to verify the interesting generalization, we simulate three-phase composite media by employing hard-core-soft-shell structures composed of hard mono-/polydisperse non-spherical particles, soft interfaces, and matrix. We numerically derive the interfacial volume fraction by a Monte Carlo integration scheme. With the theoretical and numerical results, we find that the interfacial volume fraction is strongly dependent on the so-called geometric size factor and sphericity characterizing the geometric shape in spite of anisotropic particle types. As a significant interfacial property, the present theoretical contribution can be further drawn into predicting the effective transport properties of composite materials.
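The Monte Carlo integration step can be illustrated with a minimal Python sketch. It simplifies the setting considerably: particles are equal-sized spheres (the paper treats non-spherical particles), they lie entirely inside a unit cube, and their centers are supplied by the caller. The function name and all parameters are hypothetical. A random point counts as interfacial when it lies outside every hard core but inside at least one soft shell.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def interfacial_volume_fraction(centers: List[Point], core_radius: float,
                                shell_thickness: float, samples: int,
                                seed: int = 0) -> float:
    """Estimate the soft-shell (interface) volume fraction in the unit cube."""
    rng = random.Random(seed)  # seeded for reproducibility
    outer_radius = core_radius + shell_thickness
    hits = 0
    for _ in range(samples):
        p = (rng.random(), rng.random(), rng.random())
        # For equal spheres the nearest center decides the phase of the point:
        # inside a core, inside a shell, or in the matrix.
        nearest = min(math.dist(p, c) for c in centers)
        if core_radius < nearest <= outer_radius:
            hits += 1
    return hits / samples
```

With a single particle at the cube center, core radius 0.2 and shell thickness 0.1, the estimate can be checked against the exact shell volume 4π(0.3³ - 0.2³)/3 ≈ 0.0796. Overlapping shells of neighboring particles are counted only once, which is what makes a sampling approach convenient here.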
Interfacial effect on physical properties of composite media: Interfacial volume fraction with non-spherical hard-core-soft-shell-structured particles

PubMed Central

Xu, Wenxiang; Duan, Qinglin; Ma, Huaifa; Chen, Wen; Chen, Huisu

2015-01-01

Interfaces are known to be crucial in a variety of fields and the interfacial volume fraction dramatically affects physical properties of composite media. However, it is an open problem with great significance how to determine the interfacial property in composite media with inclusions of complex geometry. By the stereological theory and the nearest-surface distribution functions, we first propose a theoretical framework to symmetrically present the interfacial volume fraction. In order to verify the interesting generalization, we simulate three-phase composite media by employing hard-core-soft-shell structures composed of hard mono-/polydisperse non-spherical particles, soft interfaces, and matrix. We numerically derive the interfacial volume fraction by a Monte Carlo integration scheme. With the theoretical and numerical results, we find that the interfacial volume fraction is strongly dependent on the so-called geometric size factor and sphericity characterizing the geometric shape in spite of anisotropic particle types. As a significant interfacial property, the present theoretical contribution can be further drawn into predicting the effective transport properties of composite materials. PMID:26522701
Design and implementation of projects with Xilinx Zynq FPGA: a practical case

NASA Astrophysics Data System (ADS)

Travaglini, R.; D'Antone, I.; Meneghini, S.; Rignanese, L.; Zuffa, M.

The main advantage of using FPGAs with embedded processors is the availability of several additional high-performance resources in the same physical device. Moreover, the FPGA programmability allows custom peripherals to be connected. Xilinx has designed a programmable device named Zynq-7000 (simply called Zynq in the following), which integrates programmable logic (identical to the other Xilinx 7 series devices) with a System on Chip (SOC) based on two embedded ARM processors. Since both parts are deeply connected, designers benefit from both the performance of the hardware SOC and the flexibility of programmable logic. In this paper a design developed by the Electronic Design Department at the Bologna Division of INFN is presented as a practical case of a project based on the Zynq device. It is developed using a commercial board called ZedBoard hosting an FMC mezzanine with a 12-bit 500 MS/s ADC. The Zynq FPGA on the ZedBoard receives digital outputs from the ADC and sends them to the acquisition PC, after proper formatting, through a Gigabit Ethernet link. The major focus of the paper is the methodology for developing a Zynq-based design with the Xilinx Vivado software, highlighting how to configure the SOC and connect it with the programmable logic. Firmware design techniques are presented: in particular, both VHDL and IP core based strategies are discussed. Further, the procedure to develop software for the embedded processor is presented. Finally, some debugging tools, such as the embedded Logic Analyzer, are shown. Advantages and disadvantages with respect to adopting an FPGA without embedded processors are discussed.
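As a back-of-the-envelope check on the data path described above, the raw ADC output rate can be compared with the nominal Ethernet line rate. This is a sketch under stated assumptions: samples are counted at 12 bits each with no padding, protocol overhead is ignored, and the variable names are illustrative. The abstract does not say how the design formats or reduces the data before transmission.

```python
def raw_adc_rate_bits_per_s(bits_per_sample: int, samples_per_s: float) -> float:
    """Raw output rate of an ADC, ignoring any padding or framing."""
    return bits_per_sample * samples_per_s


ADC_BITS = 12                # from the abstract
ADC_SAMPLE_RATE = 500e6      # 500 MS/s, from the abstract
GIGABIT_ETHERNET = 1e9       # nominal line rate in bit/s (assumption: overhead ignored)

raw = raw_adc_rate_bits_per_s(ADC_BITS, ADC_SAMPLE_RATE)
ratio = raw / GIGABIT_ETHERNET
print(f"raw ADC rate: {raw / 1e9:.1f} Gbit/s, {ratio:.0f}x the nominal link rate")
```

Under these assumptions the ADC produces about six times more data than the link can carry continuously, so some form of buffering or selection in the programmable logic is presumably part of the "proper formatting" step.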
Synthesis of Trimagnetic Multishell MnFe2O4@CoFe2O4@NiFe2O4 Nanoparticles.

PubMed

Gavrilov-Isaac, Véronica; Neveu, Sophie; Dupuis, Vincent; Taverna, Dario; Gloter, Alexandre; Cabuil, Valérie

2015-06-10

The synthesis and characterization of original ferrite multishell magnetic nanoparticles made of a soft core (manganese ferrite) covered with two successive shells, a hard one (cobalt ferrite) and then a soft one (nickel ferrite), are described. The results demonstrate the modulation of the coercivity when new magnetic shells are added. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Energy-efficient fault tolerance in multiprocessor real-time systems

NASA Astrophysics Data System (ADS)

Guo, Yifeng

Recent progress in multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 to 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in the near future. For such systems, in addition to the general temporal requirements common to all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, because tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with an arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP).
A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated, where tasks' main copies are executed ASAP while backup copies are executed ALAP to reduce the overlapped execution of main and backup copies of the same task and thus reduce energy consumption. All proposed techniques are evaluated through extensive simulations and compared with other state-of-the-art approaches. The simulation results confirm that the proposed schemes can preserve the system reliability while still achieving substantial energy savings. Finally, for both SS and POED based Energy-Efficient Fault-Tolerant (EEFT) schemes, a series of recovery strategies are designed for when more than one (transient and permanent) fault needs to be tolerated.
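The benefit of running main copies ASAP and backup copies ALAP can be seen in a minimal sketch for a single task on two processors. This is a deliberate simplification with hypothetical function names. The POED scheduler in the dissertation handles whole task sets, which this sketch does not attempt. The overlap between the two execution windows is the part of the backup that has already run by the time a fault-free main copy finishes, and is therefore energy that cannot be saved by cancelling the backup.

```python
from typing import Tuple

Interval = Tuple[float, float]


def asap_window(release: float, wcet: float) -> Interval:
    """Execution window when the copy starts as soon as it is released."""
    return (release, release + wcet)


def alap_window(deadline: float, wcet: float) -> Interval:
    """Execution window when the copy finishes exactly at its deadline."""
    return (deadline - wcet, deadline)


def overlap(a: Interval, b: Interval) -> float:
    """Length of time during which both copies execute concurrently."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
```

For a task released at 0 with deadline 10 and execution time 4, the main copy occupies (0, 4) and the ALAP backup (6, 10), so the overlap is zero and the backup can be dropped entirely after a successful main run. With execution time 7 the overlap is 4 time units, compared with 7 if both copies started ASAP.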
Soft template synthesis of yolk/silica shell particles.

PubMed

Wu, Xue-Jun; Xu, Dongsheng

2010-04-06

Yolk/shell particles possess a unique structure that is composed of hollow shells that encapsulate other particles but with an interstitial space between them. These structures are different from core/shell particles in that the core particles are freely movable in the shell. Yolk/shell particles combine the properties of each component, and can find potential applications in catalysis, lithium ion batteries, and biosensors. In this Research News article, a soft-template-assisted method for the preparation of yolk/silica shell particles is presented. The demonstrated method is simple and general, and can produce hollow silica spheres incorporated with different particles independent of their diameters, geometry, and composition. Furthermore, yolk/mesoporous silica shell particles and multishelled particles are also prepared through optimization of the experimental conditions. Finally, potential applications of these particles are discussed.
EIT Crinkles as Evidence for the Breakout Model of Solar Eruptions

NASA Technical Reports Server (NTRS)

Sterling, Alphonse C.; Moore, R. L.; Whitaker, Ann F. (Technical Monitor)

2001-01-01

We present observations of two homologous flares in NOAA active region 8210 occurring on 1998 May 1 and May 2, using EUV data from the Extreme Ultraviolet Radiation Imaging Telescope (EIT) on the Solar and Heliospheric Observatory (SOHO), high-resolution and high-time cadence images from the soft X-ray telescope (SXT) on Yohkoh, images or fluxes from the hard X-ray telescope (HXT) on Yohkoh and the BATSE experiment on the Compton Gamma Ray Observatory (CGRO), and Ca xix soft X-ray spectra from the Bragg crystal spectrometer (BCS) on Yohkoh. Magnetograms indicate that the flares occurred in a complex magnetic topology, consisting of an emerging flux region (EFR) sandwiched between a sunspot to the west and a coronal hole to the east. In an earlier study we found that in EIT images, both flaring episodes showed the formation of a crinkle-like pattern of emission ("EIT crinkles") occurring in the coronal hole vicinity, well away from a central "core field" area near the EFR-sunspot boundary. With our expanded data set, here we find that most of the energetic activity occurs in the core region in both events, with some portions of the core brightening shortly after the onset of the EIT crinkles, and other regions of the core brightening several minutes later, coincident with a burst of hard X-rays: there are no obvious core brightenings prior to the onset of the EIT crinkles. These timings are consistent with the "breakout model" of solar eruptions, whereby the emerging flux is initially constrained by a system of overlying magnetic field lines, and is able to erupt only after an opening develops in the overlying fields as a consequence of magnetic reconnection at a magnetic null point. In our case, the EIT crinkles would be a signature of this pre-impulsive-phase magnetic reconnection, and brightening of the core only occurs after the core fields begin to escape through the newly-created opening in the overlying fields. 
Morphology in soft X-ray images and properties in hard X-rays differ between the two events, with complexities that preclude a simple determination of the dynamics in the core at the times of eruption. From the BCS spectra, we find that the core region expends energy at a rate of approximately 10^26 erg per second during the time of the growth of the EIT crinkles; this rate is an upper limit to energy expended in the reconnections opening the overlying fields. Energy losses occur at an order-of-magnitude higher rate near the time of the peak of the events. There is little evidence of asymmetry in the spectra, consistent with the majority of the mass flows occurring normal to the line-of-sight. Both events have similar electron temperature dependencies on time.
High performance 3D adaptive filtering for DSP based portable medical imaging systems

NASA Astrophysics Data System (ADS)

Bockenbach, Olivier; Ali, Murtaza; Wainwright, Ian; Nadeski, Mark

2015-03-01

Portable medical imaging devices have proven valuable for emergency medical services both in the field and in hospital environments and are becoming more prevalent in clinical settings where the use of larger imaging machines is impractical. Despite their constraints on power, size and cost, portable imaging devices must still deliver high quality images. 3D adaptive filtering is one of the most advanced techniques aimed at noise reduction and feature enhancement, but it is computationally very demanding and hence often cannot be run with sufficient performance on a portable platform. In recent years, advanced multicore digital signal processors (DSP) have been developed that attain high processing performance while maintaining low levels of power dissipation. These processors enable the implementation of complex algorithms on a portable platform. In this study, the performance of a 3D adaptive filtering algorithm on a DSP is investigated. The performance is assessed by filtering a volume of size 512x256x128 voxels sampled at a rate of 10 MVoxels/sec with an ultrasound 3D probe. Relative performance and power consumption are compared between a reference PC (quad-core CPU) and a TMS320C6678 DSP from Texas Instruments.
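The real-time budget implied by these figures follows from simple arithmetic, sketched below. The function names are illustrative, and 1 MVoxel is assumed to mean 10^6 voxels.

```python
def voxels_in_volume(nx: int, ny: int, nz: int) -> int:
    """Total number of voxels in an nx-by-ny-by-nz volume."""
    return nx * ny * nz


def seconds_per_volume(voxels: int, voxels_per_second: float) -> float:
    """Time the probe needs to deliver one full volume at the given sampling rate."""
    return voxels / voxels_per_second


volume = voxels_in_volume(512, 256, 128)      # 16,777,216 voxels
budget = seconds_per_volume(volume, 10e6)     # about 1.68 s per volume
print(f"{volume} voxels, {budget:.2f} s per volume")
```

To keep pace with the probe, a filter implementation would therefore need to process each volume in roughly 1.7 seconds or less.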
Evaluation of peristaltic micromixers for highly integrated microfluidic systems

PubMed Central

Kim, Duckjong; Rho, Hoon Suk; Jambovane, Sachin; Shin, Soojeong; Hong, Jong Wook

2016-01-01

Microfluidic devices based on the multilayer soft lithography allow accurate manipulation of liquids, handling reagents at the sub-nanoliter level, and performing multiple reactions in parallel processors by adapting micromixers. Here, we have experimentally evaluated and compared several designs of micromixers and operating conditions to find design guidelines for the micromixers. We tested circular, triangular, and rectangular mixing loops and measured mixing performance according to the position and the width of the valves that drive nanoliters of fluids in the micrometer scale mixing loop. We found that the rectangular mixer is best for the applications of highly integrated microfluidic platforms in terms of the mixing performance and the space utilization. This study provides an improved understanding of the flow behaviors inside micromixers and design guidelines for micromixers that are critical to build higher order fluidic systems for the complicated parallel bio/chemical processes on a chip. PMID:27036809
Evaluation of peristaltic micromixers for highly integrated microfluidic systems

NASA Astrophysics Data System (ADS)

Kim, Duckjong; Rho, Hoon Suk; Jambovane, Sachin; Shin, Soojeong; Hong, Jong Wook

2016-03-01

Microfluidic devices based on the multilayer soft lithography allow accurate manipulation of liquids, handling reagents at the sub-nanoliter level, and performing multiple reactions in parallel processors by adapting micromixers. Here, we have experimentally evaluated and compared several designs of micromixers and operating conditions to find design guidelines for the micromixers. We tested circular, triangular, and rectangular mixing loops and measured mixing performance according to the position and the width of the valves that drive nanoliters of fluids in the micrometer scale mixing loop. We found that the rectangular mixer is best for the applications of highly integrated microfluidic platforms in terms of the mixing performance and the space utilization. This study provides an improved understanding of the flow behaviors inside micromixers and design guidelines for micromixers that are critical to build higher order fluidic systems for the complicated parallel bio/chemical processes on a chip.
Composite nanoplatelets combining soft-magnetic iron oxide with hard-magnetic barium hexaferrite

NASA Astrophysics Data System (ADS)

Primc, D.; Makovec, D.

2015-01-01

By coupling two different magnetic materials inside a single composite nanoparticle, the shape of the magnetic hysteresis can be engineered to meet the requirements of specific applications. Sandwich-like composite nanoparticles composed of a hard-magnetic Ba-hexaferrite (BaFe12O19) platelet core in between two soft-magnetic spinel iron oxide maghemite (γ-Fe2O3) layers were synthesized using a new, simple and inexpensive method based on the co-precipitation of Fe3+/Fe2+ ions in an aqueous suspension of hexaferrite core nanoparticles. The required close control of the supersaturation of the precipitating species was enabled by the controlled release of the Fe3+ ions from the nitrate complex with urea ([Fe((H2N)2C=O)6](NO3)3) and by using Mg(OH)2 as a solid precipitating agent. The platelet Ba-hexaferrite nanoparticles of different sizes were used as the cores. The controlled coating resulted in an exclusively heterogeneous nucleation and the topotactic growth of the spinel layers on both basal surfaces of the larger hexaferrite nanoplatelets. The direct magnetic coupling between the core and the shell resulted in a strong increase of the energy product |BH|max. Ultrafine core nanoparticles reacted with the precipitating species and homogeneous product nanoparticles were formed, which differ in terms of the structure and composition compared to any other compound in the BaO-Fe2O3 system. Electronic supplementary information (ESI) available: Synthesis (ESI #1) and properties (ESI #2) of the barium hexaferrite core nanoparticles, TEM of the nanoparticles synthesized under an excessive supersaturation (ESI #3), and magnetic properties of physical mixtures of the hard-magnetic hexaferrite and the soft-magnetic spinel ferrite (ESI #4). See DOI: 10.1039/c4nr05854b
A programmable computational image sensor for high-speed vision

NASA Astrophysics Data System (ADS)

Yang, Jie; Shi, Cong; Long, Xitian; Wu, Nanjian

2013-08-01

In this paper we present a programmable computational image sensor for high-speed vision. This computational image sensor contains four main blocks: an image pixel array, a massively parallel processing element (PE) array, a row processor (RP) array and a RISC core. The pixel-parallel PE array is responsible for transferring, storing and processing raw image data in a SIMD fashion with its own programming language. The RPs form a one-dimensional array of simplified RISC cores that can carry out complex arithmetic and logic operations. The PE array and RP array can complete a large amount of computation in few instruction cycles and therefore satisfy the requirements of low- and middle-level high-speed image processing. The RISC core controls the whole system operation and executes some high-level image processing algorithms. We use a simplified AHB bus as the system bus to connect the major components. A programming language and corresponding tool chain for this computational image sensor have also been developed.
Bitstream decoding processor for fast entropy decoding of variable length coding-based multiformat videos

NASA Astrophysics Data System (ADS)

Jo, Hyunho; Sim, Donggyu

2014-06-01

We present a bitstream decoding processor for entropy decoding of variable length coding-based multiformat videos. Since most of the computational complexity of entropy decoders comes from bitstream accesses and table look-up process, the developed bitstream processing unit (BsPU) has several designated instructions to access bitstreams and to minimize branch operations in the table look-up process. In addition, the instruction for bitstream access has the capability to remove emulation prevention bytes (EPBs) of H.264/AVC without initial delay, repeated memory accesses, and additional buffer. Experimental results show that the proposed method for EPB removal achieves a speed-up of 1.23 times compared to the conventional EPB removal method. In addition, the BsPU achieves speed-ups of 5.6 and 3.5 times in entropy decoding of H.264/AVC and MPEG-4 Visual bitstreams, respectively, compared to an existing processor without designated instructions and a new table mapping algorithm. The BsPU is implemented on a Xilinx Virtex5 LX330 field-programmable gate array. The MPEG-4 Visual (ASP, Level 5) and H.264/AVC (Main Profile, Level 4) are processed using the developed BsPU with a core clock speed of under 250 MHz in real time.
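As an illustration of the emulation prevention byte (EPB) step mentioned in the abstract above, the following is a minimal software sketch, assuming the standard H.264/AVC escaping rule in which a 0x03 byte follows every pair of 0x00 bytes inside a NAL unit payload. It is a plain byte-by-byte loop, not the BsPU instruction described in the paper, and the function name is hypothetical.

```python
def remove_emulation_prevention_bytes(nal_payload: bytes) -> bytes:
    """Strip H.264/AVC emulation prevention bytes (0x03 after two 0x00 bytes)."""
    out = bytearray()
    zero_run = 0  # number of consecutive 0x00 bytes seen just before this byte
    for byte in nal_payload:
        if zero_run >= 2 and byte == 0x03:
            # This 0x03 was inserted by the encoder: drop it, restart the count.
            zero_run = 0
            continue
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0x00 else 0
    return bytes(out)
```

A loop of this kind inspects every byte and branches on each one, which is the sort of overhead a dedicated bitstream-access instruction is meant to avoid.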
QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

PubMed Central

Gudyś, Adam; Deorowicz, Sebastian

2014-01-01

Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In the paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We selected the two most time-consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests performed on several protein families from the Pfam database give an overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors. PMID:24586435
Design of the Protocol Processor for the ROBUS-2 Communication System

NASA Technical Reports Server (NTRS)

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.

2005-01-01

The ROBUS-2 Protocol Processor (RPP) is a custom-designed hardware component implementing the functionality of the ROBUS-2 fault-tolerant communication system. The Reliable Optical Bus (ROBUS) is the core communication system of the Scalable Processor-Independent Design for Enhanced Reliability (SPIDER), a general-purpose fault-tolerant integrated modular architecture currently under development at NASA Langley Research Center. ROBUS is a time-division multiple access (TDMA) broadcast communication system with medium access control by means of a time-indexed communication schedule. ROBUS-2 is a developmental version of the ROBUS providing guaranteed fault-tolerant services to the attached processing elements (PEs), in the presence of a bounded number of faults. These services include message broadcast (Byzantine Agreement), dynamic communication schedule update, time reference (clock synchronization), and distributed diagnosis (group membership). ROBUS also features fault-tolerant startup and restart capabilities. ROBUS-2 tolerates internal as well as PE faults, and incorporates a dynamic self-reconfiguration capability driven by the internal diagnostic system. ROBUS consists of RPPs connected to each other by a lower-level physical communication network. The RPP has a pipelined architecture and the design is parameterized in the behavioral and structural domains. The design of the RPP enables the bus to achieve a PE-message throughput that approaches the available bandwidth at the physical layer.
Spaceborne Processor Array

NASA Technical Reports Server (NTRS)

Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

2008-01-01

A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor-memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.
SPECIAL ISSUE ON OPTICAL PROCESSING OF INFORMATION: Optoelectronic processors with scanning CCD photodetectors

NASA Astrophysics Data System (ADS)

Esepkina, N. A.; Lavrov, A. P.; Anan'ev, M. N.; Blagodarnyi, V. S.; Ivanov, S. I.; Mansyrev, M. I.; Molodyakov, S. A.

1995-10-01

Two new types of optoelectronic radio-signal processors were investigated. Charge-coupled device (CCD) photodetectors are used in these processors under continuous scanning conditions, i.e. in a time delay and storage mode. One of these processors is based on a CCD photodetector array with a reference-signal amplitude transparency and the other is an adaptive acousto-optical signal processor with linear frequency modulation. The processor with the transparency performs multichannel discrete-analogue convolution of an input signal with a corresponding kernel of the transformation determined by the transparency. If a light source is an array of light-emitting diodes of special (stripe) geometry, the optical stages of the processor can be made from optical fibre components and the whole processor then becomes a rigid 'sandwich' (a compact hybrid optoelectronic microcircuit). A report is given also of a study of a prototype processor with optical fibre components for the reception of signals from a system with antenna aperture synthesis, which forms a radio image of the Earth.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Batista, Antonio J. N.; Santos, Bruno; Fernandes, Ana

The data acquisition and control instrumentation cubicles room of the ITER tokamak will be irradiated with neutrons during the fusion reactor operation. A Virtex-6 FPGA from Xilinx (XC6VLX365T-1FFG1156C) is used on the ATCA-IO-PROCESSOR board, included in the ITER Catalog of I and C products - Fast Controllers. The Virtex-6 is a re-programmable logic device where the configuration is stored in Static RAM (SRAM), functional data is stored in dedicated Block RAM (BRAM) and functional state logic in Flip-Flops. Single Event Upsets (SEU) due to the ionizing radiation of neutrons cause soft errors, unintended changes (bit-flips) to the values stored in state elements of the FPGA. The SEU monitoring and soft errors repairing, when possible, were explored in this work. An FPGA built-in Soft Error Mitigation (SEM) controller detects and corrects soft errors in the FPGA configuration memory. Novel SEU sensors with Error Correction Code (ECC) detect and repair the BRAM memories. Proper management of SEU can increase reliability and availability of control instrumentation hardware for nuclear applications. The results of the tests performed using the SEM controller and the BRAM SEU sensors are presented for a Virtex-6 FPGA (XC6VLX240T-1FFG1156C) when irradiated with neutrons from the Portuguese Research Reactor (RPI), a 1 MW nuclear fission reactor operated by IST in the neighborhood of Lisbon. Results show that the proposed SEU mitigation technique is able to repair the majority of the detected SEU errors in the configuration and BRAM memories. (authors)
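To make the idea of ECC-based detection and repair of bit-flips concrete, here is a toy sketch using a Hamming(7,4) code, which corrects any single flipped bit in a 7-bit word. This is an assumption chosen for brevity: the abstract does not specify which code the BRAM SEU sensors use, and the function names are hypothetical.

```python
def hamming74_encode(nibble: int) -> list:
    """Return 7 code bits (list index 0 = position 1) for a 4-bit value."""
    code = [0] * 8  # index 0 unused; positions 1..7
    # Data bits live at the non-power-of-two positions.
    for pos, shift in zip((3, 5, 6, 7), range(4)):
        code[pos] = (nibble >> shift) & 1
    # Each parity bit covers every position whose index has that bit set.
    for parity_pos in (1, 2, 4):
        code[parity_pos] = sum(code[p] for p in range(1, 8) if p & parity_pos) % 2
    return code[1:]


def hamming74_correct(code_bits: list) -> tuple:
    """Return (repaired bits, error position); position 0 means no error seen."""
    syndrome = 0
    for pos, bit in enumerate(code_bits, start=1):
        if bit:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    repaired = list(code_bits)
    if syndrome:
        repaired[syndrome - 1] ^= 1  # the syndrome names the flipped position
    return repaired, syndrome
```

Flipping any one bit of an encoded word and passing it to `hamming74_correct` restores the original word and reports which position was upset. A code of this size cannot tell a double error from a single one, which is why practical memory ECC adds an extra overall parity bit.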
Time-efficient simulations of tight-binding electronic structures with Intel Xeon Phi™ many-core processors

NASA Astrophysics Data System (ADS)

Ryu, Hoon; Jeong, Yosang; Kang, Ji-Hoon; Cho, Kyu Nam

2016-12-01

Modelling of multi-million atomic semiconductor structures is important as it not only predicts properties of physically realizable novel materials, but can accelerate advanced device designs. This work elaborates a new Technology-Computer-Aided-Design (TCAD) tool for nanoelectronics modelling, which uses a sp3d5s∗ tight-binding approach to describe multi-million atomic structures, and simulates electronic structures with high performance computing (HPC), including atomic effects such as alloy and dopant disorders. Named the Quantum simulation tool for Advanced Nanoscale Devices (Q-AND), the tool scales well on traditional multi-core HPC clusters, implying a strong capability for large-scale electronic structure simulations, with a particularly remarkable performance enhancement on the latest clusters of Intel Xeon Phi™ coprocessors. A review of the recent modelling study conducted to understand an experimental work on highly phosphorus-doped silicon nanowires is presented to demonstrate the utility of Q-AND. Having been developed via the Intel Parallel Computing Center project, Q-AND will be open to the public to establish a sound framework of nanoelectronics modelling with advanced HPC clusters of a many-core base. With details of the development methodology and an exemplary study of dopant electronics, this work will present a practical guideline for TCAD development to researchers in the field of computational nanoelectronics.
Canadian Society for Exercise Physiology position stand: The use of instability to train the core in athletic and nonathletic conditioning.

PubMed

Behm, David G; Drinkwater, Eric J; Willardson, Jeffrey M; Cowley, Patrick M

2010-02-01

The use of instability devices and exercises to train the core musculature is an essential feature of many training centres and programs. It was the intent of this position stand to provide recommendations regarding the role of instability in resistance training programs designed to train the core musculature. The core is defined as the axial skeleton and all soft tissues with a proximal attachment originating on the axial skeleton, regardless of whether the soft tissue terminates on the axial or appendicular skeleton. Core stability can be achieved with a combination of muscle activation and intra-abdominal pressure. Abdominal bracing has been shown to be more effective than abdominal hollowing in optimizing spinal stability. When similar exercises are performed, core and limb muscle activation are reported to be higher under unstable conditions than under stable conditions. However, core muscle activation that is similar to or higher than that achieved in unstable conditions can also be achieved with ground-based free-weight exercises, such as Olympic lifts, squats, and dead lifts. Since the addition of unstable bases to resistance exercises can decrease force, power, velocity, and range of motion, they are not recommended as the primary training mode for athletic conditioning. However, the high muscle activation with the use of lower loads associated with instability resistance training suggests they can play an important role within a periodized training schedule, in rehabilitation programs, and for nonathletic individuals who prefer not to use ground-based free weights to achieve musculoskeletal health benefits.
Chaste: A test-driven approach to software development for biological modelling

NASA Astrophysics Data System (ADS)

Pitt-Francis, Joe; Pathmanathan, Pras; Bernabeu, Miguel O.; Bordas, Rafel; Cooper, Jonathan; Fletcher, Alexander G.; Mirams, Gary R.; Murray, Philip; Osborne, James M.; Walter, Alex; Chapman, S. Jon; Garny, Alan; van Leeuwen, Ingeborg M. M.; Maini, Philip K.; Rodríguez, Blanca; Waters, Sarah L.; Whiteley, Jonathan P.; Byrne, Helen M.; Gavaghan, David J.

2009-12-01

Chaste ('Cancer, heart and soft-tissue environment') is a software library and a set of test suites for computational simulations in the domain of biology. Current functionality has arisen from modelling in the fields of cancer, cardiac physiology and soft-tissue mechanics. It is released under the LGPL 2.1 licence. Chaste has been developed using agile programming methods. The project began in 2005 when it was reasoned that the modelling of a variety of physiological phenomena required both a generic mathematical modelling framework, and a generic computational/simulation framework. The Chaste project evolved from the Integrative Biology (IB) e-Science Project, an inter-institutional project aimed at developing a suitable IT infrastructure to support physiome-level computational modelling, with a primary focus on cardiac and cancer modelling.
Program summary
Program title: Chaste
Catalogue identifier: AEFD_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFD_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: LGPL 2.1
No. of lines in distributed program, including test data, etc.: 5 407 321
No. of bytes in distributed program, including test data, etc.: 42 004 554
Distribution format: tar.gz
Programming language: C++
Operating system: Unix
Has the code been vectorised or parallelized?: Yes. Parallelized using MPI.
RAM: <90 Megabytes for two of the scenarios described in Section 6 of the manuscript (Monodomain re-entry on a slab or Cylindrical crypt simulation). Up to 16 Gigabytes (distributed across processors) for full resolution bidomain cardiac simulation.
Classification: 3.
External routines: Boost, CodeSynthesis XSD, CxxTest, HDF5, METIS, MPI, PETSc, Triangle, Xerces
Nature of problem: Chaste may be used for solving coupled ODE and PDE systems arising from modelling biological systems. Use of Chaste in two application areas is described in this paper: cardiac electrophysiology and intestinal crypt dynamics.
Solution method: Coupled multi-physics with PDE, ODE and discrete mechanics simulation.
Running time: The largest cardiac simulation described in the manuscript takes about 6 hours to run on a single 3 GHz core. See results section (Section 6) of the manuscript for discussion on parallel scaling.
Advanced Multiple Processor Configuration Study. Final Report.

ERIC Educational Resources Information Center

Clymer, S. J.

This summary of a study on multiple processor configurations includes the objectives, background, approach, and results of research undertaken to provide the Air Force with a generalized model of computer processor combinations for use in the evaluation of proposed flight training simulator computational designs. An analysis of a real-time flight…
The use of imprecise processing to improve accuracy in weather & climate prediction

NASA Astrophysics Data System (ADS)

Düben, Peter D.; McNamara, Hugh; Palmer, T. N.

2014-08-01

The use of stochastic processing hardware and low precision arithmetic in atmospheric models is investigated. Stochastic processors allow hardware-induced faults in calculations, sacrificing bit-reproducibility and precision in exchange for improvements in performance and potentially accuracy of forecasts, due to a reduction in power consumption that could allow higher resolution. A similar trade-off is achieved using low precision arithmetic, with improvements in computation and communication speed and savings in storage and memory requirements. As high-performance computing becomes more massively parallel and power intensive, these two approaches may be important stepping stones in the pursuit of global cloud-resolving atmospheric modelling. The impact of both hardware induced faults and low precision arithmetic is tested using the Lorenz '96 model and the dynamical core of a global atmosphere model. In the Lorenz '96 model there is a natural scale separation; the spectral discretisation used in the dynamical core also allows large and small scale dynamics to be treated separately within the code. Such scale separation allows the impact of lower-accuracy arithmetic to be restricted to components close to the truncation scales and hence close to the necessarily inexact parametrised representations of unresolved processes. By contrast, the larger scales are calculated using high precision deterministic arithmetic. Hardware faults from stochastic processors are emulated using a bit-flip model with different fault rates. Our simulations show that both approaches to inexact calculations do not substantially affect the large scale behaviour, provided they are restricted to act only on smaller scales. By contrast, results from the Lorenz '96 simulations are superior when small scales are calculated on an emulated stochastic processor than when those small scales are parametrised. 
This suggests that inexact calculations at the small scale could reduce computation and power costs without adversely affecting the quality of the simulations. This would allow higher resolution models to be run at the same computational cost.
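The bit-flip fault emulation described in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' setup: it uses the single-scale Lorenz '96 equations rather than the scale-separated version the abstract refers to, a forward-Euler step, and faults restricted to the 52 mantissa bits of a 64-bit float. All function names and parameter choices are hypothetical.

```python
import random
import struct


def flip_random_mantissa_bit(value: float, rng: random.Random) -> float:
    """Flip one randomly chosen bit among the 52 mantissa bits of a float64."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << rng.randrange(52)
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits))
    return flipped


def l96_tendency(x, forcing):
    """dX_k/dt = -X_{k-1} (X_{k-2} - X_{k+1}) - X_k + F with cyclic indices."""
    n = len(x)
    # Negative list indices wrap around, which gives the cyclic boundary.
    return [
        -x[k - 1] * (x[k - 2] - x[(k + 1) % n]) - x[k] + forcing
        for k in range(n)
    ]


def step(x, forcing, dt, fault_rate, rng):
    """One forward-Euler step; each result is corrupted with probability fault_rate."""
    new = [xi + dt * ti for xi, ti in zip(x, l96_tendency(x, forcing))]
    return [
        flip_random_mantissa_bit(v, rng) if rng.random() < fault_rate else v
        for v in new
    ]
```

Running the same initial state with `fault_rate=0.0` and with a small positive rate, and comparing statistics of the two trajectories, is one way to explore how much fault-induced error a chaotic model tolerates.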
Prefabricated Vertical Drain (PVD) and Deep Cement Mixing (DCM)/Stiffened DCM (SDCM) techniques for soft ground improvement

NASA Astrophysics Data System (ADS)

Bergado, D. T.; Long, P. V.; Chaiyaput, S.; Balasubramaniam, A. S.

2018-04-01

Soft ground improvement techniques have become the most practical and popular methods for increasing soil strength and stiffness and reducing soil compressibility, including for the soft Bangkok clay. This paper focuses on the comparative performance of prefabricated vertical drains (PVD) using surcharge, vacuum and heat preloading, as well as the cement-admixed clay of the Deep Cement Mixing (DCM) and Stiffened DCM (SDCM) methods. The Vacuum-PVD can increase the horizontal coefficient of consolidation, Ch, resulting in a faster rate of settlement at the same magnitudes of settlement compared to Conventional PVD. Several field methods of applying vacuum preloading are also compared. Moreover, the Thermal PVD and Thermal Vacuum PVD can further increase the coefficient of horizontal consolidation, Ch, with the associated reduction of kh/ks values by reducing the drainage retardation effects in the smear zone around the PVD, which resulted in faster rates of consolidation and higher magnitudes of settlement. Furthermore, the equivalent smear effect due to non-uniform consolidation is also discussed in addition to the smear due to the mechanical installation of PVDs. In addition, a new kind of reinforced deep mixing method, namely the Stiffened Deep Cement Mixing (SDCM) pile, is introduced to improve the flexural resistance, improve the field quality control, and prevent unexpected failures of the Deep Cement Mixing (DCM) pile. The SDCM pile consists of a DCM pile reinforced with the insertion of a precast reinforced concrete (RC) core. The full-scale test embankment on soft clay improved by SDCM and DCM piles was also analysed. Numerical simulations using the 3D PLAXIS Foundation finite element software have been done to understand the behavior of SDCM and DCM piles. The simulation results indicated that the surface settlements decreased with increasing lengths of the RC cores and, to a lesser extent, with increasing sectional areas of the RC cores in the SDCM piles. In addition, the lateral movements decreased with increasing lengths (longer than 4 m) and sectional areas of the RC cores in the SDCM piles. The results of the numerical simulations closely agreed with the observed data and successfully verified the parameters affecting the performance and behavior of both SDCM and DCM piles.
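The roles of Ch and kh/ks in the abstract above can be illustrated with the widely used Hansbo-type closed form for radial consolidation around a vertical drain, with well resistance neglected. The abstract does not state which formulation the authors used, so this is only a sketch; the function name and the input values in the usage lines are hypothetical, and units must be consistent (for example m, years and m²/year).

```python
import math


def radial_consolidation(c_h, t, d_e, d_w, d_s, kh_over_ks):
    """Average degree of radial consolidation U_h (Hansbo form, no well resistance).

    c_h: horizontal coefficient of consolidation
    t: elapsed time
    d_e: equivalent diameter of the unit cell drained by one PVD
    d_w: equivalent drain diameter
    d_s: diameter of the smear zone
    kh_over_ks: ratio of undisturbed to smear-zone horizontal permeability
    """
    n = d_e / d_w  # spacing ratio
    s = d_s / d_w  # smear ratio
    f = math.log(n / s) + kh_over_ks * math.log(s) - 0.75
    t_h = c_h * t / d_e ** 2  # dimensionless time factor
    return 1.0 - math.exp(-8.0 * t_h / f)


# Hypothetical inputs: a larger c_h or a smaller kh/ks both speed up consolidation.
base = radial_consolidation(1.0, 0.5, 1.05, 0.05, 0.10, 3.0)
faster = radial_consolidation(3.0, 0.5, 1.05, 0.05, 0.10, 3.0)
less_smear = radial_consolidation(1.0, 0.5, 1.05, 0.05, 0.10, 1.5)
```

In this form a higher Ch raises the time factor and a lower kh/ks shrinks the smear term, so both changes reported for the vacuum and thermal variants push the degree of consolidation up at a given time.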
Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chang, Christopher H.; Long, Hai; Sides, Scott

2015-10-15

Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance: limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.
An Efficient Downlink Scheduling Strategy Using Normal Graphs for Multiuser MIMO Wireless Systems

NASA Astrophysics Data System (ADS)

Chen, Jung-Chieh; Wu, Cheng-Hsuan; Lee, Yao-Nan; Wen, Chao-Kai

Inspired by the success of the low-density parity-check (LDPC) codes in the field of error-control coding, in this paper we propose transforming the downlink multiuser multiple-input multiple-output scheduling problem into an LDPC-like problem using the normal graph. Based on the normal graph framework, soft information, which indicates the probability that each user will be scheduled to transmit packets at the access point through a specified angle-frequency sub-channel, is exchanged among the local processors to iteratively optimize the multiuser transmission schedule. Computer simulations show that the proposed algorithm can efficiently schedule simultaneous multiuser transmission which then increases the overall channel utilization and reduces the average packet delay.
Characterizing the effects of intermittent faults on a processor for dependability enhancement strategy.

PubMed

Wang, Chao Saul; Fu, Zhong-Chuan; Chen, Hong-Song; Wang, Dong-Sheng

2014-01-01

As semiconductor technology scales into the nanometer regime, intermittent faults have become an increasing threat. This paper focuses on the effects of intermittent faults on NET versus REG on one hand and the implications for dependability strategy on the other. First, the vulnerability characteristics of representative units in OpenSPARC T2 are revealed, and in particular, the highly sensitive modules are identified. Second, an arch-level dependability enhancement strategy is proposed, showing that events such as core/strand running status and core-memory interface events can be candidates of detectable symptoms. A simple watchdog can be deployed to detect application running status (IEXE event). Then SDC (silent data corruption) rate is evaluated demonstrating its potential. Third and last, the effects of traditional protection schemes in the target CMT to intermittent faults are quantitatively studied on behalf of the contribution of each trap type, demonstrating the necessity of taking this factor into account for the strategy.
Characterizing and Optimizing the Performance of the MAESTRO 49-Core Processor

DTIC Science & Technology

2014-03-27

process large volumes of data, it is necessary during testing to vary the dimensions of the inbound data matrix to determine what effect this has on the...needed that can process the extra data these systems seek to collect. However, the space environment presents a number of threats, such as ambient or...induced faults, and that also have sufficient computational power to handle the large flow of data they encounter. This research investigates one
European Scientific Notes. Volume 35, Number 7,

DTIC Science & Technology

1981-07-31

simulated the entire processor down cores, semiconductor PROMs, etc. pack- to gate level on a PDP-11/45 computer, aged on FUROCARDS can be interfaced...approaching retirement were used to generate internal heat age , but DERMO will undoubtedly con- when irradiated. It was found that tinue to be France’s leading...import- parameters , such a doublet will focus ance. it plays an important role not a bundle of rays incident parallel only in mapping and defining the
Common Board Design for the OBC I/O Unit and The OBC CCSDS Unit of The Stuttgart University Satellite "Flying Laptop"

NASA Astrophysics Data System (ADS)

Eickhoff, Jens; Cook, Barry; Walker, Paul; Habinc, Sadi; Witt, Rouven; Roser, Hans-Peter

2011-08-01

As already published in another paper at DASIA 2010 in Budapest [1], the University of Stuttgart, Germany, is developing an advanced 3-axis stabilized small satellite applying industry standards for command/control techniques, onboard software design and onboard computer components. The satellite has a launch mass of approx. 120 kg and is foreseen to be launched at the end of 2013 as a piggyback payload on an Indian PSLV launcher. During phase C the main challenge was the conceptual design for an ultra-compact, high-performance onboard computer (OBC) that is able to support an industry standard operating system, a PUS standard based onboard software (OBSW) and CCSDS standard based ground/space communication. The developed architecture is based on 4 main elements (see [1] and Figure 4):
• the OBC core board (single board computer based on LEON3 FT architecture),
• an I/O board for all OBC digital interfaces to S/C equipment,
• a CCSDS TC/TM pre-processor board,
• CPDU being embedded in the PCDU.
The EM for the OBC core has meanwhile been shipped to the University by the supplier Aeroflex Colorado Springs, USA, and has been in use in Stuttgart since January 2011. Figure 2 and Figure 3 provide brief impressions. This paper concentrates on the common design of the I/O board and the CCSDS processor boards.
Kalman Filter Tracking on Parallel Architectures

NASA Astrophysics Data System (ADS)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2016-11-01

Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.
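As a rough illustration of the Kalman filter approach named in the abstract above, the following is a minimal scalar sketch of one predict/update cycle. It is not the authors' track-fitting code: real track fits use multi-dimensional state vectors and covariance matrices, and the function names, the constant-state model, and the default noise values here are assumptions made purely for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (illustrative only).

    x, p : prior state estimate and its variance
    z    : new measurement
    q, r : process-noise and measurement-noise variances (assumed known)
    """
    # Predict: the state is modelled as constant, so only the variance grows.
    p_pred = p + q
    # Update: weight the measurement residual by the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x + gain * (z - x)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def kalman_filter(measurements, x0=0.0, p0=1.0, q=0.0, r=1.0):
    """Run the scalar filter over a sequence and return every estimate."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates
```

Each step depends only on the previous estimate and one new measurement, so a single track is fitted sequentially. Independent tracks can in principle be processed side by side, which is the kind of parallelism the abstract refers to.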
Processor core for real time background identification of HD video based on OpenCV Gaussian mixture model algorithm

NASA Astrophysics Data System (ADS)

Genovese, Mariangela; Napoli, Ettore

2013-05-01

The identification of moving objects is a fundamental step in computer vision processing chains. The development of low cost and lightweight smart cameras steadily increases the demand for efficient and high performance circuits able to process high definition video in real time. The paper proposes two processor cores aimed to perform the real time background identification on High Definition (HD, 1920 × 1080 pixel) video streams. The implemented algorithm is the OpenCV version of the Gaussian Mixture Model (GMM), a high performance probabilistic algorithm for the segmentation of the background that is, however, computationally intensive and impossible to implement on a general purpose CPU under the constraint of real time processing. In the proposed paper, the equations of the OpenCV GMM algorithm are optimized in such a way that a lightweight and low power implementation of the algorithm is obtained. The reported performances are also the result of the use of state of the art truncated binary multipliers and ROM compression techniques for the implementation of the non-linear functions. The first circuit has commercial FPGA devices as a target and provides speed and logic resource occupation that improve on previously proposed implementations. The second circuit is oriented to an ASIC (UMC-90nm) standard cell implementation. Both implementations are able to process more than 60 frames per second in 1080p format, a frame rate compatible with HD television.
Soft tubular microfluidics for 2D and 3D applications

PubMed Central

Xi, Wang; Kong, Fang; Yeo, Joo Chuan; Yu, Longteng; Sonam, Surabhi; Dao, Ming; Gong, Xiaobo; Lim, Chwee Teck

2017-01-01

Microfluidics has been the key component for many applications, including biomedical devices, chemical processors, microactuators, and even wearable devices. This technology relies on soft lithography fabrication which requires cleanroom facilities. Although popular, this method is expensive and labor-intensive. Furthermore, current conventional microfluidic chips preclude reconfiguration, making reiterations in design very time-consuming and costly. To address these intrinsic drawbacks of microfabrication, we present an alternative solution for the rapid prototyping of microfluidic elements such as microtubes, valves, and pumps. In addition, we demonstrate how microtubes with channels of various lengths and cross-sections can be attached modularly into 2D and 3D microfluidic systems for functional applications. We introduce a facile method of fabricating elastomeric microtubes as the basic building blocks for microfluidic devices. These microtubes are transparent, biocompatible, highly deformable, and customizable to various sizes and cross-sectional geometries. By configuring the microtubes into deterministic geometry, we enable rapid, low-cost formation of microfluidic assemblies without compromising their precision and functionality. We demonstrate configurable 2D and 3D microfluidic systems for applications in different domains. These include microparticle sorting, microdroplet generation, biocatalytic micromotor, triboelectric sensor, and even wearable sensing. Our approach, termed soft tubular microfluidics, provides a simpler, cheaper, and faster solution for users lacking proficiency and access to cleanroom facilities to design and rapidly construct microfluidic devices for their various applications and needs. PMID:28923968
Soft tubular microfluidics for 2D and 3D applications

NASA Astrophysics Data System (ADS)

Xi, Wang; Kong, Fang; Yeo, Joo Chuan; Yu, Longteng; Sonam, Surabhi; Dao, Ming; Gong, Xiaobo; Teck Lim, Chwee

2017-10-01

Microfluidics has been the key component for many applications, including biomedical devices, chemical processors, microactuators, and even wearable devices. This technology relies on soft lithography fabrication which requires cleanroom facilities. Although popular, this method is expensive and labor-intensive. Furthermore, current conventional microfluidic chips preclude reconfiguration, making reiterations in design very time-consuming and costly. To address these intrinsic drawbacks of microfabrication, we present an alternative solution for the rapid prototyping of microfluidic elements such as microtubes, valves, and pumps. In addition, we demonstrate how microtubes with channels of various lengths and cross-sections can be attached modularly into 2D and 3D microfluidic systems for functional applications. We introduce a facile method of fabricating elastomeric microtubes as the basic building blocks for microfluidic devices. These microtubes are transparent, biocompatible, highly deformable, and customizable to various sizes and cross-sectional geometries. By configuring the microtubes into deterministic geometry, we enable rapid, low-cost formation of microfluidic assemblies without compromising their precision and functionality. We demonstrate configurable 2D and 3D microfluidic systems for applications in different domains. These include microparticle sorting, microdroplet generation, biocatalytic micromotor, triboelectric sensor, and even wearable sensing. Our approach, termed soft tubular microfluidics, provides a simpler, cheaper, and faster solution for users lacking proficiency and access to cleanroom facilities to design and rapidly construct microfluidic devices for their various applications and needs.
GERICOS: A Generic Framework for the Development of On-Board Software

NASA Astrophysics Data System (ADS)

Plasson, P.; Cuomo, C.; Gabriel, G.; Gauthier, N.; Gueguen, L.; Malac-Allain, L.

2016-08-01

This paper presents an overview of the GERICOS framework (GEneRIC Onboard Software), its architecture, its various layers and its future evolutions. The GERICOS framework, developed and qualified by LESIA, offers a set of generic, reusable and customizable software components for the rapid development of payload flight software. The GERICOS framework has a layered structure. The first layer (GERICOS::CORE) implements the concept of active objects and forms an abstraction layer over the top of real-time kernels. The second layer (GERICOS::BLOCKS) offers a set of reusable software components for building flight software based on generic solutions to recurrent functionalities. The third layer (GERICOS::DRIVERS) implements software drivers for several COTS IP cores of the LEON processor ecosystem.
Portable LQCD Monte Carlo code using OpenACC

NASA Astrophysics Data System (ADS)

Bonati, Claudio; Calore, Enrico; Coscetti, Simone; D'Elia, Massimo; Mesiti, Michele; Negro, Francesco; Fabio Schifano, Sebastiano; Silvi, Giorgio; Tripiccione, Raffaele

2018-03-01

Varying from multi-core CPU processors to many-core GPUs, the present scenario of HPC architectures is extremely heterogeneous. In this context, code portability is increasingly important for easy maintainability of applications; this is relevant in scientific computing where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art production level LQCD Monte Carlo application, using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify the mapping of the code on the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
Final Report for Project DE-FC02-06ER25755 [Pmodels2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panda, Dhabaleswar; Sadayappan, P.

2014-03-12

In this report, we describe the research accomplished by the OSU team under the Pmodels2 project. The team has worked on various angles: designing high performance MPI implementations on modern networking technologies (Mellanox InfiniBand (including the new ConnectX2 architecture and Quad Data Rate), QLogic InfiniPath, the emerging 10GigE/iWARP and RDMA over Converged Enhanced Ethernet (RoCE) and Obsidian IB-WAN), studying MPI scalability issues for multi-thousand node clusters using XRC transport, scalable job start-up, dynamic process management support, efficient one-sided communication, protocol offloading and designing scalable collective communication libraries for emerging multi-core architectures. New designs conforming to Argonne’s Nemesis interface have also been carried out. All of the above solutions have been integrated into the open-source MVAPICH/MVAPICH2 software. This software is currently being used by more than 2,100 organizations worldwide (in 71 countries). As of January ’14, more than 200,000 downloads have taken place from the OSU Web site. In addition, many InfiniBand vendors, server vendors, system integrators and Linux distributors have been incorporating MVAPICH/MVAPICH2 into their software stacks and distributing it. Several InfiniBand systems using MVAPICH/MVAPICH2 have obtained positions in the TOP500 ranking of supercomputers in the world. The latest November ’13 ranking includes the following systems: 7th ranked Stampede system at TACC with 462,462 cores; 11th ranked Tsubame 2.5 system at Tokyo Institute of Technology with 74,358 cores; 16th ranked Pleiades system at NASA with 81,920 cores. Work on PGAS models has proceeded in multiple directions. The Scioto framework, which supports task-parallelism in one-sided and global-view parallel programming, has been extended to allow multi-processor tasks that are executed by processor groups. A quantum Monte Carlo application is being ported onto the extended Scioto framework.
A public release of Global Trees (GT) has been made, along with the Global Chunks (GC) framework on which GT is built. The Global Chunks (GC) layer is also being used as the basis for the development of a higher level Global Graphs (GG) layer. The Global Graphs (GG) system will provide a global address space view of distributed graph data structures on distributed memory systems.

Gaseous electron multiplier-based soft x-ray plasma diagnostics development: Preliminary tests at ASDEX Upgrade

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chernyshova, M., E-mail: maryna.chernyshova@ipplm.pl; Malinowski, K.; Czarski, T.

2016-11-15

A Gaseous Electron Multiplier (GEM)-based detector is being developed for soft X-ray diagnostics on tokamaks. Its main goal is to facilitate transport studies of impurities like tungsten. Such studies are very relevant to ITER, where the excessive accumulation of impurities in the plasma core should be avoided. This contribution provides details of the preliminary tests at ASDEX Upgrade (AUG) with a focus on the most important aspects for detector operation in a harsh radiation environment. It was shown that both spatially and spectrally resolved data could be collected, in reasonable agreement with other AUG diagnostics. Contributions to the GEM signal also include hard X-rays, gammas, and neutrons. First simulations of the effect of high-energy photons have helped in understanding these contributions.
Fast soft x-ray images of magnetohydrodynamic phenomena in NSTX.

PubMed

Bush, C E; Stratton, B C; Robinson, J; Zakharov, L E; Fredrickson, E D; Stutman, D; Tritz, K

2008-10-01

A variety of magnetohydrodynamic (MHD) phenomena have been observed on NSTX. Many of these affect fast particle losses, which are of major concern for future burning plasma experiments. Usual diagnostics for studying these phenomena are arrays of Mirnov coils for magnetic oscillations and p-i-n diode arrays for soft x-ray emission from the plasma core. Data reported here are from a unique fast soft x-ray imaging camera (FSXIC) with a wide-angle (pinhole) tangential view of the entire plasma minor cross section. The camera provides a 64x64 pixel image, on a charge coupled device chip, of light resulting from conversion of soft x rays incident on a phosphor to the visible. We have acquired plasma images at frame rates of 1-500 kHz (300 frames/shot) and have observed a variety of MHD phenomena: disruptions, sawteeth, fishbones, tearing modes, and edge localized modes (ELMs). New data including modes with frequency >90 kHz are also presented. Data analysis and modeling techniques used to interpret the FSXIC data are described and compared, and FSXIC results are compared to Mirnov and p-i-n diode array results.
Many-integrated core (MIC) technology for accelerating Monte Carlo simulation of radiation transport: A study based on the code DPM

NASA Astrophysics Data System (ADS)

Rodriguez, M.; Brualla, L.

2018-04-01

Monte Carlo simulation of radiation transport is computationally demanding if reasonably low statistical uncertainties of the estimated quantities are to be obtained. Therefore, it can benefit to a large extent from high-performance computing. This work is aimed at assessing the performance of the first generation of the many-integrated core architecture (MIC) Xeon Phi coprocessor with respect to that of a CPU consisting of a double 12-core Xeon processor in Monte Carlo simulation of coupled electron-photon showers. The comparison was made twofold, first, through a suite of basic tests including parallel versions of the random number generators Mersenne Twister and a modified implementation of RANECU. These tests were addressed to establish a baseline comparison between both devices. Secondly, through the pDPM code developed in this work. pDPM is a parallel version of the Dose Planning Method (DPM) program for fast Monte Carlo simulation of radiation transport in voxelized geometries. A variety of techniques addressed to obtain a large scalability on the Xeon Phi were implemented in pDPM. Maximum scalabilities of 84.2× and 107.5× were obtained in the Xeon Phi for simulations of electron and photon beams, respectively. Nevertheless, in none of the tests involving radiation transport did the Xeon Phi perform better than the CPU. The disadvantage of the Xeon Phi with respect to the CPU is due to the low performance of the single core of the former. A single core of the Xeon Phi was more than 10 times less efficient than a single core of the CPU for all radiation transport simulations.
Core tungsten radiation diagnostic calibration by small shell pellet injection in the DIII-D tokamak

DOE PAGES

Hollmann, Eric M.; Commaux, Nicolas; Shiraki, Daisuke; ...

2017-10-04

Injection of small (OD = 0.8 mm) plastic pellets carrying embedded smaller (10 μg) tungsten grains is used to check calibrations of core tungsten line radiation diagnostics in support of the 2016 tungsten rings campaign in the DIII-D tokamak. The total (1 eV – 10 keV) and soft x-ray (1 keV – 10 keV) brightnesses we observed were found to be reasonably well (< factor 2) predicted using existing calibration factors and rate calculations. Individual core (EUV/SXR) tungsten line brightnesses appear to be somewhat less reliable (factor 2-4) for prediction of core tungsten concentration.
Core tungsten radiation diagnostic calibration by small shell pellet injection in the DIII-D tokamak

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hollmann, Eric M.; Commaux, Nicolas; Shiraki, Daisuke

Injection of small (OD = 0.8 mm) plastic pellets carrying embedded smaller (10 μg) tungsten grains is used to check calibrations of core tungsten line radiation diagnostics in support of the 2016 tungsten rings campaign in the DIII-D tokamak. The total (1 eV – 10 keV) and soft x-ray (1 keV – 10 keV) brightnesses we observed were found to be reasonably well (< factor 2) predicted using existing calibration factors and rate calculations. Individual core (EUV/SXR) tungsten line brightnesses appear to be somewhat less reliable (factor 2-4) for prediction of core tungsten concentration.
Microstructure, soft magnetic properties and applications of amorphous Fe-Co-Si-B-Mo-P alloy

NASA Astrophysics Data System (ADS)

Hasiak, Mariusz; Miglierini, Marcel; Łukiewski, Mirosław; Łaszcz, Amadeusz; Bujdoš, Marek

2018-05-01

DC thermomagnetic properties of Fe51Co12Si16B8Mo5P8 amorphous alloy in the as-quenched state and after annealing below the crystallization temperature are investigated. They are related to deviations in the microstructure as revealed by Mössbauer spectrometry. The study of AC magnetic properties, i.e. hysteresis loops, relative permeability and core losses versus maximum induction, was aimed at obtaining optimal initial parameters for the simulation process of a resonant transformer for a rail power supply converter. The results obtained from numerical analyses including core losses, winding losses, core mass, and dimensions were compared with the same parameters calculated for Fe-Si alloy and ferrite. Moreover, Steinmetz coefficients were also calculated for the as-quenched Fe51Co12Si16B8Mo5P8 amorphous alloy.
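For orientation, the Steinmetz coefficients mentioned in the abstract above are conventionally the fitting parameters of the empirical Steinmetz equation for core loss under sinusoidal excitation. The form below is the standard textbook one; the symbols are not taken from the abstract, which reports no coefficient values.

```latex
% Empirical Steinmetz equation (standard form, symbols assumed):
%   P_v      core loss per unit volume
%   f        excitation frequency
%   \hat{B}  peak (maximum) flux density
%   k, \alpha, \beta   material-dependent Steinmetz coefficients
P_v = k \, f^{\alpha} \, \hat{B}^{\beta}
```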
Implementing direct, spatially isolated problems on transputer networks

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1988-01-01

Parametric studies were performed on transputer networks of up to 40 processors to determine how to implement and maximize the performance of the solution of problems where no processor-to-processor data transfer is required for the problem solution (spatially isolated). Two types of problems are investigated: a computationally intensive problem where the solution required the transmission of 160 bytes of data through the parallel network, and a communication intensive example that required the transmission of 3 Mbytes of data through the network. This data consists of solutions being sent back to the host processor and not intermediate results for another processor to work on. Studies were performed on both integer and floating-point transputers. The latter features an on-chip floating-point math unit and offers approximately an order of magnitude performance increase over the integer transputer on real-valued computations. The results indicate that a minimum amount of work is required on each node per communication to achieve high network speedups (efficiencies). The floating-point processor requires approximately an order of magnitude more work per communication than the integer processor because of the floating-point unit's increased computing capacity.
Sintered magnetic cores of high Bs Fe84.3Si4B8P3Cu0.7 nano-crystalline alloy with a lamellar microstructure

NASA Astrophysics Data System (ADS)

Zhang, Yan; Sharma, Parmanand; Makino, Akihiro

2014-05-01

Fabrication of bulk cores of nano-crystalline Fe84.3Si4B8P3Cu0.7 alloy with a lamellar type of microstructure is reported. Amorphous ribbon flakes of size ~1.0-2.0 mm were compacted in the bulk form by spark plasma sintering technique at different sintering temperatures. High density (~96.4%) cores with a uniform nano-granular structure made from α-Fe (~31 nm) were obtained. These cores show excellent mechanical and soft magnetic properties. The lamellar micro-structure is shown to be important in achieving significantly lower magnetic core loss than the non-oriented silicon steel sheets, commercial powder cores and even the core made of the same alloy with finer and randomly oriented powder particles.
Use of soft x-ray diagnostic on the COMPASS tokamak for investigations of sawteeth crash neighborhood and of plasma position using fast inversion methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Imrisek, M.; Faculty of Mathematics and Physics, Charles University in Prague, Prague; Weinzettl, V.

2014-11-15

The soft x-ray diagnostic is suitable for monitoring plasma activity in the tokamak core, e.g., sawtooth instability. Moreover, spatially resolved measurements can provide information about plasma position and shape, which can supplement magnetic measurements. In this contribution, fast algorithms with the potential for real-time use are tested on the data from the COMPASS tokamak. In addition, the soft x-ray data are compared with data from other diagnostics in order to discuss a possible connection between sawtooth instability on one side and the transition to higher confinement mode, edge localized modes and production of runaway electrons on the other side.
Magnetic measurement of soft magnetic composites material under 3D SVPWM excitation

NASA Astrophysics Data System (ADS)

Zhang, Changgeng; Jiang, Baolin; Li, Yongjian; Yang, Qingxin

2018-05-01

The measurement and analysis of the magnetic properties of soft magnetic material under rotational space-vector pulse width modulation (SVPWM) excitation are key factors in the design and optimization of adjustable speed motors. In this paper, a three-dimensional (3D) magnetic properties testing system fit for SVPWM excitation is built, which includes a symmetrical orthogonal excitation magnetic circuit and a cubic field-metric sensor. Based on the testing system, the vector B and H loci of soft magnetic composite (SMC) material under SVPWM excitation are measured and analyzed by the proposed 3D SVPWM control method. Alternating and rotating core losses under various complex excitations with different magnitude modulation ratios are calculated and compared.
LArSoft: toolkit for simulation, reconstruction and analysis of liquid argon TPC neutrino detectors

NASA Astrophysics Data System (ADS)

Snider, E. L.; Petrillo, G.

2017-10-01

LArSoft is a set of detector-independent software tools for the simulation, reconstruction and analysis of data from liquid argon (LAr) neutrino experiments. The common features of LAr time projection chambers (TPCs) enable sharing of algorithm code across detectors of very different size and configuration. LArSoft is currently used in production simulation and reconstruction by the ArgoNeuT, DUNE, LArIAT, MicroBooNE, and SBND experiments. The software suite offers a wide selection of algorithms and utilities, including those for associated photo-detectors and the handling of auxiliary detectors outside the TPCs. Available algorithms cover the full range of simulation and reconstruction, from raw waveforms to high-level reconstructed objects, event topologies and classification. The common code within LArSoft is contributed by adopting experiments, which also provide detector-specific geometry descriptions, and code for the treatment of electronic signals. LArSoft is also a collaboration of experiments, Fermilab and associated software projects which cooperate in setting requirements, priorities, and schedules. In this talk, we outline the general architecture of the software and the interaction with external libraries and detector-specific code. We also describe the dynamics of LArSoft software development between the contributing experiments, the projects supporting the software infrastructure LArSoft relies on, and the core LArSoft support project.
Stiffness at shear-wave elastography and patient presentation predicts upgrade at surgery following an ultrasound-guided core biopsy diagnosis of ductal carcinoma in situ.

PubMed

Evans, A; Purdie, C A; Jordan, L; Macaskill, E J; Flynn, J; Vinnicombe, S

2016-11-01

The aim of this study is to establish predictors of invasion in lesions yielding an ultrasound-guided biopsy diagnosis of ductal carcinoma in situ (DCIS). Patients subjected to ultrasound-guided core biopsy yielding DCIS were studied. At shear-wave elastography (SWE) a threshold of 50 kPa was used for mean elasticity (Emean) to dichotomise the elasticity data between invasive and non-invasive masses. Data recorded included the mammographic and ultrasound features, the referral source, and grade of DCIS in the biopsy. The chi-square test was used to detect statistical significance. Of 57 lesions, 24 (42%) had invasion at excision. Symptomatic patients and patients with stiff lesions were more likely to have invasion than patients presenting through screening and with soft lesions (58% [14 of 24] versus 30% [10 of 33], p=0.03) and (51% [20 of 39] versus 22% [4 of 18], p=0.04). No other factors showed a relationship with invasion. Combining the two predictors of invasion improved risk stratification with symptomatic and stiff lesions having a risk of invasion of 67% (12 of 18) and soft lesions presenting at screening having only a 17% (2 of 12) risk of invasion (p=0.02). Stiffness on SWE and the referral source of the patient are predictors of occult invasion in women with an ultrasound-guided core biopsy diagnosis of DCIS. Copyright © 2016 The Royal College of Radiologists. Published by Elsevier Ltd. All rights reserved.
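The two single-predictor comparisons in the abstract above can be checked with a short sketch of a Pearson chi-square test on a 2×2 table. This is an illustration under stated assumptions, not the authors' analysis: the abstract does not say whether a continuity correction was applied, and the function name and table layout below are hypothetical. The uncorrected test is used here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows are the two patient groups; columns are (invasion, no invasion).
    Returns (chi2, p) for one degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Symptomatic (14 of 24 invasive) versus screening (10 of 33 invasive).
chi2_referral, p_referral = chi_square_2x2(14, 24 - 14, 10, 33 - 10)

# Stiff (20 of 39 invasive) versus soft (4 of 18 invasive).
chi2_stiffness, p_stiffness = chi_square_2x2(20, 39 - 20, 4, 18 - 4)
```

Rounded to two decimals, these p-values agree with the 0.03 and 0.04 quoted in the abstract. The combined-predictor comparison (12 of 18 versus 2 of 12) involves smaller counts, where a corrected or exact test may have been used, so it is left out of this sketch.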
Comparison of the fractional power motor with cores made of various magnetic materials

NASA Astrophysics Data System (ADS)

Gmyrek, Zbigniew; Lefik, Marcin; Cavagnino, Andrea; Ferraris, Luca

2017-12-01

The optimization of the motor cores, coupled with new core shapes as well as powering the motor at high frequency, are the primary reasons for the use of new materials. The utilization of new materials, like SMC (soft magnetic composite), reduces the core loss and/or provides quasi-isotropic core properties in any magnetization direction. Moreover, the use of SMC materials allows for avoiding degradation of the material portions resulting from the punching process, thereby preventing the deterioration of operating parameters of the motor. The authors examine the impact of technological parameters on the properties of a new type of SMC material and analyze the possibility of its use as the core of the fractional power motor. The result of the work is an indication of the shape of the rotor core made of a new SMC material to achieve operational parameters similar to those of a motor with a core made of laminations.
Articular soft tissue anatomy of the archosaur hip joint: Structural homology and functional implications.

PubMed

Tsai, Henry P; Holliday, Casey M

2015-06-01

Archosaurs evolved a wide diversity of locomotor postures, body sizes, and hip joint morphologies. The two extant archosaurs clades (birds and crocodylians) possess highly divergent hip joint morphologies, and the homologies and functions of their articular soft tissues, such as ligaments, cartilage, and tendons, are poorly understood. Reconstructing joint anatomy and function of extinct vertebrates is critical to understanding their posture, locomotor behavior, ecology, and evolution. However, the lack of soft tissues in fossil taxa makes accurate inferences of joint function difficult. Here, we describe the soft tissue anatomies and their osteological correlates in the hip joint of archosaurs and their sauropsid outgroups, and infer structural homology across the extant taxa. A comparative sample of 35 species of birds, crocodylians, lepidosaurs, and turtles ranging from hatchling to skeletally mature adult were studied using dissection, imaging, and histology. Birds and crocodylians possess topologically and histologically consistent articular soft tissues in their hip joints. Epiphyseal cartilages, fibrocartilages, and ligaments leave consistent osteological correlates. The archosaur acetabulum possesses distinct labrum and antitrochanter structures on the supraacetabulum. The ligamentum capitis femoris consists of distinct pubic- and ischial attachments, and is homologous with the ventral capsular ligament of lepidosaurs. The proximal femur has a hyaline cartilage core attached to the metaphysis via a fibrocartilaginous sleeve. This study provides new insight into soft tissue structures and their osteological correlates (e.g., the antitrochanter, the fovea capitis, and the metaphyseal collar) in the archosaur hip joint. The topological arrangement of fibro- and hyaline cartilage may provide mechanical support for the chondroepiphysis. 
The osteological correlates identified here will inform systematic and functional analyses of archosaur hindlimb evolution and provide the anatomical foundation for biomechanical investigations of joint tissues. © 2014 Wiley Periodicals, Inc.
Effect of processor temperature on film dosimetry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Srivastava, Shiv P.; Das, Indra J., E-mail: idas@iupui.edu

2012-07-01

Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d_max, 10 × 10 cm², 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4-40.6 °C (85-105 °F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.
Enhanced tactical radar correlator (ETRAC): true interoperability of the 1990s

NASA Astrophysics Data System (ADS)

Guillen, Frank J.

1994-10-01

The enhanced tactical radar correlator (ETRAC) system is under development at Westinghouse Electric Corporation for the Army Space Program Office (ASPO). ETRAC is a real-time synthetic aperture radar (SAR) processing system that provides tactical IMINT to the corps commander. It features an open architecture comprised of ruggedized commercial-off-the-shelf (COTS), UNIX based workstations and processors. The architecture features the DoD common SAR processor (CSP), a multisensor computing platform to accommodate a variety of current and future imaging needs. ETRAC's principal functions include: (1) Mission planning and control -- ETRAC provides mission planning and control for the U-2R and ASARS-2 sensor, including capability for auto replanning, retasking, and immediate spot. (2) Image formation -- the image formation processor (IFP) provides the CPU intensive processing capability to produce real-time imagery for all ASARS imaging modes of operation. (3) Image exploitation -- two exploitation workstations are provided for first-phase image exploitation, manipulation, and annotation. Products include INTEL reports, annotated NITF SID imagery, high resolution hard copy prints and targeting data. ETRAC is transportable via two C-130 aircraft, with autonomous drive on/off capability for high mobility. Other autonomous capabilities include rapid setup/tear down, extended stand-alone support, internal environmental control units (ECUs) and power generation. ETRAC's mission is to provide the Army field commander with accurate, reliable, and timely imagery intelligence derived from collections made by the ASARS-2 sensor, located on-board the U-2R aircraft. To accomplish this mission, ETRAC receives video phase history (VPH) directly from the U-2R aircraft and converts it in real time into soft copy imagery for immediate exploitation and dissemination to the tactical users.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu

2012-10-01

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional-step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional-step time window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss workload balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
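The fractional-step idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a periodic 1D Ising chain with Glauber spin-flip rates, a two-colour (even/odd) block decomposition, and a fixed time window dt, and the function names are hypothetical. Within each window, blocks of one colour run ordinary serial KMC while sites outside the block are held frozen, and then the other colour runs. With nearest-neighbour interactions, same-colour blocks do not influence each other within a window, so the inner loop over blocks, written serially here, is the part that could be distributed across processors.

```python
import math
import random


def flip_rate(spins, i, beta=0.5, coupling=1.0):
    """Glauber spin-flip rate at site i of a periodic 1D Ising chain (assumed model)."""
    n = len(spins)
    neighbours = spins[(i - 1) % n] + spins[(i + 1) % n]
    return 1.0 / (1.0 + math.exp(2.0 * beta * coupling * spins[i] * neighbours))


def kmc_in_block(spins, sites, dt, rng):
    """Serial KMC restricted to `sites` for one time window of length dt.

    Sites outside the block are only read, never modified.
    """
    t = 0.0
    while True:
        rates = [flip_rate(spins, i) for i in sites]
        total = sum(rates)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > dt:
            return
        threshold = rng.random() * total
        acc = 0.0
        chosen = sites[-1]  # fallback guards against floating-point round-off
        for i, rate in zip(sites, rates):
            acc += rate
            if threshold < acc:
                chosen = i
                break
        spins[chosen] = -spins[chosen]


def fractional_step_kmc(spins, block_size, dt, n_steps, rng):
    """Alternate KMC windows over even-indexed and odd-indexed blocks."""
    n = len(spins)
    n_blocks = n // block_size
    # An even number of equal blocks keeps same-colour blocks non-adjacent.
    assert n % block_size == 0 and n_blocks % 2 == 0
    blocks = [list(range(b * block_size, (b + 1) * block_size))
              for b in range(n_blocks)]
    for _ in range(n_steps):
        for colour in (0, 1):
            # Same-colour blocks are independent within a window; this loop
            # stands in for the parallel execution across processors.
            for block in blocks[colour::2]:
                kmc_in_block(spins, block, dt, rng)
    return spins
```

Calling `fractional_step_kmc([1] * 16, block_size=4, dt=0.5, n_steps=10, rng=random.Random(0))` evolves a 16-site chain in place. Shrinking dt brings the scheme closer to serial KMC, at the cost of more frequent synchronization between the two colours.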
General purpose molecular dynamics simulations fully implemented on graphics processing units

NASA Astrophysics Data System (ADS)

Anderson, Joshua A.; Lorenz, Chris D.; Travesset, A.

2008-05-01

Graphics processing units (GPUs), originally developed for rendering real-time effects in computer games, now provide unprecedented computational power for scientific applications. In this paper, we develop a general purpose molecular dynamics code that runs entirely on a single GPU. It is shown that our GPU implementation provides performance equivalent to that of a fast 30-processor-core distributed-memory cluster. Our results show that GPUs already provide an inexpensive alternative to such clusters, and we discuss implications for the future.
Investigating the Naval Logistics Role in Humanitarian Assistance Activities

DTIC Science & Technology

2015-03-01

transportation means. E. BASE CASE RESULTS The computations were executed on a MacBook Pro, 3 GHz Intel Core i7-4578U processor with 8 GB. The...MacBook Pro was partitioned to also contain a Windows 7, 64-bit operating system. The computations were run in the Windows 7 operating system using the...it impacts the types of metamodels that can be developed as a result of data farming (Lucas et al., 2015). Using a metamodel, one can closely
SC'11 Poster: A Highly Efficient MGPT Implementation for LAMMPS; with Strong Scaling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oppelstrup, T; Stukowski, A; Marian, J

2011-12-07

The MGPT potential has been implemented as a drop-in package for the general molecular dynamics code LAMMPS. We implement an improved communication scheme that shrinks the communication layer thickness and improves load balancing. This results in unprecedented strong scaling, with speedup continuing beyond 1/8 atom/core. In addition, we have optimized the small-matrix linear algebra with generic blocking (for all processors) and specific SIMD intrinsics for vectorization on Intel, AMD, and BlueGene CPUs.
